Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!pasteur!agate!ig!uwmcsd1!bbn!rochester!PT.CS.CMU.EDU!SPICE.CS.CMU.EDU!skef
From: skef@SPICE.CS.CMU.EDU (Skef Wholey)
Newsgroups: comp.lang.misc
Subject: The Joy of Zero-based Arrays
Message-ID: <1012@PT.CS.CMU.EDU>
Date: 2 Mar 88 21:31:36 GMT
Sender: netnews@PT.CS.CMU.EDU
Organization: Carnegie-Mellon University, CS/RI
Lines: 141


In a couple of recent articles, people have asserted that zero-based arrays
are somehow more burdensome and less natural to use than one-based arrays:

>From: sommar@enea.se (Erland Sommarskog)
>[...]
>  The idea of having all arrays starting on index zero is really to
>put the burden on the programmer. If Wirth really must have a fixed
>lower index, he could at least have chosen one! How many of you do
>often use arrays with zero as the lower bound? I do it very seldom.
>And his argument about saving execution time for index computation
>is truely naive. The result will just be references like:
>    String(char_no + 1), 
>just making the code more verbose and complex for no use.
>[...]

>From: pattis@june.cs.washington.edu (Richard Pattis)
>
>I too detest 0-based arrays (and think 1-based arrays are better).  In trying
>to rationalize this choice, I came up the the following argument: When we
>process arrays, we often need to know the size of the array and/or the
>position of the last element.  In 0-based arrays, these quantites are off by
>one; in 1-based arrays they are the same.  Hence, I think that people will
>make fewer errors with 1-based arrays.

Here I argue the opposite: that zero-based arrays are God's Gift To
Programmers, and that one-based arrays are Satan's Way Of Complicating Code.
I find that add-or-subtract-one code is more likely to arise in situations in
which one-based arrays are used.  I find that zero-based arrays lead to
simpler code.

Consider the task of processing subsequences of a one-dimensional array.
There are two obvious ways to describe a subsequence: with a start and an end,
or with a start and a length.  Consider now two problems: iterating over a
subsequence (while correctly dealing with zero length subseqences), and moving
from one subsequence to the next.

The notion of subsequence in Common Lisp (which, like C, provides only
zero-based arrays) is that the start index specifies the first element of a
subsequence and the end index specifies the index *one-beyond* the last
element of the subsequence.  Given the length and start of a subsequence, one
computes the end simply by adding (i.e., End = Start + Length).  Likewise, the
length of a subsequence can be computing by subtracting the start from the
end (i.e., Length = End - Start).  A zero-length sequence has Start = End.

   0 1 2 3 4 5 6 7 8
  +-+-+-+-+-+-+-+-+-+
  |X|X|X| | | | | | |
  +-+-+-+-+-+-+-+-+-+
   ^     ^
   |     |
 start  end

The subsequence of three X's has a start of 0 and an end of 3.  Its length is
3 - 0 = 3.  The length of this array is 9 -- it has nine elements; as a
subseqeunce it can be described as having start = 0 and end = 9 (or length =
9).  The Common Lisp Make-Array function uses this number, and the
Array-Length function will return this number.  Likewise the declared size in
C would be this number, and sizeof will return something proportional to it.
Within this beautifully consistent end-exclusive model, *none* of these
quantities are "off by one."  To frob the elements of a subsequence of this
sort, we do something like:

  (do ((i start (1+ i)))   ; Do for I beginning at Start, adding one each time
      ((= i end))          ; Until I = End
    (frob (aref array i))) ; Frobbing the I'th element of the Array

or

  (do ((i1 start (1+ i))
       (i2 0 (1+ i)))      ; I2 tells us where we are relative to the start
      ((= i2 length))
    (frob (aref array i1) (aref another-array i2)))

The first loop says, in non-Lispy terms:

  For I from Start (inclusive) to End (exclusive) Do
    Frob(Array[I]);

The end test must always come before the body of the loop so that zero-length
subsequences are processed correctly.

The subsequence following the one defined by the interval [start,end) is
[end,new-end).

  (do ((i old-end (1+ i)))
      ((= i new-end))
    ...)

The *only* +1's here are in the iteration steps; there are *none* in the
initialization or termination clauses.  The start-inclusive end-exclusive
property is captured in the simple iteration construct Dotimes.  For example,

   (dotimes (i 4)
     ...)

executes its body four times, for I=0, I=1, I=2, and I=3.

Now, one can of course use the end-exclusive model with one-based arrays, but
then the subsequence containing the whole array does *not* have an end that is
the highest index of the array; it is one more than that index.  So when using
one-based arrays people usually use an end-INclusive model, which also leads
to nastiness. The Length of a subsequence is now End - Start + 1.  To
represent a zero-length sequence the end index must be one LESS than the start
index.

  (do ((i start (1+ i)))
      ((= i (1+ end)))
    (frob (oref array i))) ; "Oref" for "One-based Aref"

or

  (do ((i1 start (1+ i))
       (i2 1 (1+ i)))      ; I2 tells us where we are relative to the start
      ((= i2 (1+ length)))
    (frob (oref array i1) (oref another-array i2)))

What was a clean, crisp, equality test with the end index in the first loop
above is now a sloppy-looking test against the end plus one.  What was a
comparison with the length is now a comparison with the length plus one.

And a following subsequence starts at one MORE then the end of the current
subsequence.

  (do ((i (1+ old-end) (1+ i))
      ((< i new-end))
    ...)

So the +1's suddenly start creeping into the code, into the initialization or
termination clauses.

I've written tens of thousands of lines of code in Common Lisp and C, and
after I understood the beauty and just plain right-ness of the end-exclusive
model, zero-based arrays became a joy to use.  I seldom, if ever, need to
adjust starts, ends, lengths, or indices by one.  But I had to do that kind of
stuff frequently when using one-based arrays.  The more random incrementing
and decrementing you do, the more fence-post errors your code will have.

Remember: Zero is your friend.

--Skef
  (Preferred mail address: Wholey@C.CS.CMU.EDU)