Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!pasteur!agate!ig!uwmcsd1!bbn!rochester!PT.CS.CMU.EDU!SPICE.CS.CMU.EDU!skef From: skef@SPICE.CS.CMU.EDU (Skef Wholey) Newsgroups: comp.lang.misc Subject: The Joy of Zero-based Arrays Message-ID: <1012@PT.CS.CMU.EDU> Date: 2 Mar 88 21:31:36 GMT Sender: netnews@PT.CS.CMU.EDU Organization: Carnegie-Mellon University, CS/RI Lines: 141 In a couple of recent articles, people have asserted that zero-based arrays are somehow more burdensome and less natural to use than one-based arrays: >From: sommar@enea.se (Erland Sommarskog) >[...] > The idea of having all arrays starting on index zero is really to >put the burden on the programmer. If Wirth really must have a fixed >lower index, he could at least have chosen one! How many of you do >often use arrays with zero as the lower bound? I do it very seldom. >And his argument about saving execution time for index computation >is truely naive. The result will just be references like: > String(char_no + 1), >just making the code more verbose and complex for no use. >[...] >From: pattis@june.cs.washington.edu (Richard Pattis) > >I too detest 0-based arrays (and think 1-based arrays are better). In trying >to rationalize this choice, I came up the the following argument: When we >process arrays, we often need to know the size of the array and/or the >position of the last element. In 0-based arrays, these quantites are off by >one; in 1-based arrays they are the same. Hence, I think that people will >make fewer errors with 1-based arrays. Here I argue the opposite: that zero-based arrays are God's Gift To Programmers, and that one-based arrays are Satan's Way Of Complicating Code. I find that add-or-subtract-one code is more likely to arise in situations in which one-based arrays are used. I find that zero-based arrays lead to simpler code. Consider the task of processing subsequences of a one-dimensional array. There are two obvious ways to describe a subsequence: with a start and an end, or with a start and a length. Consider now two problems: iterating over a subsequence (while correctly dealing with zero length subseqences), and moving from one subsequence to the next. The notion of subsequence in Common Lisp (which, like C, provides only zero-based arrays) is that the start index specifies the first element of a subsequence and the end index specifies the index *one-beyond* the last element of the subsequence. Given the length and start of a subsequence, one computes the end simply by adding (i.e., End = Start + Length). Likewise, the length of a subsequence can be computing by subtracting the start from the end (i.e., Length = End - Start). A zero-length sequence has Start = End. 0 1 2 3 4 5 6 7 8 +-+-+-+-+-+-+-+-+-+ |X|X|X| | | | | | | +-+-+-+-+-+-+-+-+-+ ^ ^ | | start end The subsequence of three X's has a start of 0 and an end of 3. Its length is 3 - 0 = 3. The length of this array is 9 -- it has nine elements; as a subseqeunce it can be described as having start = 0 and end = 9 (or length = 9). The Common Lisp Make-Array function uses this number, and the Array-Length function will return this number. Likewise the declared size in C would be this number, and sizeof will return something proportional to it. Within this beautifully consistent end-exclusive model, *none* of these quantities are "off by one." To frob the elements of a subsequence of this sort, we do something like: (do ((i start (1+ i))) ; Do for I beginning at Start, adding one each time ((= i end)) ; Until I = End (frob (aref array i))) ; Frobbing the I'th element of the Array or (do ((i1 start (1+ i)) (i2 0 (1+ i))) ; I2 tells us where we are relative to the start ((= i2 length)) (frob (aref array i1) (aref another-array i2))) The first loop says, in non-Lispy terms: For I from Start (inclusive) to End (exclusive) Do Frob(Array[I]); The end test must always come before the body of the loop so that zero-length subsequences are processed correctly. The subsequence following the one defined by the interval [start,end) is [end,new-end). (do ((i old-end (1+ i))) ((= i new-end)) ...) The *only* +1's here are in the iteration steps; there are *none* in the initialization or termination clauses. The start-inclusive end-exclusive property is captured in the simple iteration construct Dotimes. For example, (dotimes (i 4) ...) executes its body four times, for I=0, I=1, I=2, and I=3. Now, one can of course use the end-exclusive model with one-based arrays, but then the subsequence containing the whole array does *not* have an end that is the highest index of the array; it is one more than that index. So when using one-based arrays people usually use an end-INclusive model, which also leads to nastiness. The Length of a subsequence is now End - Start + 1. To represent a zero-length sequence the end index must be one LESS than the start index. (do ((i start (1+ i))) ((= i (1+ end))) (frob (oref array i))) ; "Oref" for "One-based Aref" or (do ((i1 start (1+ i)) (i2 1 (1+ i))) ; I2 tells us where we are relative to the start ((= i2 (1+ length))) (frob (oref array i1) (oref another-array i2))) What was a clean, crisp, equality test with the end index in the first loop above is now a sloppy-looking test against the end plus one. What was a comparison with the length is now a comparison with the length plus one. And a following subsequence starts at one MORE then the end of the current subsequence. (do ((i (1+ old-end) (1+ i)) ((< i new-end)) ...) So the +1's suddenly start creeping into the code, into the initialization or termination clauses. I've written tens of thousands of lines of code in Common Lisp and C, and after I understood the beauty and just plain right-ness of the end-exclusive model, zero-based arrays became a joy to use. I seldom, if ever, need to adjust starts, ends, lengths, or indices by one. But I had to do that kind of stuff frequently when using one-based arrays. The more random incrementing and decrementing you do, the more fence-post errors your code will have. Remember: Zero is your friend. --Skef (Preferred mail address: Wholey@C.CS.CMU.EDU)