Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!necntc!ima!think!barmar
From: barmar@think.COM (Barry Margolin)
Newsgroups: comp.lang.misc
Subject: Re: The Joy of Zero-based Arrays
Message-ID: <17419@think.UUCP>
Date: 3 Mar 88 04:18:41 GMT
References: <1012@PT.CS.CMU.EDU>
Sender: usenet@think.UUCP
Reply-To: barmar@fafnir.think.com.UUCP (Barry Margolin)
Organization: Thinking Machines Corporation, Cambridge, MA
Lines: 106

In article <1012@PT.CS.CMU.EDU> skef@SPICE.CS.CMU.EDU (Skef Wholey) writes:
>In a couple of recent articles, people have asserted that zero-based arrays
>are somehow more burdensome and less natural to use than one-based arrays:
[references deleted]
>Here I argue the opposite: that zero-based arrays are God's Gift To
>Programmers, and that one-based arrays are Satan's Way Of Complicating Code.

I've used both PL/I (which allows arbitrary array indices, but
defaults to 1-based, and only has 1-based strings, and Lisp, which
only has 0-based arrays.  It hardly needs to be said that the best
thing is to allow the programmer to specify the index base.  I've had
my share of fencepost errors in both languages, but I think I prefer
1-based arrays.

>Now, one can of course use the end-exclusive model with one-based arrays, but
>then the subsequence containing the whole array does *not* have an end that is
>the highest index of the array; it is one more than that index.  

That's true for 0-based arrays, too.  For a 10-element 0-based array,
the end-exclusive subsequence is [0,10), but the highest index of the
array is 9.  If it is 1-based, the end-exclusive subsequence is
[1,11), and the highest index is 10.  What's the difference?

>								  So when using
>one-based arrays people usually use an end-INclusive model, which also leads
>to nastiness. The Length of a subsequence is now End - Start + 1.  To
>represent a zero-length sequence the end index must be one LESS than the start
>index.

Yes, I admit that arrays declared (1 .. 0) seem strange, and I think
some languages don't even allow it.

>
>  (do ((i start (1+ i)))
>      ((= i (1+ end)))
>    (frob (oref array i))) ; "Oref" for "One-based Aref"

actually, we generally used (> i end) as our end test.  Or we use
traditional FROM-TO loops, as in the PL/I

	do i = start to end;
	     frob (array(i));
	end;

In your posting you made mention of something like this when you
trandlated your Lisp DO to pseudo-Pascal:

>  For I from Start (inclusive) to End (exclusive) Do
>    Frob(Array[I]);

Most languages don't have an FOR-loop that is exclusive of the End
(Zetalisp LOOP is an exception, with its FOR <index> FROM <start>
BELOW <end> option); the End is usually inclusive, which works
perfectly when you use the end-inclusive model of array subranges.
For example, to loop over an entire array in PL/I you simply do:

	do i = lbound(array,1) to hbound(array,1);
	     frob (array(i));
	end;

This code is actually independent of the base of the array, because it
uses the LBOUND (Lower BOUND) and HBOUND (Higher BOUND) builtin
functions to determine the actual index range (the second argument is
the dimension whose bound should be returned).

>And a following subsequence starts at one MORE then the end of the current
>subsequence.
>
>  (do ((i (1+ old-end) (1+ i))
>      ((< i new-end))
>    ...)
>
>So the +1's suddenly start creeping into the code, into the initialization or
>termination clauses.

Well, in PL/I, at the end of a "do i = start to end" loop, the
variable i is left with the value end+1.  So, it is common to continue
with:

	do i = i to new-end;...

>I've written tens of thousands of lines of code in Common Lisp and C, and
>after I understood the beauty and just plain right-ness of the end-exclusive
>model, zero-based arrays became a joy to use.  I seldom, if ever, need to
>adjust starts, ends, lengths, or indices by one.  But I had to do that kind of
>stuff frequently when using one-based arrays.  The more random incrementing
>and decrementing you do, the more fence-post errors your code will have.

About the only time in PL/I where you have to add 1 is when going from
loops like the above to calls to the SUBSTR builtin, which extracts a
portion of a string.  This function takes start and length rather than
start and end.

As I pointed out above, there's no real reason why end-inclusive
subranges and 1-based arrays must be related.

By the way, Lisp isn't even consistent about whether it is 1-based or
0-based.  (FIRST <list>) is equivalent to (NTH 0 <list>), (SECOND
<list>) is (NTH 1 <list>), etc.  Mind you, I'm not complaining; the
ordinal functions are intended for those cases where the programmer
knows the position a priori, and we humans tend to think 1-based.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
uunet!think!barmar