Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!umnd-cs!umn-cs!papowell
From: papowell@umn-cs.UUCP (Patrick Powell)
Newsgroups: comp.lang.fortran,comp.lang.pascal
Subject: Re: Array storage order
Message-ID: <1610@umn-cs.UUCP>
Date: Sat, 30-May-87 09:28:45 EDT
Article-I.D.: umn-cs.1610
Posted: Sat May 30 09:28:45 1987
Date-Received: Tue, 2-Jun-87 00:37:52 EDT
References: <1215@batcomputer.tn.cornell.edu>
Sender: news@umn-cs.UUCP
Reply-To: papowell@umn-cs.UUCP (Patrick Powell)
Organization: University of Minnesota
Lines: 238
Keywords: Fortran, Pascal and C arrays (long)
Xref: mnetor comp.lang.fortran:110 comp.lang.pascal:165

In article <1215@batcomputer.tn.cornell.edu> garry@oak.cadif.cornell.edu writes:
>We're trying to arrange for a C subroutine library to be callable from
>Fortran and Pascal and also transportable between machines. 
>
>One of the questions that has come up is with respect to the storage
>order of 2-dimensional arrays. In C, array element [1][2] is stored
>in memory immediately after element [1][1]. I believe this is locked
>into the language - pointer arithmetic and all that. On the Fortran
>compilers I can get to, (2,1), not (1,2), immediately follows (1,1).
>The pascal I haven't investigated yet.
>
>Question: is this array storage order mandated by the Fortran-77 standard?
>How about the Pascal standard?   If so, then I'll document the arrays as being
>"backwards" in these languages. If not, I'll keep two code versions. 
>Obviously, two manual versions would be less hassle.
>
>Help appreciated.
>
>garry wiegand   (garry@oak.cadif.cornell.edu - ARPA)
>		(garry@crnlthry - BITNET)

This is a rather involved discussion of arrays in FORTRAN, PASCAL, and
C If you want more details, I suggest that you might want to look at a
book on programming languages, such as "Principles of Programming
Languages" by Bruce MacLennan, "Programming Languages" by Ghezzi and
Jazayeri,  or "Design and Implementation of Programming Languages" by
Pratt.  The MacLennan is the easiest to read,  the Ghezzi is the most
technical.

If you document things,  document them correctly.  Briefly,  FORTRAN
is COLUMN major,  C and Pascal are ROW major.

FORTRAN
-------
Array storage order in FORTRAN is mandated
by the FORTRAN language standard (ANSI X3.9-1966 and 1977)
to be COLUMN MAJOR.  For example, A(3,2) would have representation in memory
A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2)

Actually,  what is specified is a method of translation of an array
subscript expressions into a position in the array.  For an array
declaration of the form: A(ll1:ul1, ll2:ul2,...lln:uln) (Max of 3
dimensionsin F66,7 in F77) a subscript expression of the form:
A(i1,i2,i3,...i7) has a subscript value of the form:
SF1*(i1-ll1)+SF2*(i2-ll2)+...+SFn(in-lln), where SFi is a "scale
factor";  if you have an array of ojects,
SF1 = 1
SF2 = (ul1-ll1+1)*1 = (elements in previous dimension)*(size of element)
    = M1*SF1
...
SFn = (ul(n-1)-ll(n-1)+1)*SF(n-1)
The size (number of elements) in the array is SZ=M1*M2*...*Mn = SFn+1

The array is actually a "chunk" of memory, of size SZ.
A position in the array is given by the value of the subscript
value.
If you work out the arithmetic,  you will find that A(1,x,y...) is
immediately followed by A(2,x,y...),  etc.,  which indicates that
the first column is stored first,  the next column next,
and so forth: i.e.- column major or column order.

What happens when you pass an array or an array element to a
subroutine/function? (I will refer to this as procedure).
Fortran passes variables (arrays and variables) by REFERENCE; it passes
expressions by VALUE.  Usually this is implemented by passing the
address of an array or variable,  and by evaluating an expression,
stuffing the value in some "anonymous variable", and passing the
address of the anonymous variable.

If the formal parameter (dummy variable) of a procedure is declared as an
array,
the actual parameter must be an array,  or an array element.
The size/dimensions of the array may be different than that of the
original array;  for example,  the following is (ugly) but acceptable:

A(2,3)           SUBROUTINE X(V)
... CALL X(A)    REAL V(2,2,2)

In the subroutine,  the array is "bound" to start at the actual parameter
position,  and continue until the end of the actual parameter array.
For this reason:
A(1,1) -> V(1,1,1)
A(2,1) -> V(2,1,1)
A(3,1) -> V(1,2,1)
A(1,2) -> V(2,2,1)
A(2,2) -> V(1,1,2)
A(3,2) -> V(2,1,2)

What about V(1,2,2)?  This references beyond the original (actual) array bounds,
and is an "undefined" (NOT ILLEGAL:  UNDEFINED!) reference.

By the way,  this is also legal:

A(2,3)           SUBROUTINE X(V)
... CALL X(A)    REAL W(1)
                    W(1) = W(2)+ W(3) + W(4)+ ...

Clearly CALL X(A(1,2)) will result in the following
A(1,2) -> V(1,1,1)
A(2,2) -> V(2,1,1)
A(3,2) -> V(1,2,1)
 ... (undefined past here)

You can pass a column of an array at a time,  using:
CALL A(1,1)    SUBROUTINE A(X)
CALL A(1,2)    REAL X(2)
               X(1) = X(2) ....

PASCAL
------

In pascal, arrays are declared as:
v: array [type] of base; {type can be subrange like 1..3, base is base type}
v: array [1..3] of real;
x: array[1..3] of array [1..2] of real;

To make life easier on people,  the language has a "rewriting rule" or
"sytactic redefintion rule",  stating that:
x : array [1..3,1..2] of real;  
is transformed/rewritten as:
x: array[1..3] of array [1..2] of real;

Similarly,  the array subscript x[i,j] ===> x[i][j] ===>(x[i])[j]
The last is actually the precedence relation of the subscript operator.

Now,  let us assume that elements in an array (single dimension array)
are stored in sequential order in memory.  AS FAR AS I KNOW,  THIS IS NOT
REPEAT NOT a language requirement.  However,  there are few reasons to
do it otherwise.  (Quiet in the back,  you animals!  Sparse arrays be damned!)

Now we have the interesting table: 
v[1,1] v[1,2] v[2,1] v[2,2] v[3,1] v[3,2]
That is,  the first ROW is stored first,  the next second, etc.

It turns out that the formula that was used for determing the position
of an element in an array,  which was used for FORTRAN,  can also be used
for PASCAL.

A(i1,i2,i3,...i7) will access the element in position:
SF1*(i1-ll1)+SF2*(i2-ll2)+...+SFn(in-lln), where SFi is a "scale
factor";
SFn = 1
SFn-1 = (uln-lln+1)*1 = (elements in previous dimension)*(size of element)
    = Mn-1*SF1
...
SF1 = (ul2-ll2+1)*SF2
SF0 = (ul1-ll1+1)*SF1 = size of the array.

When you pass an value in PASCAL,  you pass by value or reference;
in addition,  the TYPES of the formal and actual parameter must agree.
Thus, the usual game playing that is done in FORTRAN with resizing and
slicing cannot be done.  However,  you have another problem.
Since Pascal is a moderately strongly typed language,  and requires
the types of actual and formals to agree,  then you run into the problem
when you want a parametric or polymorphic typed function.  Say you want
to search an array for an element:
o
type s5= array [1..5] of char;
s6 = array [1..6] of char;
var p5:s5; p6:s6;
function S5(var x: array [1..5] of char ); system; {don't ask}
function S6(var x: array [1..5] of char ); system; {don't ask}

For each different size of string, you need to have A NEW FUNCTION.
To fix this problem,  Pascal has been extended to have conformant arrays.

function S(var x: array [ll..ul:int] of char ); system;

This definition will cause ll and ul to have the values of the lower and upper
limits of the array bounds in the body of the function.  How, you might ask
does a person implement this?  By using DOPE VECTORS (descriptors in some
languages), which contain all of the information about an array.
The format of the table is usually (but not always) in the form:

------------------------------------------------------
Address | (points to the array area)
------------------------------------------------------
ll1 | (lower limit of 1st dimension)
ul1 | (lower limit of 1st dimension)
SF1 | (Scale factor for 1st dimension)
------------------------------------------------------
ll2 | (lower limit of 2st dimension)
ul2 | (lower limit of 2st dimension)
SF2 | (Scale factor for 2st dimension)
------------------------------------------------------
.....
ll2 | (lower limit of 2st dimension)
ul2 | (lower limit of 2st dimension)
SF2 | (Scale factor for 2st dimension)
------------------------------------------------------

Note that all of the information below the first entry is static,
and determined at compilation time.

When a conformant array is passed,  usually a pointer to a dope vector
is passed.  Some times the dope vector has the form:
------------------------------------------------------
Address | (points to the array area)
------------------------------------------------------
Descriptor | (points to a descriptor)
------------------------------------------------------

Sometimes there are two things passed:  the address and the descriptor.
Sometimes this is done for ALL arrays;  sometimes only for conformant
arrays.  If it is done at all.

Have fun with this one!

C
_

In C, arrays are stored in ROW major form.
For example:
real w[3][2]; /* w[1][1] w[1][2] w[2][1] w[2][2] w[3][1] w[3][2] */
This relates to the C language definition syntax, which is described
in detail in Kernighan and Ritchie,  "The C Programming Language"

The C language passed parameters by value.  However, when you pass an
array,  say: printall(w), the C language requires a pointer to be
passed.  That is, the name of an array is COERCED or cast into a
pointer value that is the start of the array.  However,  if you pass
printit(w(1)),  the value of w(1) is passed.  If you want to pass the
address of the element w(1),  you have to use: printit(&w(1)) or
printit(w+1).

SUMMARY
-------
Fortran is COLUMN MAJOR
C and Pascal are ROW MAJOR
Fortran by reference (address)
Pascal is by value/reference,  exact format is anybodys guess.
C by reference (address)


Patrick Powell
-- 
Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002