Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!seismo!rutgers!umnd-cs!umn-cs!papowell From: papowell@umn-cs.UUCP (Patrick Powell) Newsgroups: comp.lang.fortran,comp.lang.pascal Subject: Re: Array storage order Message-ID: <1610@umn-cs.UUCP> Date: Sat, 30-May-87 09:28:45 EDT Article-I.D.: umn-cs.1610 Posted: Sat May 30 09:28:45 1987 Date-Received: Tue, 2-Jun-87 00:37:52 EDT References: <1215@batcomputer.tn.cornell.edu> Sender: news@umn-cs.UUCP Reply-To: papowell@umn-cs.UUCP (Patrick Powell) Organization: University of Minnesota Lines: 238 Keywords: Fortran, Pascal and C arrays (long) Xref: mnetor comp.lang.fortran:110 comp.lang.pascal:165 In article <1215@batcomputer.tn.cornell.edu> garry@oak.cadif.cornell.edu writes: >We're trying to arrange for a C subroutine library to be callable from >Fortran and Pascal and also transportable between machines. > >One of the questions that has come up is with respect to the storage >order of 2-dimensional arrays. In C, array element [1][2] is stored >in memory immediately after element [1][1]. I believe this is locked >into the language - pointer arithmetic and all that. On the Fortran >compilers I can get to, (2,1), not (1,2), immediately follows (1,1). >The pascal I haven't investigated yet. > >Question: is this array storage order mandated by the Fortran-77 standard? >How about the Pascal standard? If so, then I'll document the arrays as being >"backwards" in these languages. If not, I'll keep two code versions. >Obviously, two manual versions would be less hassle. > >Help appreciated. > >garry wiegand (garry@oak.cadif.cornell.edu - ARPA) > (garry@crnlthry - BITNET) This is a rather involved discussion of arrays in FORTRAN, PASCAL, and C If you want more details, I suggest that you might want to look at a book on programming languages, such as "Principles of Programming Languages" by Bruce MacLennan, "Programming Languages" by Ghezzi and Jazayeri, or "Design and Implementation of Programming Languages" by Pratt. The MacLennan is the easiest to read, the Ghezzi is the most technical. If you document things, document them correctly. Briefly, FORTRAN is COLUMN major, C and Pascal are ROW major. FORTRAN ------- Array storage order in FORTRAN is mandated by the FORTRAN language standard (ANSI X3.9-1966 and 1977) to be COLUMN MAJOR. For example, A(3,2) would have representation in memory A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2) Actually, what is specified is a method of translation of an array subscript expressions into a position in the array. For an array declaration of the form: A(ll1:ul1, ll2:ul2,...lln:uln) (Max of 3 dimensionsin F66,7 in F77) a subscript expression of the form: A(i1,i2,i3,...i7) has a subscript value of the form: SF1*(i1-ll1)+SF2*(i2-ll2)+...+SFn(in-lln), where SFi is a "scale factor"; if you have an array of ojects, SF1 = 1 SF2 = (ul1-ll1+1)*1 = (elements in previous dimension)*(size of element) = M1*SF1 ... SFn = (ul(n-1)-ll(n-1)+1)*SF(n-1) The size (number of elements) in the array is SZ=M1*M2*...*Mn = SFn+1 The array is actually a "chunk" of memory, of size SZ. A position in the array is given by the value of the subscript value. If you work out the arithmetic, you will find that A(1,x,y...) is immediately followed by A(2,x,y...), etc., which indicates that the first column is stored first, the next column next, and so forth: i.e.- column major or column order. What happens when you pass an array or an array element to a subroutine/function? (I will refer to this as procedure). Fortran passes variables (arrays and variables) by REFERENCE; it passes expressions by VALUE. Usually this is implemented by passing the address of an array or variable, and by evaluating an expression, stuffing the value in some "anonymous variable", and passing the address of the anonymous variable. If the formal parameter (dummy variable) of a procedure is declared as an array, the actual parameter must be an array, or an array element. The size/dimensions of the array may be different than that of the original array; for example, the following is (ugly) but acceptable: A(2,3) SUBROUTINE X(V) ... CALL X(A) REAL V(2,2,2) In the subroutine, the array is "bound" to start at the actual parameter position, and continue until the end of the actual parameter array. For this reason: A(1,1) -> V(1,1,1) A(2,1) -> V(2,1,1) A(3,1) -> V(1,2,1) A(1,2) -> V(2,2,1) A(2,2) -> V(1,1,2) A(3,2) -> V(2,1,2) What about V(1,2,2)? This references beyond the original (actual) array bounds, and is an "undefined" (NOT ILLEGAL: UNDEFINED!) reference. By the way, this is also legal: A(2,3) SUBROUTINE X(V) ... CALL X(A) REAL W(1) W(1) = W(2)+ W(3) + W(4)+ ... Clearly CALL X(A(1,2)) will result in the following A(1,2) -> V(1,1,1) A(2,2) -> V(2,1,1) A(3,2) -> V(1,2,1) ... (undefined past here) You can pass a column of an array at a time, using: CALL A(1,1) SUBROUTINE A(X) CALL A(1,2) REAL X(2) X(1) = X(2) .... PASCAL ------ In pascal, arrays are declared as: v: array [type] of base; {type can be subrange like 1..3, base is base type} v: array [1..3] of real; x: array[1..3] of array [1..2] of real; To make life easier on people, the language has a "rewriting rule" or "sytactic redefintion rule", stating that: x : array [1..3,1..2] of real; is transformed/rewritten as: x: array[1..3] of array [1..2] of real; Similarly, the array subscript x[i,j] ===> x[i][j] ===>(x[i])[j] The last is actually the precedence relation of the subscript operator. Now, let us assume that elements in an array (single dimension array) are stored in sequential order in memory. AS FAR AS I KNOW, THIS IS NOT REPEAT NOT a language requirement. However, there are few reasons to do it otherwise. (Quiet in the back, you animals! Sparse arrays be damned!) Now we have the interesting table: v[1,1] v[1,2] v[2,1] v[2,2] v[3,1] v[3,2] That is, the first ROW is stored first, the next second, etc. It turns out that the formula that was used for determing the position of an element in an array, which was used for FORTRAN, can also be used for PASCAL. A(i1,i2,i3,...i7) will access the element in position: SF1*(i1-ll1)+SF2*(i2-ll2)+...+SFn(in-lln), where SFi is a "scale factor"; SFn = 1 SFn-1 = (uln-lln+1)*1 = (elements in previous dimension)*(size of element) = Mn-1*SF1 ... SF1 = (ul2-ll2+1)*SF2 SF0 = (ul1-ll1+1)*SF1 = size of the array. When you pass an value in PASCAL, you pass by value or reference; in addition, the TYPES of the formal and actual parameter must agree. Thus, the usual game playing that is done in FORTRAN with resizing and slicing cannot be done. However, you have another problem. Since Pascal is a moderately strongly typed language, and requires the types of actual and formals to agree, then you run into the problem when you want a parametric or polymorphic typed function. Say you want to search an array for an element: o type s5= array [1..5] of char; s6 = array [1..6] of char; var p5:s5; p6:s6; function S5(var x: array [1..5] of char ); system; {don't ask} function S6(var x: array [1..5] of char ); system; {don't ask} For each different size of string, you need to have A NEW FUNCTION. To fix this problem, Pascal has been extended to have conformant arrays. function S(var x: array [ll..ul:int] of char ); system; This definition will cause ll and ul to have the values of the lower and upper limits of the array bounds in the body of the function. How, you might ask does a person implement this? By using DOPE VECTORS (descriptors in some languages), which contain all of the information about an array. The format of the table is usually (but not always) in the form: ------------------------------------------------------ Address | (points to the array area) ------------------------------------------------------ ll1 | (lower limit of 1st dimension) ul1 | (lower limit of 1st dimension) SF1 | (Scale factor for 1st dimension) ------------------------------------------------------ ll2 | (lower limit of 2st dimension) ul2 | (lower limit of 2st dimension) SF2 | (Scale factor for 2st dimension) ------------------------------------------------------ ..... ll2 | (lower limit of 2st dimension) ul2 | (lower limit of 2st dimension) SF2 | (Scale factor for 2st dimension) ------------------------------------------------------ Note that all of the information below the first entry is static, and determined at compilation time. When a conformant array is passed, usually a pointer to a dope vector is passed. Some times the dope vector has the form: ------------------------------------------------------ Address | (points to the array area) ------------------------------------------------------ Descriptor | (points to a descriptor) ------------------------------------------------------ Sometimes there are two things passed: the address and the descriptor. Sometimes this is done for ALL arrays; sometimes only for conformant arrays. If it is done at all. Have fun with this one! C _ In C, arrays are stored in ROW major form. For example: real w[3][2]; /* w[1][1] w[1][2] w[2][1] w[2][2] w[3][1] w[3][2] */ This relates to the C language definition syntax, which is described in detail in Kernighan and Ritchie, "The C Programming Language" The C language passed parameters by value. However, when you pass an array, say: printall(w), the C language requires a pointer to be passed. That is, the name of an array is COERCED or cast into a pointer value that is the start of the array. However, if you pass printit(w(1)), the value of w(1) is passed. If you want to pass the address of the element w(1), you have to use: printit(&w(1)) or printit(w+1). SUMMARY ------- Fortran is COLUMN MAJOR C and Pascal are ROW MAJOR Fortran by reference (address) Pascal is by value/reference, exact format is anybodys guess. C by reference (address) Patrick Powell -- Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE, University of Minnesota, Minneapolis, MN 55455 (612)625-3543/625-4002