Path: utzoo!mnetor!uunet!steinmetz!sungoddess!oconnor
From: oconnor@sungoddess.steinmetz (Dennis M. O'Connor)
Newsgroups: comp.arch
Subject: Re: RISC is a nasty no-no!
Message-ID: <9788@steinmetz.steinmetz.UUCP>
Date: 4 Mar 88 16:24:06 GMT
References: <4400@aw.sei.cmu.edu>
Sender: news@steinmetz.steinmetz.UUCP
Reply-To: sungoddess!oconnor@steinmetz.UUCP
Organization: GE Corporate R&D Center
Lines: 73

An article by daveb@geac.UUCP (David Collier-Brown) says:
] In article <4400@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (Robert Firth) writes:
] [discussion about array representation, and the hardware support
] appropriate to multiplicative addressing]
] 
] >However, my reading of ANSI X3.9-1978, especially Section 5, on arrays,
] >leads me to conclude that array representation in column-major form,
] >and array access by chain multiplication-&-addition of subscripts, is
] >the only feasible implementation choice.
] 
]   My reading of the ANSI proposal leads me to the same conclusion.
] This is both good (it eliminates an ambiguity) and bad (it removes
] the freedom of an compiler-writer to do what is best applicable to
] her architecture).
] -- 
]  David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb

Why is it we have to reinvent the wheel so often ? N-dimensional
arrays, whether column or row major order, can always be accessed
without run-time multiplies. And can still be in a continuous block of
memory. The method for doing this is at least 14 years old : I heard
about it in 1974.

An example will illustrate. BTW, this is NOT the most optimized
algorithm, but they are simple permutations of this.

Given an array that is N1 by N2 by N3 et cetera. First, allocate
your block of memory for the N1*N2*N3*etc elements of the array. Then,
for each dimension EXCEPT the minor one (in this example, Nn, but it
could be N1 or any Nx ) allocate space for a vector of words with a number
of elements equal to the cardinality of that dimension. Then simply
fill in these vectors with (range 1..Cardinality(Nx)) * (whatever
multiplier is used for this dimension in the multiplication-&-addition
mwthod ). Note that this is either done at compile time, or ONCE when
the array is allocated, and uses very little multiplication.

Now, when you need to access ARRAY[a,b,c,...,n] you get the address
by using ARRAY_BASE + N1VEC[a] + N2VEC[b] + N3VEC[c] ... + n.
No multiplies.

This, I understand, is the ORIGINAL meaning of the term "VECTORIZED
ARRAY". Here's a concrete example :

  type FAST_ARRAY is array[1..6,1..4,1..7] of WHATEVER ;

produces :

 type MEMBLOCK is array [ 6*4*7 ] of WHATEVER ;
 N1VEC : constant array [1..6] of INTEGER := ( 0, 28, 56, 84, 112, 140 ) ;
 N2VEC : constant array [1..4] of INTEGER := ( 0, 7, 14, 21 ) ;

 PRAGMA INLINE( GET_FAST_ARRAY, SET_FAST_ARRAY ) ;

 function GET_FAST_ARRAY( M : MEMBLOCK; A,B,C : INTEGER ) return WHATEVER is
 begin
    return M( N1VEC( A ) + N2VEC( B ) + C ) ;
 end GET_FAST_ARRAY ;

 procedure SET_FAST_ARRAY( M : in out MEMBLOCK; a,b,c : INTEGER;
			   INPUT : WHATEVER ) is
 begin
   M( N1VEC( A ) + N2VEC( B ) + C ) := INPUT ;
 end SET_FAST_ARRAY ;

This just an algorithmic description, I haven't compiled it.
But you get the idea. And this will work for column-major,
row-major, 3rd-dimension-major, whatever.

It's essentially just having a look-up table for frequently used multiplies.
--
    Dennis O'Connor			      oconnor%sungod@steinmetz.UUCP
		   ARPA: OCONNORDM@ge-crd.arpa
   (-: The Few, The Proud, The Architects of the RPM40 40MIPS CMOS Micro :-)