Path: utzoo!attcan!uunet!lll-winken!sol.ctr.columbia.edu!cica!iuvax!silver!jashley From: jashley@silver.ucs.indiana.edu (J. Michael Ashley) Newsgroups: comp.lang.functional Subject: re: Id array comprehensions revisited Message-ID: <51173@iuvax.cs.indiana.edu> Date: 17 Jul 90 16:28:29 GMT Sender: news@iuvax.cs.indiana.edu Reply-To: jashley@silver.ucs.indiana.edu (J. Michael Ashley) Organization: Graceland Lines: 62 Wolfgand Grieskamp writes: >In article <51123@iuvax.cs.indiana.edu> jashley@silver.ucs.indiana.edu (J. Michael Ashley) writes: >>A little while ago I asked why array comprehensions are not parallel. In >>particular, why does an array comprehension like this: >> >> { array (1, n) | [i] = f i || i <- 1 to n } >> >not evaluate the (f i)'s concurrently? Well, it turns out that the i's get >generated sequentially as a list (a list in the data structure sense, mind >you). > >This index list is intermediate and an optimizing compiler can eliminate it >using loop fusion. [ omitting the pleasantly detailed derivation of optimization ] >Using the above rule you get finaly: > > DEF init1(A,f,i,n) == IF i>n THEN A > IF i<=n THEN init1(update(A,i,f(i)),i+1,n) FI > >(A compiler will probably not use this fold/unfold method, >because it depends to much on "eureka" steps. Furthermore, the view >is based on the assumption that update is O(1). Such an update >can be established by a sharing analysis for the given example). > >Regarding to concurrent initialization assume simplisticaly that init1 >is not strict in its first argument. Actually, this is precisely the optimization that the current Id compiler does right now; if it sees the construct "1 to n", then the compiler will transform it into a loop. Since we are using I-structures, the update is O(1) (without sharing analysis), and init1 is not strict in it's first argument. This doesn't solve the problem, however. Since we are using a dataflow architecture, the function f will undergo partial evaluation to the point where it must have it's argument. What this means is that if we have lots and lots of processors, 90% of the work of creating the array is done by the first iteration or two of init1. The execution is now bottlnecked by this sequential generation of the indices. A slightly better compiler will generate the indices in O(log n) time without any trouble. But this doesn't solve the problem completely. In the case of accumulator comprehensions, the programmer will probably not be using this easy-to-optimize "1 to n" construct. Instead, there will be some arbitrary function (probably complicated) that *must return a list*. Even the most trivial numerical codes will be more than a match for any compiler optimization. So we are again stuck with lots of parallelism when generating the elements of the list but bounded by an O(n) append to put this results back together to form the final list. Thomas Breuel suggested a couple of ways ("difference lists and/or pointer jumping") to overcome the O(n) append problem. This is kind of a hack, and Thomas admits that. I still sit on the position that lists are not the right data structure for the indices. mike ashley jashley@lanl.gov