Path: utzoo!attcan!uunet!seismo!ukma!tut.cis.ohio-state.edu!brutus.cs.uiuc.edu!apple!usc!merlin.usc.edu!usc.edu!raulmill
From: raulmill@usc.edu (Raul Deluth Rockwell)
Newsgroups: comp.lang.apl
Subject: Re: APL Machines
Summary: Raul's follow up to Daniel's good article.
Message-ID:
Date: 21 Sep 89 21:12:52 GMT
References: <153557@<1989Sep5> <49700014@uicsrd.csrd.uiuc.edu> <22186@cup.portal.com> <1989Sep19.111751.7613@ziebmef.mef.org>
Sender: news@merlin.usc.edu
Organization: University of Southern California, Los Angeles, CA
Lines: 111
In-reply-to: daniel@ziebmef.mef.org's message of 19 Sep 89 15:17:49 GMT

In article <1989Sep19.111751.7613@ziebmef.mef.org> daniel@ziebmef.mef.org (Daniel Albano) writes:

;> The question of "hardware optimization" for APL machines
;> requires some consideration of how they will be used.
Let's see. Selecting concepts out of your original posting, I am
thinking of a multi-user machine, running "APLish code". Let's say
something with the capacity of what currently costs around $50,000.
The price for that capability should drop by the time I am anywhere
near completion.

I haven't decided how to deal with resource locking on multi-tasking
conflicts. (Not that I don't have ideas. I just haven't convinced
myself that they are that good.)

The basic architecture I am trying to thrash out passes data objects
around with a checklist of what needs to happen to them. There is a
bandwidth penalty here, but it allows for quite a bit of
extensibility (I hope).

;> . . . I tend to discount the overheads in handling small (several
;> element to scalar) data structures, because if that is all there
;> is, then the application is unlikely to be large, and there should
;> be lots of spare machine cycles. When the work grows, if the
;> application is well designed, so does the size of your data
;> entities.
I am not convinced that this is true for all applications.
Also, there generally need to be several design iterations before an
application is "well designed". It is true that MANY (probably MOST)
applications can be handled quite nicely using an APLish approach.

;> Making APL hardware could involve implementing a lot of APL
;> as "machine instructions", but that really means microcode -
;> . . . [much deleted]
I think that you are talking about a conventional von Neumann
architecture (CPU with a bus leading to memory in which instructions
and data are stored). I am being a little more "blue sky" than that.
(I can almost afford to: I'm a college student 8^)

;> The nature of the enhancements depend very much on what you
;> consider the key APL level operations to be. My own list of
;> crucial favourites would include reshape, dyadic iota,
;> compress, reduce, and generalized inner products (or at least
;> and.equals).
A nice selection.

;> Operations on Boolean arrays would also be very high on the list,
;> as would comparisons, especially those for integer and character
;> data.
Given a fairly decent architecture these SHOULD be fast. Look at the
occurrence of blitters in recent personal computers. (Fast bit
manipulation does nice things to performance where applicable.)

;> . . . One operation implicit in any APL that is crucial is the
;> allocation of storage for data entities. . . .
That is why I am trying for an architecture which is based on a
communication model. There are penalties, but there should be big
advantages. The question (a big one) is how to organize things.

;> On the other hand, I don't really care that much how fast
;> the trig functions or domino run, nor that much how fast
;> exponentiation and the like are. . . .
I do. Transcendental functions and domino do nice things to
applications which would otherwise require massive looping. Could I
get you to agree that base 2 logarithms should be fast?

;> The most crucial item in a lot of cases is just workspace size.
Considering the current state of technology, I have been assuming a
reasonably massive memory resource, though with varying levels of
caching/mass-storage.

;> Personally, I don't think (single user machine) that you have
;> to provide multiple "control streams" or simultaneous processing
;> of user level tasks, but some parallelism in the execution of
;> array operations - or a high speed vector processing scheme -
;> could provide a major boost in performance.
There are two kinds of parallelism I am considering: (1) user-level
parallelism (coarse-grained) and (2) user-invisible parallelism
(fine-grained). (1) is more of a software issue, and (2) is more of
a hardware issue.

;> In many operations on conformal arrays, the (logical) structure of
;> the data is not important during the actual computation step.
I don't quite follow you here. What are you saying?

;> [description of practical vector operation: skipping over parts of
;> memory] . . .

;> In real world terms, another major performance boost is a true
;> native APL file system - one that stores the components in their
;> internal representation. . . . The idea of making APL the operating
;> system, and giving it complete and direct control over the system
;> had a considerable appeal. The benefits could be enormous.
Exactly!

I hope I am not being too foggy on details here. If anyone feels I
should elaborate on any of the points here (assuming anyone is
interested), don't hesitate to tell me.

-- Raul --
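
For anyone unfamiliar with the primitives on Daniel's list (reshape,
dyadic iota, compress, reduce, and.equals), here is a rough sketch of
what each one computes, written in Python rather than APL. The
function names are invented for this illustration, indexing is
0-origin (APL's default is 1-origin), arguments are assumed to be
non-empty and conformable, and only the vector cases are covered.
It says nothing about how an APL machine would implement them.

```python
from functools import reduce as fold
from itertools import cycle, islice
import operator


def reshape(n, data):
    """Vector reshape: take n items, reusing data cyclically."""
    return list(islice(cycle(data), n))


def index_of(haystack, needles):
    """Dyadic iota (0-origin): first position of each needle in
    haystack, or len(haystack) when the needle is absent."""
    first = {}
    for i, x in enumerate(haystack):
        first.setdefault(x, i)  # keep only the first occurrence
    return [first.get(x, len(haystack)) for x in needles]


def compress(mask, data):
    """Compress: keep the items of data whose mask entry is true."""
    return [x for keep, x in zip(mask, data) if keep]


def reduce_right(f, data):
    """Reduce, folding right to left as APL does: f(a, f(b, c))."""
    items = list(data)
    return fold(lambda acc, x: f(x, acc), reversed(items[:-1]), items[-1])


def and_equals(rows, key):
    """and.equals inner product of a matrix (list of rows) with a
    vector: 1 where every element of the row equals the key."""
    return [int(all(x == y for x, y in zip(row, key))) for row in rows]


if __name__ == "__main__":
    print(reshape(5, [1, 2]))
    print(index_of("abcab", "cbz"))
    print(compress([1, 0, 1, 1], [10, 20, 30, 40]))
    print(reduce_right(operator.sub, [1, 2, 3]))
    print(and_equals(["cat", "dog", "cat"], "cat"))
```

Note that every one of these walks its argument front to back, with
index_of also keeping a small lookup table and reduce_right a single
accumulator. That is one reason such primitives are natural candidates
for the vector-style speedups discussed above.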