Path: utzoo!attcan!uunet!seismo!ukma!tut.cis.ohio-state.edu!brutus.cs.uiuc.edu!apple!usc!merlin.usc.edu!usc.edu!raulmill
From: raulmill@usc.edu (Raul Deluth Rockwell)
Newsgroups: comp.lang.apl
Subject: Re: APL Machines
Summary: Raul's follow up to Daniel's good article.
Message-ID: <RAULMILL.89Sep21141252@usc.edu>
Date: 21 Sep 89 21:12:52 GMT
References: <153557@<1989Sep5> <49700014@uicsrd.csrd.uiuc.edu>
	<22186@cup.portal.com> <1989Sep19.111751.7613@ziebmef.mef.org>
Sender: news@merlin.usc.edu
Organization: University of Southern California, Los Angeles, CA
Lines: 111
In-reply-to: daniel@ziebmef.mef.org's message of 19 Sep 89 15:17:49 GMT

In article <1989Sep19.111751.7613@ziebmef.mef.org>
daniel@ziebmef.mef.org (Daniel Albano) writes:
;> The question of "hardware optimization" for APL machines 
;> requires some consideration of how they will be used.

Let's see.  Selecting concepts out of your original posting, I am
thinking of a multi-user machine, running "APLish code".  Let's say
something with the capacity of what currently costs around $50,000.
Price should drop for that capability by the time I am anywhere near
completion.

I haven't decided how to deal with resource locking on multi-tasking
conflicts.  (Not that I don't have ideas.  I just haven't convinced
myself that they are that good.)

The basic architecture I am trying to thrash out passes data objects
around with a checklist of what needs to happen to them.  There is a
bandwidth penalty here, but it allows for quite a bit of extensibility
(I hope).
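
A minimal sketch of the checklist idea, in Python rather than in terms
of any real hardware.  The names Packet, UNITS, and route are
hypothetical, made up purely for illustration; nothing here is an
actual design.  Each data object carries its own list of pending steps,
and whichever unit receives it does the next step and passes it on:

```python
from dataclasses import dataclass, field

# Hypothetical processing units, keyed by the step name each one handles.
UNITS = {
    "negate": lambda xs: [-x for x in xs],
    "double": lambda xs: [2 * x for x in xs],
    "sum": lambda xs: [sum(xs)],
}

@dataclass
class Packet:
    data: list
    checklist: list = field(default_factory=list)  # steps still to be done

def route(packet):
    """Hand the packet from unit to unit until its checklist is empty."""
    while packet.checklist:
        step = packet.checklist.pop(0)          # next thing that must happen
        packet.data = UNITS[step](packet.data)  # the unit for that step does it
    return packet.data

result = route(Packet([1, 2, 3], ["double", "negate", "sum"]))
print(result)
```

The checklist travelling with the data is where the bandwidth penalty
shows up; the extensibility is that a new kind of step only needs a new
entry in the table of units.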

;> . . .  I tend to discount the overheads in handling small (several
;> element to scalar) data structures, because if that is all there
;> is, then the application is unlikely to be large, and there should
;> be lots of spare machine cycles.  When the work grows, if the
;> application is well designed, so does the size of your data
;> entities.

I am not convinced that this is true for all applications.  Also, there
generally need to be several design iterations before an application
is "well designed".  It is true that MANY (probably MOST)
applications can be handled quite nicely using an APLish approach.  

;> Making APL hardware could involve implementing a lot of APL
;> as "machine instructions", but that really means microcode -
;> . . . [much deleted]

I think that you are talking about a conventional von Neumann
architecture (cpu with a bus leading to memory in which instructions
and data are stored).  I am being a little more "blue sky" than
that.  (I can almost afford to:  I'm a college student 8^)

;> The nature of the enhancements depend very much on what you
;> consider the key APL level operations to be.  My own list of 
;> crucial favourites would include reshape, dyadic iota, 
;> compress, reduce, and generalized inner products (or at least
;> and.equals).

A nice selection.
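
For readers who don't have the primitives at their fingertips, here is
a rough sketch of what those five do, written in Python on plain lists.
The function names are my own labels, indexing is 0-origin, and reshape
is shown for a vector result only; this illustrates the semantics, not
how an APL interpreter or APL hardware would implement them:

```python
from functools import reduce
from itertools import cycle, islice
import operator

def reshape(n, xs):
    """Reshape to a vector of length n, reusing xs cyclically."""
    return list(islice(cycle(xs), n))

def index_of(xs, ys):
    """Dyadic iota: first position of each y in xs, or len(xs) if absent."""
    return [xs.index(y) if y in xs else len(xs) for y in ys]

def compress(mask, xs):
    """Keep the elements of xs where the Boolean mask is 1."""
    return [x for m, x in zip(mask, xs) if m]

def and_equals(a, b):
    """and.equals inner product: 1 where a row of a matches a column of b."""
    cols = list(zip(*b))
    return [[int(all(x == y for x, y in zip(row, col))) for col in cols]
            for row in a]

r1 = reshape(5, [1, 2, 3])
r2 = index_of([10, 20, 30], [30, 5, 10])
r3 = compress([1, 0, 1], [7, 8, 9])
# Plus-reduce; APL reduces right to left, which makes no difference for +.
r4 = reduce(operator.add, [1, 2, 3, 4])
r5 = and_equals([[1, 2], [3, 4]], [[1, 3], [2, 5]])
```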

;> Operations on Boolean arrays would also be very high on the list,
;> as would comparisons, especially those for integer and character
;> data.

Given a fairly decent architecture these SHOULD be fast.  Look at the
occurrence of blitters in recent personal computers.  (Fast bit
manipulation does nice things to performance where applicable.)

;> . . .  One operation implicit in any APL that is crucial is the
;> allocation of storage for data entities.  . . .

That is why I am trying for an architecture which is based on a
communication model.  There are penalties, but there should be big
advantages.  The question (a big one) is how to organize things.

;> On the other hand, I don't really care that much how fast 
;> the trig functions or domino run, nor that much how fast 
;> exponentiation and the like are.  . . .

I do.  Transcendental functions and domino do nice things to
applications which would otherwise require massive looping.  Could I
get you to agree that base 2 logarithms should be fast?
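
One reason base 2 may be the cheap case: in a binary floating-point
format the integer part of the logarithm is already sitting in the
exponent field.  A small Python sketch of just that part (floor_log2 is
a name made up for this example, and it only handles positive finite
inputs; the fractional part would still need real work):

```python
import math

def floor_log2(x):
    """Integer part of log2(x), read off the floating-point exponent."""
    if not (x > 0) or math.isinf(x):
        raise ValueError("floor_log2 needs a positive finite number")
    # frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1,
    # so 2**(e-1) <= x < 2**e.
    _, e = math.frexp(x)
    return e - 1

examples = [floor_log2(v) for v in (1.0, 8.0, 10.0, 0.25)]
```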

;> The most crucial item in a lot of cases is just workspace size.

Considering the current state of technology, I have been assuming a
reasonably massive memory resource, though with varying levels of
caching/mass-storage.

;> Personally, I don't think (single user machine) that you have
;> to provide multiple "control streams" or simultaneous processing
;> of user level tasks, but some parallelism in the execution of
;> array operations - or a high speed vector processing scheme -
;> could provide a major boost in performance.

There are two kinds of parallelism I am considering.  (1) User-level
parallelism (coarse-grained) and (2) User-invisible (fine-grained).
(1) is more of a software issue, and (2) is more of a hardware issue.

;> In many operations on conformal arrays, the (logical) structure of
;> the data is not important during the actual computation step.

I don't quite follow you here.  What are you saying?

;> [description of practical vector operation: skipping over parts of
;> memory] . . .

;> In real world terms, another major performance boost is a true
;> native APL file system - one that stores the components in their
;> internal representation.  . . .  The idea of making APL the operating
;> system, and giving it complete and direct control over the system
;> had a considerable appeal.  The benefits could be enormous.

Exactly!

I hope I am not being too foggy on details here.  If anyone feels I
should elaborate on any of the points here (assuming anyone is
interested) don't hesitate to tell me.

--
Raul
--