Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!uxc.cso.uiuc.edu!uxc.cso.uiuc.edu!ux1.cso.uiuc.edu!uicsrd.csrd.uiuc.edu!jaxon
From: jaxon@uicsrd.csrd.uiuc.edu
Newsgroups: comp.lang.apl
Subject: Re: APL Machines
Message-ID: <49700014@uicsrd.csrd.uiuc.edu>
Date: 11 Sep 89 16:26:00 GMT
References: <153557@<1989Sep5>
Lines: 79
Nf-ID: #R:<1989Sep5:153557:uicsrd.csrd.uiuc.edu:49700014:000:3998
Nf-From: uicsrd.csrd.uiuc.edu!jaxon    Sep 11 11:26:00 1989


> There was an APL machine (by that name) produced by a corporation
> somewhere on the east coast of the US.  It was based on 68000 
> processors,...

You're thinking of Analogic Corporation's 68000/CATscanner hybrid.  The
68000s were just control processors for the interpreter; most primitives
were microcoded on a 12-bit x 8 element Array Processor which had been
scavenged from one of Analogic's medical imaging instruments.  I believe
it's still on the market, although no plans exist to upgrade it.  The
interpreter and user interface are excellent.   The CAT scanner part is
a merciless number cruncher, so fast that you won't notice the effects
of long vector lengths until well past 1000 elements.  (i.e. it is several
hundred times faster than a 68000).  

There is a lesson in the APL Machine design, though.  The array processor
is not enough!  Despite the implementer's heavy use of the AP (e.g. linear
search on the AP always outperformed hash table lookup on the 68000),  the
68000 was a constant damper on program speed.  Memory management of small
vectors and scalars is not an especially parallelizable aspect of APL. 
The uniprocessors responsible for serial sections of the interpreter, and
whatever features are used to synchronize the parallel sections, are absolutely
critical elements in an APL supercomputer.
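
To put a rough number on that damper, here is a back-of-the-envelope sketch
using Amdahl's law (my framing, not anything from the APL Machine's
documentation). The function name and the 90% / 300x figures are illustrative
assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, ap_speedup: float) -> float:
    """Overall speedup when only part of the work runs on the array processor.

    parallel_fraction: share of run time (0..1) that the AP can take over.
    ap_speedup: how many times faster the AP is on that share.
    Both values are assumed inputs for illustration.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / ap_speedup)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the time is array work, AP is 300x faster.
    print(round(amdahl_speedup(0.9, 300.0), 2))
```

With those assumed figures the whole program runs less than 10 times faster,
however quick the AP is, because the serial 10% stays on the control processor.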

> The "equals" primitive could be written in APL : ...[buggy code omitted]

Several interpreters have tried using "magic functions" to produce new 
primitives from old.   It is never as easy as it looks, and it is NEVER fast -
it's not even tolerably slow.   

1) Your definition is wrong -- you must take absolute values before comparing
   the arguments.

2) Once the correct definition is written, you must make it work even when
   the intermediate terms exceed the number system's limits.

3) By now you've got a function that works for simple scalars. To call it
   you'll have to create two scalar APL objects, and a stack frame (that's
   invisible in the caller's ")SI").  You'll have to make a class of APL
   function capable of returning into a primitive algorithm at the correct
   place.  

This does not compare favorably with a single instruction for Tolerant Equals.
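
For readers who have not met it, here is a minimal sketch of the scalar test
in Python, assuming the usual relative-tolerance definition (two numbers are
equal when the magnitude of their difference is at most the comparison
tolerance times the larger magnitude). The function name and the default
tolerance are my assumptions; check your own interpreter's documentation for
its exact rule.

```python
import math


def tolerant_equal(a: float, b: float, ct: float = 1e-13) -> bool:
    """Sketch of tolerant equality for simple real scalars.

    ct stands in for the comparison tolerance; 1e-13 is an assumed default.
    """
    if a == b:
        return True
    # An infinity is only equal to itself (handled above); without this
    # check, inf <= ct * inf would wrongly report equality.
    if math.isinf(a) or math.isinf(b):
        return False
    # Values of opposite sign, or a zero against a nonzero, can never be
    # within a relative tolerance below 1. Returning early also avoids
    # forming a - b when it could exceed the number system's limits.
    if a == 0 or b == 0 or (a < 0) != (b < 0):
        return False
    # Same sign from here on, so a - b cannot overflow.
    return abs(a - b) <= ct * max(abs(a), abs(b))
```

Note how much of this is special-casing around the one-line formula, and this
is still only the scalar case with none of the calling overhead from point 3.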

I'm not a great fan of #CT and its consequences, but it is STANDARD and heavily
relied upon, and it is one more language-specific hardware feature that APL
could really use.


> "Dictionary APL"  NOT the "APL2" dialect.

Firstly I'd urge any APL designer to become deeply familiar with BOTH these
language definitions, and to really use BOTH systems.  The dictionary approach
("function rank") is really a wonderful perfection of the original APL array
processing ideas.  I suspect it is more efficient to implement, because it is
a little less powerful than the equivalent features in APL2.  In "Dictionary"
APL, operators are much more powerful.  If operators can really manipulate
the functions (e.g. pipelining them, carrying temporary results in registers,
etc.)  then I'd say the dictionary approach is best suited to today's 
vector supercomputers.   
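
As a rough illustration of what "function rank" means, here is a sketch in
Python over nested lists: a function is applied to each cell of a given rank
and the results are reassembled. The names depth and apply_rank are
hypothetical, the data is assumed to be a non-ragged (rectangular) nested
list, and only the one-argument case is shown; the Dictionary definition
covers much more.

```python
def depth(a):
    """Rank of a rectangular nested list (0 for a scalar)."""
    d = 0
    while isinstance(a, list):
        d += 1
        if not a:  # empty axis: stop descending
            break
        a = a[0]
    return d


def apply_rank(f, k, a):
    """Apply f to every rank-k cell of a and collect the results."""
    if depth(a) <= k:
        return f(a)
    return [apply_rank(f, k, cell) for cell in a]


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6]]
    print(apply_rank(sum, 1, m))              # sum applied to each row
    print(apply_rank(lambda x: x * 2, 0, m))  # doubling applied to each scalar
```

The point for hardware is that the decomposition into cells is fixed by the
rank alone, so an operator knows the shape of the work before it starts.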

But "function rank" seems tied to homogeneous arrays (am I wrong here?)

There are real limits on programmers' ability to foresee what a function
expression will do.  In the APL2 approach, all the data decompositions are
explicitly written out; you can enter the subexpressions and watch what's
happening to your data.  You can also do all kinds of unorthodox decompositions
of your data, which stand no chance of being vectorizable.

And "vectorizing" is not the only hope for APL anyway!  Multiple Instruction
Multiple Data (MIMD) parallel machines are growing in number and power; these don't
require that "one function" be in control, that "two arguments" be in memory
and that "one type" of result is expected.  I think the APL2 approach will
provide very rich ground for parallel machine designers.

Thanks in advance for any replies!

greg jaxon -- jaxon@uicsrd.csrd.uiuc.edu
Univ. of Ill. Center for Supercomputing R&D