Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!emory!gatech!purdue!haven!ni.umd.edu!uc780.umd.edu!cs450a03
From: cs450a03@uc780.umd.edu
Newsgroups: comp.lang.apl
Subject: implementing @:
Message-ID: <19APR91.00433381@uc780.umd.edu>
Date: 19 Apr 91 00:43:33 GMT
Sender: usenet@ni.umd.edu (USENET News System)
Organization: The University of Maryland University College
Lines: 70

There are various parts of J which are rather slow.  I've been looking
at implementing J myself, and I think I can see a lot about the
implementation from what's slow and what's fast.

Essentially, it looks like J is written in a highly self-referent
fashion.  This is execellent for generality, and is even better for
making the language consistent with itself.  There is a speed price
though.

One way to improve speed would be to selectively substitute faster
expressions for overly general expressions.  I haven't explored this
idea much, yet.

Another way to improve speed is to re-do the underlying
implementation.  For example, if you consider J to be a generalized
array language, then one step back would be a generalized vector
language.  Maybe implementing J in terms of only scalar and vector
routines would be a win.  (Presumably, one would use J instructions
limited to rank 1...)

Finally, you could go whole-hog and implement every J function as a
"leaf" function in the implementation language (or, call system
routines, but nothing else).  This might increase code-size quite a
bit, but might run really fast.

With that out of the way, I'd like to bring up a tiny algorithm.
Consider  @: v
where v is a numeric vector (not boxed)

I think that what J is doing for this case is building a table of all
permutations of i.#v, and using a transitive i. to find the row which
matches v.  This is nice and concise, but seems slow in the current
incarnation of J.

It's occurred to me that on a typical (scalar) machine, there is a
simple O(n squared) algorithm that should run pretty fast:

(1) find length of v (if 12 or less, assume an integer result, if 170
or less, assume a floating point result -- quite possibly will
overflow if greater than 170)

(2) build a vector of i.#v (call it N), and another vector of 0#~#v
(call it M).

(3) loop over elements of v (left to right, call loop index j)
       if j{N is positive, make j{M equal to j{N
                           then make j{N be _1
                           then do N =. N - j < i.#N
       if j{N is negative, abort -- result of @: is !#v

(4) result of @: is +/M*!|.i.#M

--------------------

Note that this algorithm would be pretty fast if M could be
constructed in one pass (rather than having to re-build it at every
iteration through loop(3)).

I suppose that some attention should be paid to "intellegently"
pre-allocating memory for results.  Perhaps if catenate could somehow
be told to allocate extra memory for future catenates?  (this would
have to be fairly transparent to the user)

Has anyone thought about this sort of condition?

I think J is losing a lot of time, currently, in catenation.

(Of course, I may be wrong)

Raul Rockwell