Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!emory!gatech!purdue!haven!ni.umd.edu!uc780.umd.edu!cs450a03 From: cs450a03@uc780.umd.edu Newsgroups: comp.lang.apl Subject: implementing @: Message-ID: <19APR91.00433381@uc780.umd.edu> Date: 19 Apr 91 00:43:33 GMT Sender: usenet@ni.umd.edu (USENET News System) Organization: The University of Maryland University College Lines: 70 There are various parts of J which are rather slow. I've been looking at implementing J myself, and I think I can see a lot about the implementation from what's slow and what's fast. Essentially, it looks like J is written in a highly self-referent fashion. This is execellent for generality, and is even better for making the language consistent with itself. There is a speed price though. One way to improve speed would be to selectively substitute faster expressions for overly general expressions. I haven't explored this idea much, yet. Another way to improve speed is to re-do the underlying implementation. For example, if you consider J to be a generalized array language, then one step back would be a generalized vector language. Maybe implementing J in terms of only scalar and vector routines would be a win. (Presumably, one would use J instructions limited to rank 1...) Finally, you could go whole-hog and implement every J function as a "leaf" function in the implementation language (or, call system routines, but nothing else). This might increase code-size quite a bit, but might run really fast. With that out of the way, I'd like to bring up a tiny algorithm. Consider @: v where v is a numeric vector (not boxed) I think that what J is doing for this case is building a table of all permutations of i.#v, and using a transitive i. to find the row which matches v. This is nice and concise, but seems slow in the current incarnation of J. It's occurred to me that on a typical (scalar) machine, there is a simple O(n squared) algorithm that should run pretty fast: (1) find length of v (if 12 or less, assume an integer result, if 170 or less, assume a floating point result -- quite possibly will overflow if greater than 170) (2) build a vector of i.#v (call it N), and another vector of 0#~#v (call it M). (3) loop over elements of v (left to right, call loop index j) if j{N is positive, make j{M equal to j{N then make j{N be _1 then do N =. N - j < i.#N if j{N is negative, abort -- result of @: is !#v (4) result of @: is +/M*!|.i.#M -------------------- Note that this algorithm would be pretty fast if M could be constructed in one pass (rather than having to re-build it at every iteration through loop(3)). I suppose that some attention should be paid to "intellegently" pre-allocating memory for results. Perhaps if catenate could somehow be told to allocate extra memory for future catenates? (this would have to be fairly transparent to the user) Has anyone thought about this sort of condition? I think J is losing a lot of time, currently, in catenation. (Of course, I may be wrong) Raul Rockwell