Path: utzoo!attcan!uunet!lll-winken!lll-tis!mordor!joyce!zodiac!ames!mailrus!cornell!uw-beaver!uoregon!will
From: will@uoregon.uoregon.edu (William Clinger)
Newsgroups: comp.lang.lisp
Subject: Re: Summing a list
Message-ID: <3065@uoregon.uoregon.edu>
Date: 31 Oct 88 21:45:43 GMT
References: <10794@srcsip.UUCP> <10813@srcsip.UUCP> <6135@csli.STANFORD.EDU> <29718@think.UUCP> <6143@csli.STANFORD.EDU>
Reply-To: will@fog.UUCP (William Clinger)
Organization: University of Oregon, Computer Science, Eugene OR
Lines: 176

My apologies for continuing this discussion, but I grew tired of seeing
timings without analysis and I thought I might as well prepare a lecture
for this weird "Programming in Lisp" course I have to teach...
 
    I...obtained the following results under Allegro CL on a Mac II.
    Interestingly, the non-tail recursive version (as originally
    provided) is faster than the tail-recursive one!  (Or have I
    made an error in the coding?)
 
    Mark Johnson

Putting Mark's data with my measurements on a Macintosh II shows some of
the dangers of generalizing from one implementation:

                 Common Lisp    Scheme     comment

    NOTHING           .00          .02     no-op
    DO-LOOP          1.85         6.92     loop using an assignment
    D0-LOOP1         2.85 *       1.07     loop using do
    TRECURSIVE       5.95         1.08     tail-recursive, local procedure
    TRECURSIVE1      3.67         1.05     tail-recursive, global procedure
    RECURSIVE        5.25         3.40     non-tr, local procedure
    RECURSIVE1       2.83         3.72     non-tr, global procedure
    REDUCE-+         6.25        15.97     (reduce + list1000)
    APPLY-+           .92         ---      (apply + list1000)
    EVAL-+          12.32      6106.78     (eval (cons '+ list1000))

Considering Mark's question first, I too would have expected the
tail-recursive version to be faster than the non-tail-recursive version,
as it was in Scheme.  Part of the answer, I think, is that Allegro CL
does a pretty good job with non-tail-recursion but doesn't compile tail
recursion as efficiently as it might.  Another thing to consider is that
the tail-recursive version has an extra argument to deal with.  Evidently
the overhead of passing that extra argument is about the same as the
extra cost of non-tail-recursion compared to tail recursion in Allegro CL.
MacScheme, on the other hand, does a poor job with non-tail-recursion so
the extra argument is insignificant by comparison.

While some might consider APPLY the most elegant solution, and it appears
to be the fastest in Allegro CL, it isn't a portable solution.
Implementations of Common Lisp are explicitly allowed to place limits on
the number of arguments passed to a procedure, and the limit may be as
small as 50.  The Scheme language doesn't say anything about this, but
MacScheme has such a limit anyway (about 250).

Some Common Lisps compile tail-recursion as though it were
non-tail-recursion, so TRECURSIVE and TRECURSIVE1 are not portable
either in Common Lisp if you may have to deal with long lists.

EVAL is a pretty clear loser, but why did MacScheme take almost two hours?
Most implementations of Common Lisp use an interpretive EVAL, but Scheme
systems often use a compiling EVAL.  Furthermore the MacScheme compiler
has a minor bug that makes the worst-case compile time quadratic, rather
than linear.  What's the bug?  "If we're compiling an expression that looks
like (+ E1 E2 ...), and its LENGTH is greater than 2, then change it to
(+ (+ E1 E2) ...) and try again."  In other words, EVAL-+.

In Scheme you can pretty well count on DO-LOOP1, TRECURSIVE, and
TRECURSIVE1 compiling into nearly identical code because they usually
are expanded into nearly identical code before the compiler proper ever
sees them.  TRECURSIVE1 might be slower because fetching the global
value of TRECURSIVE1 for the tail-recursive call may involve an actual
variable reference, while the local variable references for the other
two would certainly be optimized away.  It seems like the same should be
true of Common Lisp, but in fact many CL compilers treat do loops and
top-level procedures specially, leaving locally defined procedures to
the mercy of the compiler's most general algorithms.  With such a compiler
I would expect DO-LOOP1 to be fastest and TRECURSIVE slowest, which is
what we observe.  This probably explains also why RECURSIVE was slower
than RECURSIVE1 in Common Lisp.

Using REDUCE for this problem means calling the general version of +,
which I would expect to be slow because it involves a rest argument.
It might be slow for other reasons also, depending on how REDUCE is
coded.  Common Lisp was only twice as slow for REDUCE-+ as for RECURSIVE1,
which isn't too bad.

I wrote REDUCE for Scheme, using a subset of the CL semantics.  A curried
REDUCE (e.g. ((REDUCE +) LIST1000) instead of (REDUCE + LIST1000)) would
have been more in the spirit of Scheme and would not have made any
significant difference to the timings.  In my opinion a curried REDUCE
supplies the most elegant solution to the problem of summing a list:
(DEFINE MY-LIST-SUMMATION-PROCEDURE (REDUCE +)).  It's also the slowest
of the reasonable solutions.  The fastest portable solution is
tail-recursion for Scheme and some kind of DO loop for Common Lisp.

The obvious explanation for the DO-LOOP anomaly is that the Scheme version
uses a procedure, FOR-EACH, while the CL version uses a macro, DOLIST,
causing the Scheme version to execute 100,000 full-scale procedure calls
compared with none for the CL version.  I wrote DO-LOOP1 as a similar
benchmark that could be translated more easily between the two languages.
The starred Common Lisp timing was mine, not Mark's, and thus may not be
directly comparable to the others in the CL column.

Peace,
William Clinger
an author of MacScheme

Scheme code (assuming a timeit macro):

(define (sum-test list . rest)
  (let ((count (if (null? rest) 100 (car rest))))
    (define (repeat thunk)
      (define (loop n)
        (if (zero? n)
            'done
            (begin (thunk) (loop (- n 1)))))
      (loop count))
    (define (my-false . x) #f)
    (define (nothing)
      (repeat (lambda () (my-false list))))
    (define (do-loop)
      (repeat (lambda ()
                (let ((sum 0))
                  (for-each (lambda (item) (set! sum (+ sum item)))
                            list)
                  sum))))
    (define (do-loop1)
      (repeat (lambda ()
                (do ((sum 0 (+ sum (car list)))
                     (list list (cdr list)))
                    ((null? list) sum)))))
    (define (recursive)
      (define (recursive-call list)
        (if (null? list)
            0
            (+ (car list) (recursive-call (cdr list)))))
      (repeat (lambda () (recursive-call list))))
    (define (trecursive)
      (define (trecursive-call list sum)
        (if (null? list)
            sum
            (trecursive-call (cdr list) (+ (car list) sum))))
      (repeat (lambda () (trecursive-call list 0))))
    (define (recursive1)
      (repeat (lambda () (recursive1-call list))))
    (define (trecursive1)
      (repeat (lambda () (trecursive1-call list 0))))
    (define (eval-+) (repeat (lambda () (eval (cons '+ list)))))
    (define (apply-+) (repeat (lambda () (apply + list))))
    (define (reduce-+) (repeat (lambda () (reduce + list))))
    (timeit (nothing))
    (timeit (do-loop))
    (timeit (do-loop1))
    (timeit (recursive))
    (timeit (trecursive))
    (timeit (recursive1))
    (timeit (trecursive1))
    (timeit (reduce-+))
    ;(timeit (apply-+))
    (timeit (eval-+))))
 
(define (recursive1-call list)
  (if (null? list)
      0
      (+ (car list) (recursive1-call (cdr list)))))
 
(define (trecursive1-call list sum)
  (if (null? list)
      sum
      (trecursive1-call (cdr list) (+ (car list) sum))))

(define (reduce f l)
  (define (loop x l)
    (if (null? l)
        x
        (loop (f x (car l)) (cdr l))))
  (loop (car l) (cdr l)))

(define (iota n)
  (do ((n n (1- n))
       (l '() (cons n l)))
      ((zero? n) l)))

(define x (iota 1000))

(sum-test x)