Path: utzoo!attcan!uunet!mcsun!unido!fauern!tumuc!guug!ecrc!micha
From: micha@ecrc.de (Micha Meier)
Newsgroups: comp.lang.prolog
Subject: Prolog Efficiency and Benchmarking
Message-ID: <708@ecrc.de>
Date: 1 Aug 90 09:46:08 GMT
Sender: news@ecrc.de
Reply-To: micha@acrab4.UUCP (Micha Meier)
Organization: European Computer-Industry Research Centre
Lines: 165

Some people argue that Prolog is too slow, others claim  that  it  is
fast enough, yet others that it is even faster than languages like C,
for particular problems. I would like to contribute to  this  discus-
sion  from another point of view. I can say, and so can everybody who
has seriously tried to implement an efficient Prolog  compiler,  that
Prolog  could  be  made much faster than it is now, because there are
numerous redundant operations being performed all the time. Trying to
optimize them all away requires generating a lot of code to catch all
various possibilities and generate optimal code  for  each  of  them.
With  the  use of abstract interpretation it might be possible to ob-
tain enough information to generate both good and short code, but the
current  abstract interpretation techniques are not yet mature enough
to be available in commercial systems.

Consequently, people are trying to locate what are the bottlenecks of
Prolog execution and modify the compiler and the engine to get rid of
them. This, however, may only make the whole story more  complicated,
because  often  the criteria that decide what are the bottlenecks are
rather questionable. There have been many papers published which  say
something like "on the set of benchmarks B the feature F showed to be
very important and therefore we present a WAM modification that  han-
dles  the  feature F very efficiently".  In almost all cases the men-
tioned benchmark suite includes only very short programs,  which  can
in  no  way  be representative for current real-life Prolog programs.
*Current* is important here because in some  years  from  now,  these
programs  may become more representative because the way of executing
them may change, the current bottlenecks disappear, and the feature F
really  becomes significant. But until then, small benchmark programs
cannot be trusted! I have an interesting example with several  suites
of benchmark programs which were used to test the ECRC Sepia system.

The first one consists of the benchmark programs set up by  the  Com-
puter  Architecture  group at ECRC several years ago, and it tries to
test single Prolog actions, like creating a choice point, an environ-
ment,  dereferencing,  unification, etc.  The Pereira benchmarks used
for the same purpose are not well suited because they  often  measure
something different than they say.

The second benchmark set was also set up by  the  Computer  Architec-
ture, it includes small, but well known benchmark programs like naive
reverse, map colouring, sieve, quicksort, queens etc.

The third benchmark suite consists of large programs.
I have selected them using the following criteria:
        - The program must run long enough to be safely measurable
        - The program must contain a sufficient number of different
          procedure calls. 100 is sufficient.
        - The program has to make at least 100000 procedure calls
          and these calls must not be concentrated into one or
          two procedures only.
(SEPIA contains a profiler that shows the number of ports passed
for each procedure.)
These programs were taken from the L.A. benchmarking workshop,
some were written at ECRC and yet others come from various sources.

With these three benchmarks suites (I have also others, but they  are
rather  biased) one gets a reasonable feeling for the speed of a Pro-
log system. Now what is interesting is that I have run  these  bench-
marks  with  Quintus  2.0  and SEPIA 3.0.5 (academic release) and the
results are quite different:

                                time Quintus/SEPIA
        Single features                 0.79
        Small programs                  0.77
        Large programs                  1.33

The difference for the small benchmarks is mainly  due  to  the  fact
that  SEPIA  uses  32  bits for the tag and 32 for the value, and its
ability to handle coroutining and asynchronous interrupts.   For  the
large programs, however, one can see that actually other features may
outweigh.  I don't want to claim  here  that  SEPIA  is  faster  than
Quintus, there are enough programs even in the suite where it is 40 %
slower, but I want to claim that small benchmarks are not enough.

My point is that Prolog needs an official benchmark suite. We have at
least  to  start discussing which programs should or should not be in
the suite, and what are the criteria to select them. Currently, every
vendor  uses  some  benchmark suite, of course that one which is good
for his Prolog system, and other benchmarks are usually refused.   We
have  to  move  away  from this. I think that by now everyone can see
that Prolog systems are going to be much faster  and  that  a  vendor
will  not  be able to say "yes, on this benchmark the other Prolog is
faster, but only because it optimizes feature F which  in  real  pro-
grams  would not matter".  Sooner or later, all features will have to
be optimized.  I know it will be hard, but until we get  rid  of  the
"naive reverse curse", Prolog will not be taken seriously.

Last year in Cleveland I have talked to people from  several  commer-
cial  companies and it turned out that everyone basically agrees that
a benchmark suite is necessary, but everybody also wants to  partici-
pate  in  the  selection.  We more or less agreed that benchmark pro-
grams should be used 'as is'.  If a system has a way to generate e.g.
mode declarations automatically, it can use it, but declarations can-
not be added by hand.  Everybody can of course add some  declarations
and  say  "our  system runs the benchmark suite in X seconds but with
added mode declarations in Y seconds!".

There are always problems when running some programs  with  two  dif-
ferent Prolog systems, because the same functionality can be obtained
in different ways, and so e.g. using assert/1 to make a  counter  can
be  fast in one system, but hopelessly slow in another, which has its
own efficient primitive to handle counters.  These problems  have  to
be  handled  on  a case by case basis, the idea is that some programs
would have simple predicates that can be  defined  for  a  particular
Prolog system, e.g.  inc_counter(c) would be defined in one system as

inc_counter(C) :-
        erase(C, Value),
        NewVal is Value + 1,
        recorda(C, NewValue).

in another one as

inc_counter(C) :-
        ctr_inc(C).

or

inc_counter(C) :-
        incval(C).
This approach should make each vendor reasonably happy, and could be-
come  obsolete after the ISO standard appears (but probably will not,
because Prolog systems will continue to offer extra features to  han-
dle  frequent  operations).  The benchmark programs should be written
in a syntax which is not too distant from the current ISO  draft,  so
that they can be made ISO compatible when needed.

I would like to collect more programs from which the benchmark  suite
could  be selected.  Using our profiler, I can collect the statistics
for every program, both at the level of box ports and at the level of
WAM  instructions,  and I can send this statistics to everyone who is
interested.  I can also make my benchmark suites available to  people
who want to participate in the selection of benchmark programs.

Since I have already enough small programs (everybody has :-)),  what
I  need is more large programs, but also maybe programs tuned to test
other single features.  Please, if you have some  real-life  programs
which  are sufficiently large and which can be made public, send them
to me together with a sample data and reference output. If your  pro-
gram is good, you can in turn get SEPIA with discount, so it pays off
for you.  Moreover, if your program is in the benchmark suite, then a
lot  of  implementors all around the world will be trying to speed up
*your* program, wouldn't that be great?  The best media is a Sun car-
tridge, but mailing a compressed file would do as well. Everybody who
contributes with a program will be acknowledged in the suite.

--Micha Meier

Disclaimer as usual.

USA     micha%ecrc.de@pyramid.com
	pyramid!ecrc!micha

EUROPE  micha@ecrc.de
	unido!ecrc!micha

MAIL	Micha Meier
	ECRC
	Arabellastr. 17
	8000 Munich 81
	West Germany
--
USA     micha%ecrc.de@pyramid.com, pyramid!ecrc!micha

EUROPE  micha@ecrc.de, unido!ecrc!micha