Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!ihnp4!qantel!lll-lcc!lll-crg!seismo!mcvax!unido!ecrcvax!jclaude
From: jclaude@ecrcvax.UUCP (Jean Claude Syre)
Newsgroups: net.lang.prolog
Subject: Benchmarking Prolog Systems (Version 1)
Message-ID: <226@ecrcvax.UUCP>
Date: Mon, 28-Apr-86 16:18:59 EDT
Article-I.D.: ecrcvax.226
Posted: Mon Apr 28 16:18:59 1986
Date-Received: Fri, 2-May-86 09:46:34 EDT
Organization: ECRC, D-8000 Muenchen 81, W. Germany
Lines: 585


                       A Proposal for
         ***********************************************
         *** BENCHMARK PROGRAMS FOR PROLOG SYSTEMS   ***
         ***********************************************
                       Part 1 (of 3)
J.C. SYRE
ECRC (European Computer-industry Research Center)
Arabellastr. 17
D-8000 MUNICH 81
WEST GERMANY


This set of benchmark programs is a collective work by the Logic
Programming Group and the Computer Architecture Group of ECRC, the
European Computer-Industry Research Centre, in Munich.
You can send messages either to Hans Benker (replace "jclaude"
with "hans" in this net address) or to me.
The designers of the benchmark programs are: H. Benker, J. Noye,
S. Schmitz, J.C. Syre, and ****many others**** from ECRC.

The first section deals with simple programs whose only purpose
is to evaluate a single feature of Prolog execution. The times we
give were obtained with the C-Prolog interpreter on a VAX 11/785
under UNIX BSD4.2, in a "quiet" environment (which may be obtained
on a sunny Sunday with all other logins prohibited). They are
subject to an inaccuracy of about 10 percent due to paging and
CPU-time accounting.

The second section presents more complex programs, which can still
be run on small Prolog systems. We are open to suggestions for
improving the significance of the benchmark. The section includes
many of the programs used by the University of California at
Berkeley in their evaluation of the PLM1 Prolog machine. We are
looking for more programs dealing with other domains where Prolog
may be appropriate (natural language, databases, reasoning, etc.).
Proposals can be sent to me, but please remember that the programs
must have short execution times.

There should also be a third section, which we have not fully built
yet; we are counting on you to build it. Its programs should be
representative of real-scale prototypes of large Prolog
applications. CHAT80 is an example of such a program. If you are
reluctant to distribute one of your programs because it has
properties you do not want to make public, you might consider
modifying or truncating it so that it becomes hardly readable, or
useless for actual use by others.

So, here is the first version. We make no claim to originality, and
we hope for plenty of feedback from you, either to improve this
version (through comments, new proposals or new programs) or to
report your evaluation results on your system. I will report on the
next versions and on your reactions in the near future. Have fun!


1. Simple Benchmark programs.

These simple (or simplistic) programs aim at evaluating a
single feature of the Prolog System. Here a Prolog System is
understood as a pair <Prolog software, Host machine>, where the
Prolog software is an interpreter, a compiler, or a combination of
both, and the Host machine is a conventional machine (with its
operating system and workload), a simulator of a Prolog processor,
or a real piece of Prolog hardware (a direct interpreter, a PLM
processor, or anything else).

By "single feature" we mean that the performance results show how
well the Prolog System handles one particular characteristic of the
language.

The phenomena we measure are:

   o   calls
   o   non-determinism
   o   handling of environments
   o   unification
   o   indexing

Many other features would be interesting to measure, e.g. the efficiency
of built-ins, "assert" and "retract", I/O, and tail recursion optimisation.
For now, however, the five criteria above seemed the most interesting;
perhaps somebody on the net can design benchmarks for the remaining ones.

Measuring a single feature of a language is difficult. A single execution
of a tiny program testing one particular feature takes too little time to
be measured precisely. To get better precision, the test program has to be
executed hundreds of times. There are two ways to do this: write out the
test program as many times as one wants to execute it, or put it inside a
loop. The first solution means writing programs with hundreds of lines of
code, where each line does the same job. This is not convenient, so loops
are preferable. In the case of our benchmark programs, however, the time
spent executing the loop is not negligible, because the test programs are
so small. We therefore combined both methods, i.e. sequences of repeated
code surrounded by a loop. To remove the effect of the loop, we also run
an "empty" loop without the benchmark program. We call this the
"compensation loop" and subtract its execution time from the execution
time of the loop containing the benchmark program. This of course
increases the relative error of the time measurement, but it reduces the
influence of the unavoidable loop.

The repeated code can be generated with your favorite editor. How much
repeated code you need to get sufficient precision depends, of course, on
the implementation of your particular Prolog system. However, we have put
into each benchmark program as much repeated code as we think appropriate
for most Prolog implementations, so you should get sufficient precision
without modifying our programs in that respect.
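As an alternative to the editor, the repeated code can also be generated
by a small Prolog program. The following is a minimal sketch with
hypothetical predicate names (lips_name/2, lips_clauses/2, print_lips/1);
it builds the call chain used in the first benchmark as clause terms and
prints them with portray_clause/1, which lays them out in its own style
rather than one clause per line.

```prolog
% Hypothetical generator, not part of the ECRC suite.

% lips_name(I, Name): Name is the atom lipsI (e.g. lips7).
lips_name(I, Name) :-
    number_codes(I, Codes),
    atom_codes(Suffix, Codes),
    atom_concat(lips, Suffix, Name).

% lips_clauses(N, Clauses): [(lips1:-lips2), ..., (lipsN-1:-lipsN), lipsN]
lips_clauses(N, Clauses) :-
    lips_clauses(1, N, Clauses).

lips_clauses(N, N, [Last]) :- !,
    lips_name(N, Last).               % the chain ends with a fact
lips_clauses(I, N, [(Head :- Body) | Rest]) :-
    I < N,
    J is I + 1,
    lips_name(I, Head),
    lips_name(J, Body),
    lips_clauses(J, N, Rest).

% print_lips(N): write the chain as Prolog source text.
print_lips(N) :-
    lips_clauses(N, Clauses),
    print_all(Clauses).

print_all([]).
print_all([C | Cs]) :-
    portray_clause(C),
    print_all(Cs).
```

For example, "?- tell('lips.pl'), print_lips(200), told." should write the
200-clause chain to the file lips.pl.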

The listings of the programs follow. For each simple program we give its
characteristics and some remarks about what it measures.

Note that the programs measure CPU time with the C-Prolog arithmetic
function "cputime" (as in "T1 is cputime"), as available on the VAX.
Other Prolog systems may provide CPU time differently. The rest of the
programs should be portable to any other system without problems.
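On a system without "cputime", the timing calls can be isolated in one
small predicate. The sketch below is an assumption, not part of the
benchmark: it supposes that your system supports the Edinburgh-style call
statistics(runtime, [T, _]), returning CPU time in milliseconds
(SWI-Prolog does; check the manual of your own system), and the names
cpu_seconds/1, count_down/1 and time_compens/2 are hypothetical.

```prolog
% Hypothetical portability shim, not part of the ECRC programs.
% Assumes statistics(runtime, [Ms, _]) gives CPU time in milliseconds.
cpu_seconds(T) :-
    statistics(runtime, [Ms, _]),
    T is Ms / 1000.

% A compensation-style countdown loop (cut added so that it cannot
% recurse below 0 on backtracking).
count_down(0) :- !.
count_down(X) :- Y is X - 1, count_down(Y).

% time_compens(N, Seconds): times N iterations of the empty loop,
% using cpu_seconds/1 in place of "T is cputime".
time_compens(N, Seconds) :-
    cpu_seconds(T1),
    count_down(N),
    cpu_seconds(T2),
    Seconds is T2 - T1.
```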


1.1. Program to test calls  (boresea).

This is the one you have always dreamed of testing! Like all the
benchmarks, it uses a loop calling the actual benchmark program. The
benchmark program consists of a sequence of 200 predicates having no
arguments, no choice points, NOTHING. The number 200 was chosen to give
sufficient accuracy in measuring the execution time.

The results show the cost of pure calls, and the resulting KLips figure
can be regarded as the peak performance of the Prolog system. Note,
however, that peak performance says very little about the overall
performance of a Prolog system.
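The KLips figure printed by print_times can be written out as follows,
where T1, T2 and T3 are the three cputime readings, X is the number of
loop iterations and I is the number of inferences counted per iteration
(200 for boresea, 20 for the programs of section 1.2). With the C-Prolog
values quoted in the boresea listing (X=1000, Tloop=27.1, Tcomp=1.0):

```latex
T_{\mathrm{net}} = (T_2 - T_1) - (T_3 - T_2), \qquad
\mathrm{KLips} = \frac{I \cdot X}{1000 \cdot T_{\mathrm{net}}}

T_{\mathrm{net}} = 27.1 - 1.0 = 26.1\ \mathrm{s}, \qquad
\mathrm{KLips} = \frac{200 \cdot 1000}{1000 \cdot 26.1} \approx 7.66
```

which the listing reports, rounded, as 7.7.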

---------------- cut here - beginning of program listing ---------------------

/* This program is called with the query "?-boresea(X)."             */
/* X is the number of loop iterations executed. It should be big     */
/* enough to give significant results.                               */
/* suggested value for X: 100 for interpreted code*/
/*                       1000 for compiled code   */
/* average values for C-prolog interpreter:       */
/*       X=1000, Tloop=27.1 T.comp=1.0 Tnet=26.1 Klips=7.7 */

boresea(X)
     :- T1 is cputime,
        do_max_KLips(X),           /* calls the loop to execute the  */
        T2 is cputime,             /* sequence of 200 predicates     */
        compens_loop(X),           /* compensation loop              */
        T3 is cputime,
        print_times(T1,T2,T3,X,200). /* compute and print results    */


compens_loop(0).                    /* compensation loop             */
compens_loop(X) :- Y is X - 1, compens_loop(Y).

print_times(T1,T2,T3,X,I) :-        /* prints the results            */
        TT1 is T2 - T1,
        TT2 is T3 - T2,
        TT is TT1 - TT2,
        write('T overall loop:   '),write(TT1), nl,
        write('T compens loop:   '),write(TT2), nl,
        write('T net:            '),write(TT),nl,
        write('KLips:            '),
        Li is I * X,
        Lips is Li / TT,
        KLips is Lips / 1000,
        write(KLips),nl,nl.

do_max_KLips(0).                  /* loop calling the actual benchmark */
do_max_KLips(X) :- lips1, Y is X - 1, do_max_KLips(Y).

/* predicates to test call */

lips1 :- lips2.
lips2 :- lips3.
lips3 :- lips4.
lips4 :- lips5.
lips5 :- lips6.
lips6 :- lips7.
lips7 :- lips8.
lips8 :- lips9.
lips9 :- lips10.
lips10 :- lips11.
lips11 :- lips12.
lips12 :- lips13.
lips13 :- lips14.
lips14 :- lips15.
lips15 :- lips16.
lips16 :- lips17.
lips17 :- lips18.
lips18 :- lips19.
lips19 :- lips20.
lips20 :- lips21.
lips21 :- lips22.
lips22 :- lips23.
lips23 :- lips24.
lips24 :- lips25.
lips25 :- lips26.
lips26 :- lips27.
lips27 :- lips28.
lips28 :- lips29.
lips29 :- lips30.
lips30 :- lips31.
lips31 :- lips32.
lips32 :- lips33.
lips33 :- lips34.
lips34 :- lips35.
lips35 :- lips36.
lips36 :- lips37.
lips37 :- lips38.
lips38 :- lips39.
lips39 :- lips40.
lips40 :- lips41.
lips41 :- lips42.
lips42 :- lips43.
lips43 :- lips44.
lips44 :- lips45.
lips45 :- lips46.
lips46 :- lips47.
lips47 :- lips48.
lips48 :- lips49.
lips49 :- lips50.
lips50 :- lips51.
lips51 :- lips52.
lips52 :- lips53.
lips53 :- lips54.
lips54 :- lips55.
lips55 :- lips56.
lips56 :- lips57.
lips57 :- lips58.
lips58 :- lips59.
lips59 :- lips60.
lips60 :- lips61.
lips61 :- lips62.
lips62 :- lips63.
lips63 :- lips64.
lips64 :- lips65.
lips65 :- lips66.
lips66 :- lips67.
lips67 :- lips68.
lips68 :- lips69.
lips69 :- lips70.
lips70 :- lips71.
lips71 :- lips72.
lips72 :- lips73.
lips73 :- lips74.
lips74 :- lips75.
lips75 :- lips76.
lips76 :- lips77.
lips77 :- lips78.
lips78 :- lips79.
lips79 :- lips80.
lips80 :- lips81.
lips81 :- lips82.
lips82 :- lips83.
lips83 :- lips84.
lips84 :- lips85.
lips85 :- lips86.
lips86 :- lips87.
lips87 :- lips88.
lips88 :- lips89.
lips89 :- lips90.
lips90 :- lips91.
lips91 :- lips92.
lips92 :- lips93.
lips93 :- lips94.
lips94 :- lips95.
lips95 :- lips96.
lips96 :- lips97.
lips97 :- lips98.
lips98 :- lips99.
lips99 :- lips100.
lips100 :- lips101.
lips101 :- lips102.
lips102 :- lips103.
lips103 :- lips104.
lips104 :- lips105.
lips105 :- lips106.
lips106 :- lips107.
lips107 :- lips108.
lips108 :- lips109.
lips109 :- lips110.
lips110 :- lips111.
lips111 :- lips112.
lips112 :- lips113.
lips113 :- lips114.
lips114 :- lips115.
lips115 :- lips116.
lips116 :- lips117.
lips117 :- lips118.
lips118 :- lips119.
lips119 :- lips120.
lips120 :- lips121.
lips121 :- lips122.
lips122 :- lips123.
lips123 :- lips124.
lips124 :- lips125.
lips125 :- lips126.
lips126 :- lips127.
lips127 :- lips128.
lips128 :- lips129.
lips129 :- lips130.
lips130 :- lips131.
lips131 :- lips132.
lips132 :- lips133.
lips133 :- lips134.
lips134 :- lips135.
lips135 :- lips136.
lips136 :- lips137.
lips137 :- lips138.
lips138 :- lips139.
lips139 :- lips140.
lips140 :- lips141.
lips141 :- lips142.
lips142 :- lips143.
lips143 :- lips144.
lips144 :- lips145.
lips145 :- lips146.
lips146 :- lips147.
lips147 :- lips148.
lips148 :- lips149.
lips149 :- lips150.
lips150 :- lips151.
lips151 :- lips152.
lips152 :- lips153.
lips153 :- lips154.
lips154 :- lips155.
lips155 :- lips156.
lips156 :- lips157.
lips157 :- lips158.
lips158 :- lips159.
lips159 :- lips160.
lips160 :- lips161.
lips161 :- lips162.
lips162 :- lips163.
lips163 :- lips164.
lips164 :- lips165.
lips165 :- lips166.
lips166 :- lips167.
lips167 :- lips168.
lips168 :- lips169.
lips169 :- lips170.
lips170 :- lips171.
lips171 :- lips172.
lips172 :- lips173.
lips173 :- lips174.
lips174 :- lips175.
lips175 :- lips176.
lips176 :- lips177.
lips177 :- lips178.
lips178 :- lips179.
lips179 :- lips180.
lips180 :- lips181.
lips181 :- lips182.
lips182 :- lips183.
lips183 :- lips184.
lips184 :- lips185.
lips185 :- lips186.
lips186 :- lips187.
lips187 :- lips188.
lips188 :- lips189.
lips189 :- lips190.
lips190 :- lips191.
lips191 :- lips192.
lips192 :- lips193.
lips193 :- lips194.
lips194 :- lips195.
lips195 :- lips196.
lips196 :- lips197.
lips197 :- lips198.
lips198 :- lips199.
lips199 :- lips200.
lips200.

--------------------cut here - end of program listing-------------------------


1.2. Program to test non-deterministic behaviour

This program contains 3 different benchmark predicates.

The predicate "choice_point(N)" tests calls that create a choice point,
i.e. a branch point to which execution may come back in case of
backtracking. It does NOT backtrack.

We then present two predicates that evaluate the backtracking mechanism.
Both predicates create one choice point and then backtrack 20 times on
every iteration of the loop. "baktrak1(N)" exhibits a kind of
backtracking called "deep", while "baktrak2(N)" deals with "shallow"
backtracking. Roughly speaking, backtracking is shallow when a clause
fails before any other predicate has been called from its body (as with
the "fail" bodies of ps), and deep when the failure occurs inside a
called goal (as with the call to q in pd). Both are worth trying,
whatever your particular Prolog System is.
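The two patterns can be seen side by side in a miniature version. This is
a minimal sketch with hypothetical names (mini_ps/1, mini_pd/1, mini_q/1)
and only three clauses instead of 21; both predicates have the single
answer done, and they differ only in where each failing clause fails.

```prolog
% Hypothetical miniatures of ps/3 and pd/3 (3 clauses, 1 argument).

% Shallow: each failing clause fails at its own body goal, fail,
% before any other user predicate has been called.
mini_ps(_) :- fail.
mini_ps(_) :- fail.
mini_ps(done).

% Deep: each failing clause first calls mini_q/1, whose only clause
% does not unify with a, so the failure happens inside the called goal
% and execution returns to the choice point of mini_pd/1.
mini_pd(X) :- mini_q(a), X = first.
mini_pd(X) :- mini_q(a), X = second.
mini_pd(done).

mini_q(b).
```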

----------------------cut here - beginning of program listing----------------


/* program to benchmark non-deterministic behaviour of Prolog Systems  */

/* The predicates are called:                                          */

/*                 o  "choice_point(N)" - creation of choice points    */
/*                 o  "baktrak1(N)"     - deep backtracking            */
/*                 o  "baktrak2(N)"     - shallow backtracking         */

/*  N is the number of loop iterations executed                        */


/* predicate to test creation of choice points without backtracking    */
/* suggested value for N: 1000 */
/* results for  Cprolog N=1000 */
/* Tloop=5.95 Tcompens=0.98 Tnet=4.97 Klips=4.02 */

choice_point(N):-T1 is cputime,
        cre_CP(N), T2 is cputime,
        compens_loop(N), T3 is cputime,
        print_times(T1,T2,T3,N,20).


/* Predicate to test the (deep) backtracking mechanism. */
/* suggested value for N: 1000 (interp), 2000(comp) */
/* results for Cprolog: N=1000  */
/* Tloop=9.63 Tcomp=1 Tnet=8.63 Klips=2.32  */

baktrak1(N)
     :- T1 is cputime,
        deep_back(N),
        T2 is cputime,
        compens_loop(N),
        T3 is cputime,
        print_times(T1,T2,T3,N,20).


/* Predicate to test the (shallow) backtracking mechanism */
/* suggested value for N: 1000 (interp), 2000 (comp) */
/* results for Cprolog: N=1000  */
/* Tloop=6.63  Tcomp=0.97 Tnet=5.67 Klips=3.53  */

baktrak2(X)
     :- T1 is cputime,
        shallow_back(X), T2 is cputime,
        compens_loop(X), T3 is cputime,
        print_times(T1,T2,T3,X,20).


/* compensation loop, used to measure the time spent in the loop  */
compens_loop(0).
compens_loop(X) :- Y is X - 1, compens_loop(Y).

/* loop to test choice point creation   */
cre_CP(0).
cre_CP(N):-M is N-1, ccp1(0,0,0), cre_CP(M).

/* loop to test deep backtracking       */
deep_back(0).
deep_back(X) :- pd(X1,X2,X3), Y is X - 1, deep_back(Y).

/* loop to test shallow backtracking */
shallow_back(0).
shallow_back(X) :- ps(X1,X2,X3), Y is X - 1, shallow_back(Y).


print_times(T1,T2,T3,X,I) :-        /* prints the results */
        TT1 is T2 - T1,
        TT2 is T3 - T2,
        TT is TT1 - TT2,
        write('T overall loop:   '),write(TT1), nl,
        write('T compens loop:   '),write(TT2), nl,
        write('T net:            '),write(TT),nl,
        write('KLips:            '),
        Li is I * X,
        Lips is Li / TT,
        KLips is Lips / 1000,
        write(KLips),nl,nl.

/*  ccp1 creates 20 choice points.                                      */
/*  ccp1 is the first of a chain of predicates (ccp1 ... ccp20),        */
/*  each composed of 2 clauses. Every invocation of ccp1 therefore      */
/*  creates a sequence of 20 choice points. Each clause body contains   */
/*  at most one goal, thus avoiding the creation of an environment      */
/*  when the clause is activated. ccp1 and its successors have          */
/*  three arguments to match the average found by our static            */
/*  analysis of more than 30 real Prolog programs.                      */

ccp1(X,Y,Z):-ccp2(X,Y,Z).
ccp1(X,Y,Z).
ccp2(X,Y,Z):-ccp3(X,Y,Z).
ccp2(X,Y,Z).
ccp3(X,Y,Z):-ccp4(X,Y,Z).
ccp3(X,Y,Z).
ccp4(X,Y,Z):-ccp5(X,Y,Z).
ccp4(X,Y,Z).
ccp5(X,Y,Z):-ccp6(X,Y,Z).
ccp5(X,Y,Z).
ccp6(X,Y,Z):-ccp7(X,Y,Z).
ccp6(X,Y,Z).
ccp7(X,Y,Z):-ccp8(X,Y,Z).
ccp7(X,Y,Z).
ccp8(X,Y,Z):-ccp9(X,Y,Z).
ccp8(X,Y,Z).
ccp9(X,Y,Z):-ccp10(X,Y,Z).
ccp9(X,Y,Z).
ccp10(X,Y,Z):-ccp11(X,Y,Z).
ccp10(X,Y,Z).
ccp11(X,Y,Z):-ccp12(X,Y,Z).
ccp11(X,Y,Z).
ccp12(X,Y,Z):-ccp13(X,Y,Z).
ccp12(X,Y,Z).
ccp13(X,Y,Z):-ccp14(X,Y,Z).
ccp13(X,Y,Z).
ccp14(X,Y,Z):-ccp15(X,Y,Z).
ccp14(X,Y,Z).
ccp15(X,Y,Z):-ccp16(X,Y,Z).
ccp15(X,Y,Z).
ccp16(X,Y,Z):-ccp17(X,Y,Z).
ccp16(X,Y,Z).
ccp17(X,Y,Z):-ccp18(X,Y,Z).
ccp17(X,Y,Z).
ccp18(X,Y,Z):-ccp19(X,Y,Z).
ccp18(X,Y,Z).
ccp19(X,Y,Z):-ccp20(X,Y,Z).
ccp19(X,Y,Z).

ccp20(X,Y,Z).
ccp20(X,Y,Z).


/*  deep backtracking                                                   */
/*  The call to pd creates a choice point and calls q, which fails.     */
/*  Execution then backtracks to try the next clause defining pd.       */
/*  pd has 21 clauses, so failure occurs 20 times.                      */

pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3).
q(X1,X2,b).


/*   shallow backtracking                                               */
/*   ps fails 20 times before its last clause succeeds. Prolog systems  */
/*   that optimise shallow backtracking do not restore all the current  */
/*   state registers on such failures, while other systems do.          */

ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3).

---------------------cut here - end of program listing-----------------------