Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxn!ihnp4!qantel!lll-lcc!lll-crg!seismo!mcvax!unido!ecrcvax!jclaude
From: jclaude@ecrcvax.UUCP (Jean Claude Syre)
Newsgroups: net.lang.prolog
Subject: Benchmarking Prolog Systems (Version 1)
Message-ID: <226@ecrcvax.UUCP>
Date: Mon, 28-Apr-86 16:18:59 EDT
Article-I.D.: ecrcvax.226
Posted: Mon Apr 28 16:18:59 1986
Date-Received: Fri, 2-May-86 09:46:34 EDT
Organization: ECRC, D-8000 Muenchen 81, W. Germany
Lines: 585

                           A Proposal for

          ***********************************************
          ***  BENCHMARK PROGRAMS FOR PROLOG SYSTEMS  ***
          ***********************************************

                           Part 1 (of 3)

                             J.C. SYRE
        ECRC (European Computer-Industry Research Centre)
                          Arabellastr. 17
                          D-8000 MUNICH 81
                            WEST GERMANY

This set of benchmark programs is a collective work done by the Logic
Programming Group and the Computer Architecture Group of ECRC, the
European Computer-Industry Research Centre, in Munich. For convenience,
you can send messages to Hans Benker (replace "jclaude" by "hans" in
this net address), or to me.

The designers of the benchmark programs are: H. Benker, J. Noye,
S. Schmitz, J.C. Syre, and ****many others**** from ECRC.

The first section deals with simple programs, each of which is meant
to evaluate a single feature of Prolog execution. The times we give
were obtained with the C-Prolog interpreter on a VAX11/785 under UNIX
BSD4.2, in a "quiet" environment (which may be obtained on a sunny
Sunday with all other logins prohibited). They are subject to a
10 percent inaccuracy due to paging and time accounting.

The second section presents more complex programs, which may still run
on small Prolog systems. We are open to suggestions to improve the
significance of the benchmark. The programs include many of those used
at the University of California, Berkeley for the evaluation of the
PLM1 Prolog machine.
We are looking for more programs dealing with other domains where
Prolog may be appropriate (natural language, data bases, reasoning,
etc.). Proposals can be sent to me, but do not forget that the
programs must have short execution times.

There should be a third section, which we have not fully built yet,
and we count on you to build it. Those programs should be
representative of real-scale prototypes of large Prolog applications.
CHAT80 is an example of such a program for this section. If you are
reluctant to distribute one of your programs because it has properties
you would rather not make public, you may think of modifying or
truncating it so that it becomes hard to read, or useless for actual
use by others.

So, here is the first version. We make no claim to originality. We
hope for plenty of feedback from you, either to improve this version
(through comments, new proposals or new programs), or to report your
evaluation results on your system. I will report on the next versions
and on your reactions in the near future.

Have fun!


1. Simple Benchmark programs.

These simple (or simplistic) programs aim at evaluating a single
feature of the Prolog System. Here a Prolog System is understood as a
pair (Prolog software, Host machine), where the Prolog software is an
interpreter, a compiler, or a combination of both, and the Host
machine is a conventional machine (with its operating system and
workload), a simulator of a Prolog processor, or a real piece of
Prolog hardware (direct interpreter, or PLM processor, or anything
else).

The "single feature" mentioned above means that the performance
results will show how well the Prolog System can handle a particular
characteristic of the language. The phenomena we measure are:

   o calls
   o non-determinism
   o handling of environments
   o unification
   o indexing

There are many more which would be interesting to measure, e.g.
efficiency of built-ins, "assert" and "retract", I/O and tail
recursion optimisation.
However, for now the above five criteria seemed to be the most
interesting, and maybe somebody on the net can design benchmarks for
the remaining features.

Measuring a single feature of a language is difficult. A single
execution of a tiny program testing a particular feature does not take
long enough to be measured precisely. To get better precision one has
to execute the test program hundreds of times. There are two ways to
do this: write down the test program as many times as one wants to
execute it, or put it in a loop. The first solution means writing
programs with hundreds of lines of code, where each line does the same
job. This is not convenient, so loops are desirable. In the case of
our benchmark programs, however, the time spent executing the loop is
not negligible, because the test programs are so small. Therefore we
used a combination of both methods, i.e. sequences of repeated code
surrounded by a loop.

In order to minimise the effect of the loop, we also run an "empty"
loop, without the benchmark program. We call this the "compensation
loop" and subtract its execution time from the execution time of the
loop including the benchmark program. This of course increases the
relative error of the time measurement, but it reduces the influence
of the unavoidable loop.

The repeated code can be generated by your favorite editor. How much
repeated code you need to get sufficient precision depends, of course,
on the implementation of your particular Prolog system. However, we
put as much repeated code into each benchmark program as we think is
appropriate for most Prolog implementations, so you should get
sufficient precision without modifying our programs in that respect.

The listings of the programs follow below. For each simple program we
try to give its characteristics and some remarks about what it
measures. Note that "cputime" in C-Prolog on the VAX gives you the
possibility to measure runtime.
This may be different in other Prolog systems. All the rest of the
programs should be portable to any other system without any problem.


1.1. Program to test calls (boresea).

This is the one you always dreamed of testing! Like all the benchmarks
here, it uses a loop calling the actual benchmark program. The
benchmark program consists of a sequence of 200 predicates having no
arguments, no choice points, NOTHING. The number 200 is chosen to give
sufficient accuracy in measuring the execution time.

The results show the effect of pure calls, and the KLips figure can be
called the peak performance of the Prolog system. Note that the peak
performance says very little about the overall performance of a Prolog
system.

---------------- cut here - beginning of program listing ---------------------

/* This program is called with the query "?-boresea(X)."           */
/* X is the number of loop iterations executed. It should be big   */
/* enough to give significant results.                             */
/* suggested value for X: 100 for interpreted code                 */
/*                        1000 for compiled code                   */
/* average values for C-prolog interpreter:                        */
/* X=1000, Tloop=27.1 T.comp=1.0 Tnet=26.1 Klips=7.7               */

boresea(X) :- T1 is cputime,
              do_max_KLips(X),  /* calls the loop to execute the */
              T2 is cputime,    /* sequence of 200 predicates    */
              compens_loop(X),  /* compensation loop             */
              T3 is cputime,
              print_times(T1,T2,T3,X,200). /* compute and print results */

compens_loop(0).                /* compensation loop */
compens_loop(X) :- Y is X - 1, compens_loop(Y).

print_times(T1,T2,T3,X,I) :-    /* prints the results */
        TT1 is T2 - T1,
        TT2 is T3 - T2,
        TT is TT1 - TT2,
        write('T overall loop: '),write(TT1), nl,
        write('T compens loop: '),write(TT2), nl,
        write('T net: '),write(TT),nl,
        write('KLips: '),
        Li is I * X,
        Lips is Li / TT,
        KLips is Lips / 1000,
        write(KLips),nl,nl.

do_max_KLips(0).                /* loop calling the actual benchmark */
do_max_KLips(X) :- lips1, Y is X - 1, do_max_KLips(Y).

/* predicates to test call */
lips1 :- lips2. lips2 :- lips3. lips3 :- lips4.
lips4 :- lips5. lips5 :- lips6. lips6 :- lips7. lips7 :- lips8.
lips8 :- lips9. lips9 :- lips10. lips10 :- lips11. lips11 :- lips12.
lips12 :- lips13. lips13 :- lips14. lips14 :- lips15. lips15 :- lips16.
lips16 :- lips17. lips17 :- lips18. lips18 :- lips19. lips19 :- lips20.
lips20 :- lips21. lips21 :- lips22. lips22 :- lips23. lips23 :- lips24.
lips24 :- lips25. lips25 :- lips26. lips26 :- lips27. lips27 :- lips28.
lips28 :- lips29. lips29 :- lips30. lips30 :- lips31. lips31 :- lips32.
lips32 :- lips33. lips33 :- lips34. lips34 :- lips35. lips35 :- lips36.
lips36 :- lips37. lips37 :- lips38. lips38 :- lips39. lips39 :- lips40.
lips40 :- lips41. lips41 :- lips42. lips42 :- lips43. lips43 :- lips44.
lips44 :- lips45. lips45 :- lips46. lips46 :- lips47. lips47 :- lips48.
lips48 :- lips49. lips49 :- lips50. lips50 :- lips51. lips51 :- lips52.
lips52 :- lips53. lips53 :- lips54. lips54 :- lips55. lips55 :- lips56.
lips56 :- lips57. lips57 :- lips58. lips58 :- lips59. lips59 :- lips60.
lips60 :- lips61. lips61 :- lips62. lips62 :- lips63. lips63 :- lips64.
lips64 :- lips65. lips65 :- lips66. lips66 :- lips67. lips67 :- lips68.
lips68 :- lips69. lips69 :- lips70. lips70 :- lips71. lips71 :- lips72.
lips72 :- lips73. lips73 :- lips74. lips74 :- lips75. lips75 :- lips76.
lips76 :- lips77. lips77 :- lips78. lips78 :- lips79. lips79 :- lips80.
lips80 :- lips81. lips81 :- lips82. lips82 :- lips83. lips83 :- lips84.
lips84 :- lips85. lips85 :- lips86. lips86 :- lips87. lips87 :- lips88.
lips88 :- lips89. lips89 :- lips90. lips90 :- lips91. lips91 :- lips92.
lips92 :- lips93. lips93 :- lips94. lips94 :- lips95. lips95 :- lips96.
lips96 :- lips97. lips97 :- lips98. lips98 :- lips99. lips99 :- lips100.
lips100 :- lips101. lips101 :- lips102. lips102 :- lips103. lips103 :- lips104.
lips104 :- lips105. lips105 :- lips106. lips106 :- lips107. lips107 :- lips108.
lips108 :- lips109. lips109 :- lips110. lips110 :- lips111. lips111 :- lips112.
lips112 :- lips113. lips113 :- lips114.
lips114 :- lips115. lips115 :- lips116. lips116 :- lips117. lips117 :- lips118.
lips118 :- lips119. lips119 :- lips120. lips120 :- lips121. lips121 :- lips122.
lips122 :- lips123. lips123 :- lips124. lips124 :- lips125. lips125 :- lips126.
lips126 :- lips127. lips127 :- lips128. lips128 :- lips129. lips129 :- lips130.
lips130 :- lips131. lips131 :- lips132. lips132 :- lips133. lips133 :- lips134.
lips134 :- lips135. lips135 :- lips136. lips136 :- lips137. lips137 :- lips138.
lips138 :- lips139. lips139 :- lips140. lips140 :- lips141. lips141 :- lips142.
lips142 :- lips143. lips143 :- lips144. lips144 :- lips145. lips145 :- lips146.
lips146 :- lips147. lips147 :- lips148. lips148 :- lips149. lips149 :- lips150.
lips150 :- lips151. lips151 :- lips152. lips152 :- lips153. lips153 :- lips154.
lips154 :- lips155. lips155 :- lips156. lips156 :- lips157. lips157 :- lips158.
lips158 :- lips159. lips159 :- lips160. lips160 :- lips161. lips161 :- lips162.
lips162 :- lips163. lips163 :- lips164. lips164 :- lips165. lips165 :- lips166.
lips166 :- lips167. lips167 :- lips168. lips168 :- lips169. lips169 :- lips170.
lips170 :- lips171. lips171 :- lips172. lips172 :- lips173. lips173 :- lips174.
lips174 :- lips175. lips175 :- lips176. lips176 :- lips177. lips177 :- lips178.
lips178 :- lips179. lips179 :- lips180. lips180 :- lips181. lips181 :- lips182.
lips182 :- lips183. lips183 :- lips184. lips184 :- lips185. lips185 :- lips186.
lips186 :- lips187. lips187 :- lips188. lips188 :- lips189. lips189 :- lips190.
lips190 :- lips191. lips191 :- lips192. lips192 :- lips193. lips193 :- lips194.
lips194 :- lips195. lips195 :- lips196. lips196 :- lips197. lips197 :- lips198.
lips198 :- lips199. lips199 :- lips200.
lips200.

--------------------cut here - end of program listing-------------------------


1.2. Program to test non deterministic behaviour

This program contains a series of 3 different benchmark predicates.
The predicate "choice_point(N)" tests calls that create a choice
point, i.e. a branch point to which execution may return in case of
backtracking. It does NOT backtrack.

We then present two predicates to evaluate the mechanism of
backtracking during execution. Both predicates create one choice point
and then backtrack 20 times on every loop iteration. "baktrak1(N)"
exhibits a kind of backtracking called "deep", while "baktrak2(N)"
deals with "shallow" backtracking. Both are worth trying, whatever
your particular Prolog System is.

----------------------cut here - beginning of program listing----------------

/* program to benchmark non deterministic behaviour of Prolog Systems */
/* The predicates are called:                                         */
/*   o "choice_point(N)" - creation of choice points                  */
/*   o "baktrak1(N)"     - deep backtracking                          */
/*   o "baktrak2(N)"     - shallow backtracking                       */
/* N is the number of loop iterations executed                        */

/* predicate to test creation of choice points without backtracking  */
/* suggested value for N: 1000                                       */
/* results for Cprolog N=1000                                        */
/* Tloop=5.95 Tcompens=0.98 Tnet=4.97 Klips=4.02                     */

choice_point(N) :- T1 is cputime,
                   cre_CP(N),
                   T2 is cputime,
                   compens_loop(N),
                   T3 is cputime,
                   print_times(T1,T2,T3,N,20).

/* Predicate to test the (deep) backtracking mechanism.              */
/* suggested value for N: 1000 (interp), 2000 (comp)                 */
/* results for Cprolog: N=1000                                       */
/* Tloop=9.63 Tcomp=1 Tnet=8.63 Klips=2.32                           */

baktrak1(N) :- T1 is cputime,
               deep_back(N),
               T2 is cputime,
               compens_loop(N),
               T3 is cputime,
               print_times(T1,T2,T3,N,20).

/* Predicate to test the (shallow) backtracking mechanism            */
/* suggested value for N: 1000 (interp), 2000 (comp)                 */
/* results for Cprolog: N=1000                                       */
/* Tloop=6.63 Tcomp=0.97 Tnet=5.67 Klips=3.53                        */

baktrak2(X) :- T1 is cputime,
               shallow_back(X),
               T2 is cputime,
               compens_loop(X),
               T3 is cputime,
               print_times(T1,T2,T3,X,20).

/* compensation loop, used to measure the time spent in the loop */
compens_loop(0).
compens_loop(X) :- Y is X - 1, compens_loop(Y).

/* loop to test choice point creation */
cre_CP(0).
cre_CP(N) :- M is N-1, ccp1(0,0,0), cre_CP(M).

/* loop to test deep backtracking */
deep_back(0).
deep_back(X) :- pd(X1,X2,X3), Y is X - 1, deep_back(Y).

/* loop to test shallow backtracking */
shallow_back(0).
shallow_back(X) :- ps(X1,X2,X3), Y is X - 1, shallow_back(Y).

print_times(T1,T2,T3,X,I) :-    /* prints the results */
        TT1 is T2 - T1,
        TT2 is T3 - T2,
        TT is TT1 - TT2,
        write('T overall loop: '),write(TT1), nl,
        write('T compens loop: '),write(TT2), nl,
        write('T net: '),write(TT),nl,
        write('KLips: '),
        Li is I * X,
        Lips is Li / TT,
        KLips is Lips / 1000,
        write(KLips),nl,nl.

/* ccp1 creates 20 choice points                                     */
/* ccp1 is the beginning of a set of predicates                      */
/* composed of 2 clauses each. Every invocation of ccp1 will create  */
/* a sequence of 20 choice points. The bodies of the clauses are     */
/* limited to one goal, thus avoiding the creation of an environment */
/* when the clause is activated. ccp1, and its successors, have      */
/* three arguments to comply with our average static analysis        */
/* results made on more than 30 real Prolog programs.                */

ccp1(X,Y,Z):-ccp2(X,Y,Z).     ccp1(X,Y,Z).
ccp2(X,Y,Z):-ccp3(X,Y,Z).     ccp2(X,Y,Z).
ccp3(X,Y,Z):-ccp4(X,Y,Z).     ccp3(X,Y,Z).
ccp4(X,Y,Z):-ccp5(X,Y,Z).     ccp4(X,Y,Z).
ccp5(X,Y,Z):-ccp6(X,Y,Z).     ccp5(X,Y,Z).
ccp6(X,Y,Z):-ccp7(X,Y,Z).     ccp6(X,Y,Z).
ccp7(X,Y,Z):-ccp8(X,Y,Z).     ccp7(X,Y,Z).
ccp8(X,Y,Z):-ccp9(X,Y,Z).     ccp8(X,Y,Z).
ccp9(X,Y,Z):-ccp10(X,Y,Z).    ccp9(X,Y,Z).
ccp10(X,Y,Z):-ccp11(X,Y,Z).   ccp10(X,Y,Z).
ccp11(X,Y,Z):-ccp12(X,Y,Z).   ccp11(X,Y,Z).
ccp12(X,Y,Z):-ccp13(X,Y,Z).   ccp12(X,Y,Z).
ccp13(X,Y,Z):-ccp14(X,Y,Z).   ccp13(X,Y,Z).
ccp14(X,Y,Z):-ccp15(X,Y,Z).   ccp14(X,Y,Z).
ccp15(X,Y,Z):-ccp16(X,Y,Z).   ccp15(X,Y,Z).
ccp16(X,Y,Z):-ccp17(X,Y,Z).   ccp16(X,Y,Z).
ccp17(X,Y,Z):-ccp18(X,Y,Z).   ccp17(X,Y,Z).
ccp18(X,Y,Z):-ccp19(X,Y,Z).   ccp18(X,Y,Z).
ccp19(X,Y,Z):-ccp20(X,Y,Z).   ccp19(X,Y,Z).
ccp20(X,Y,Z).                 ccp20(X,Y,Z).

/* deep backtracking                                                  */
/* The call to pd creates a choice point, and invokes a               */
/* call to q. It will fail and there will be a backtracking step      */
/* to try the next clause defining pd. pd has 21 clauses, thus        */
/* failure occurs 20 times                                            */

pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3) :- q(X1,X2,a).
pd(X1,X2,X3).

q(X1,X2,b).

/* shallow backtracking                                               */
/* The ps predicate fails 20 times. The shallow backtracking          */
/* will not restore all current state registers in Prolog             */
/* systems which perform this optimisation, while others will.        */

ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3) :- fail.
ps(X1,X2,X3).

---------------------cut here - end of program listing-----------------------
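
As a cross-check of the figures quoted in the listing headers, the
arithmetic done by print_times can be isolated in two small
predicates, and a chain of repeated clauses like the one in listing
1.1 can be written out by a short generator instead of an editor.
What follows is only a sketch: the names net_time/4, klips/4 and
gen_chain/2 are invented here for illustration and are not part of
the benchmark suite, and the code assumes nothing beyond is/2
arithmetic, write/1 and nl/0.

```prolog
/* Illustrative helpers; NOT part of the benchmark suite.          */

/* net_time(T1,T2,T3,Tnet): T1, T2, T3 are the three cputime       */
/* samples taken by a driver. Tnet is the time of the benchmark    */
/* loop minus the time of the compensation loop.                   */
net_time(T1, T2, T3, Tnet) :-
        Tnet is (T2 - T1) - (T3 - T2).

/* klips(I,N,Tnet,KLips): I calls per iteration, N iterations,     */
/* Tnet seconds of net time. Fails if Tnet is not positive, which  */
/* can happen when the clock is too coarse for the chosen N.       */
klips(I, N, Tnet, KLips) :-
        Tnet > 0,
        KLips is (I * N) / (Tnet * 1000).

/* gen_chain(Name,N): writes Name1 :- Name2. up to NameN. on the   */
/* current output, one clause per line.                            */
gen_chain(Name, N) :-
        N >= 1,
        gen_chain(Name, 1, N).

gen_chain(Name, N, N) :- !,
        write(Name), write(N), write('.'), nl.
gen_chain(Name, I, N) :-
        J is I + 1,
        write(Name), write(I), write(' :- '),
        write(Name), write(J), write('.'), nl,
        gen_chain(Name, J, N).
```

For example, the query "?- klips(200, 1000, 26.1, K)." gives K of
about 7.66, in line with the 7.7 quoted for boresea, and the query
"?- gen_chain(lips, 200)." writes out a chain of 200 clauses from
lips1 to lips200. Unlike print_times, klips/4 fails when the net time
is not positive; that guard is a choice made for this sketch only.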