Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!think.com!snorkelwacker.mit.edu!bloom-beacon!eru!hagbard!sunic!mcsun!unido!uklirb!shell
From: acha@CS.CMU.EDU (Anurag Acharya)
Newsgroups: comp.ai.shells
Subject: Re: KES Timing Result
Message-ID: <7700@uklirb.informatik.uni-kl.de>
Date: 3 Apr 91 18:10:53 GMT
References: <7642@uklirb.informatik.uni-kl.de>
Sender: shell@uklirb.informatik.uni-kl.de
Organization: Carnegie Mellon University
Lines: 64
Approved: shell@dfki.uni-kl.de
Posted-Date: Fri Apr  5 09:58:36 GMT 1991
In-reply-to: srt@aero.org's message of 22 Mar 91 17:07:27 GMT

In article <7667@uklirb.informatik.uni-kl.de> srt@aero.org (Scott TCB Turner) writes:
   (Klaus ten Hagen) writes:
   >Unfortunately the problem is that such an ``benchmark'' not even gives
   >``a general feeling'', since the speed determining parts of an
   >rulebased system are not tested by such a crude trial.

   Nonsense.  Repeated firing of a single rule tests simple conditions,
   rule activation, and the internal representation of rules and data (to
   the extent that compiled representations will be faster than
   interpreted ones).  Simple tests are simple; that doesn't mean they're
   worthless.

Simple tests like repeated firing of a single trivial rule provide little
or no information that might help predict the performance of realistic
programs. Furthermore, a "simple" rule in one production system language
may not be all that "simple" in another. Such undisciplined benchmarking
attempts yield data of zilch utility.


example 1:

take languages that do not provide pattern matching capabilities:

a typical repeatedly firing production in such languages might be

int foo = 10000;

(p 
  (foo > 0)
  -->
  (replace foo by (foo - 1)))

this is nothing more than a syntactically sugared version of the following
 C while loop

while (foo > 0) foo--;

compare this with :

example 2:

(p 
  (foo ^value <v> > 0)
  -->
  (modify 1 ^value (compute <v> - 1)))

which needs to exercise far more complex capabilities: it has to match a
working memory element, bind <v>, and then modify that element. therefore
the comparison is grossly unfair to languages that are more expressive in
terms of the conditions that they can match.
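
to make the difference concrete, here is a rough C sketch of the work each
example implies per firing. it is not code from any actual system: struct wme,
match_foo, fire_plain and fire_matched are made-up names, and the linear scan
stands in for whatever matcher a real implementation uses (a Rete network,
for instance).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical working memory element: a class, one attribute, one value */
struct wme {
    const char *class_name;   /* e.g. "foo" */
    const char *attr;         /* e.g. "value" */
    long value;
    int live;                 /* 0 once the element has been removed */
};

/* example 1: the "rule" is just a loop over a plain variable */
long fire_plain(long foo)
{
    long firings = 0;
    while (foo > 0) {
        foo--;
        firings++;
    }
    return firings;
}

/* match step for example 2: find a live element of class "foo" whose
 * ^value is greater than 0; returns its index, or -1 if nothing matches */
static int match_foo(const struct wme *wm, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (wm[i].live &&
            strcmp(wm[i].class_name, "foo") == 0 &&
            strcmp(wm[i].attr, "value") == 0 &&
            wm[i].value > 0)
            return (int)i;
    }
    return -1;
}

/* example 2: every firing re-matches against working memory, then treats
 * modify as "remove the old element, add the updated one" */
long fire_matched(struct wme *wm, size_t n)
{
    long firings = 0;
    int i;

    assert(wm != NULL);
    while ((i = match_foo(wm, n)) >= 0) {
        struct wme updated = wm[i];   /* copy taken while still live */
        updated.value -= 1;           /* (compute <v> - 1) */
        wm[i].live = 0;               /* remove the old element */
        wm[i] = updated;              /* add the replacement in the same slot */
        firings++;
    }
    return firings;
}
```

both functions fire the same number of times for the same starting value, but
the second one pays for class and attribute tests and a working memory update
on every cycle. a timing of the first says very little about the second.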

i have two big gripes with the benchmarking results based on repeated firing 
of a single rule that are posted in this group from time to time.

1. productions used in these benchmarks do not contain multiple conditions
   and therefore do not perform consistency checks across conditions.
   most useful productions need this capability. very few productions
   are so widely applicable that they can make do with only one condition.

2. benchmarks that compare languages with different expressive power must 
   exercise equal capabilities if they are to be fair or useful.
   otherwise, the data gathered is not worth the cpu seconds spent gathering it.
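
as an illustration of gripe 1, here is a minimal C sketch of what a
consistency check across two conditions amounts to. the production it mimics,
(goal ^id <g>) (task ^goal <g> ^done 0), and the names struct goal,
struct task and count_instantiations are invented for this example. the naive
nested loop is only an assumption about how one might write the join by hand.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical element types for a two-condition production:
 *   (goal ^id <g>)
 *   (task ^goal <g> ^done 0)
 */
struct goal { int id; };
struct task { int goal_id; int done; };

/* count the instantiations of the production: every (goal, task) pair
 * in which the task is not done and both bindings of <g> agree */
size_t count_instantiations(const struct goal *goals, size_t ng,
                            const struct task *tasks, size_t nt)
{
    size_t count = 0;

    assert(goals != NULL && tasks != NULL);
    for (size_t i = 0; i < ng; i++) {
        for (size_t j = 0; j < nt; j++) {
            /* test local to the second condition */
            if (tasks[j].done != 0)
                continue;
            /* consistency check across conditions: <g> must be the same */
            if (tasks[j].goal_id == goals[i].id)
                count++;
        }
    }
    return count;
}
```

a single-condition benchmark never reaches the inner comparison at all, so it
says nothing about how well a system handles this part of the match.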

anurag