Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!think.com!snorkelwacker.mit.edu!bloom-beacon!eru!hagbard!sunic!mcsun!unido!uklirb!shell
From: acha@CS.CMU.EDU (Anurag Acharya)
Newsgroups: comp.ai.shells
Subject: Re: KES Timing Result
Message-ID: <7700@uklirb.informatik.uni-kl.de>
Date: 3 Apr 91 18:10:53 GMT
References: <7642@uklirb.informatik.uni-kl.de>
Sender: shell@uklirb.informatik.uni-kl.de
Organization: Carnegie Mellon University
Lines: 64
Approved: shell@dfki.uni-kl.de
Posted-Date: Fri Apr 5 09:58:36 GMT 1991
In-reply-to: srt@aero.org's message of 22 Mar 91 17:07:27 GMT

In article <7667@uklirb.informatik.uni-kl.de> srt@aero.org (Scott TCB Turner) writes:

   (Klaus ten Hagen) writes:
   >Unfortunately the problem is that such an ``benchmark'' not even gives
   >``a general feeling'', since the speed determining parts of an
   >rulebased system are not tested by such a crude trial.

   Nonsense.  Repeated firing of a single rule tests simple conditions,
   rule activation, and the internal representation of rules and data
   (to the extent that compiled representations will be faster than
   interpreted ones).  Simple tests are simple; that doesn't mean
   they're worthless.

Simple tests like repeated firing of a single trivial rule provide
little or no information that might help predict the performance of
realistic programs. Furthermore, a rule that is "simple" in one
production system language may not be all that "simple" in another.
Such undisciplined benchmarking attempts yield data of zilch utility.

Example 1: take languages that do not provide pattern matching
capabilities. A typical repeatedly firing production in such languages
might be

    int foo = 10000;
    (p (foo > 0) --> (replace foo by (foo - 1)))

This is nothing more than a syntactically sugared version of the
following C while loop:

    while (foo) foo--;

Compare this with example 2:

    (p (foo ^value > 0) --> (modify 1 ^value (compute - 1)))

which needs to exercise far more complex capabilities: the condition
has to be matched against working memory elements of class foo, and the
action has to modify the matched element.
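The difference between the two examples can be made concrete with a small Python sketch. It is a toy model under stated assumptions, not how OPS5, KES, or any real engine works: working memory is assumed to be a list of dicts with a "class" key, "modify" is modeled as remove plus re-add, and the function names and the `tests` counter are hypothetical, introduced only to count the work done.

```python
# Toy model only: the representation below is an assumption made for
# illustration, not the implementation of any real production system.

def run_scalar_rule(foo):
    """Example 1: one global integer, one comparison per firing."""
    firings = 0
    while foo > 0:          # the rule's only condition
        foo -= 1            # the rule's only action
        firings += 1
    return firings


def run_pattern_rule(working_memory):
    """Example 2: match (foo ^value > 0) against working memory.

    Each element is a dict with a 'class' key plus attribute keys
    (a hypothetical representation chosen for this sketch).
    Returns (number of firings, number of condition tests performed).
    """
    firings = 0
    tests = 0
    while True:
        match = None
        for index, wme in enumerate(working_memory):
            tests += 1                      # class test
            if wme.get("class") != "foo":
                continue
            tests += 1                      # attribute test
            if wme.get("value", 0) > 0:
                match = index
                break
        if match is None:
            return firings, tests
        # "modify" is modeled as removing the element and adding an
        # updated copy to working memory.
        old = working_memory.pop(match)
        working_memory.append({**old, "value": old["value"] - 1})
        firings += 1
```

With the same starting count, both functions fire the same number of times, but the second scans working memory, tests the class, tests the attribute, and rebuilds an element on every firing, so its cost per firing is higher.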
Therefore the comparison is grossly unfair to languages that are more
expressive in terms of the conditions that they can match.

I have two big gripes with the benchmarking results based on repeated
firing of a single rule that are posted in this group from time to time.

1. Productions used in these benchmarks do not contain multiple
   conditions and therefore do not perform consistency checks across
   conditions. Most useful productions need this capability; very few
   productions are so widely applicable that they can make do with
   only one condition.

2. Benchmarks that compare languages with different expressive power
   must exercise equal capabilities if they are to be fair or useful.
   Otherwise, the data gathered is not worth the CPU seconds spent
   gathering it.

anurag