Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!lll-winken!uunet!munnari!murtoa.cs.mu.oz.au!munnari.oz!lee
From: lee@munnari.oz (Lee Naish)
Newsgroups: comp.lang.prolog
Subject: Re: Inline expansion versus threaded code
Keywords: paging threaded code
Message-ID: <1399@murtoa.cs.mu.oz.au>
Date: 14 Apr 89 07:24:15 GMT
References: <1635@kulcs.kulcs.uucp> <8489@russell.STANFORD.EDU>
Sender: news@cs.mu.oz.au
Reply-To: lee@munmurra.UUCP (Lee Naish)
Organization: University of Melbourne, Comp Sci Dept
Lines: 40

pereira@russell.UUCP (Fernando Pereira) writes:
>soon after your program starts paging
>you may as well forget about it

>compact encoding is essential in that it allows us to run an
>n times larger problem without paging (n varying from 4 to 10
>depending on the size ratio between native compiled code and threaded
>code)

When I was visiting SICS over the northern winter I ran the MTS natural
language system, written by Xiuming Huang, under Sicstus Prolog using
the bytecode emulator and the (new) native code system.  I have to agree
that paging really kills the system.  However, for this system the code
size factor was not as great as Fernando suggests.  The native code
version was between 14 and 15 Mbytes; the emulated version between 6 and
7 Mbytes (if I recall correctly), a ratio of roughly 2 to 2.5.  Sicstus
emulated code is not particularly compact (instructions are halfword or
word aligned to speed up loading of instructions and operands) but I
think that is true for many Prolog systems (e.g., Quintus uses 16-bit
opcodes, I've heard).
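
As a rough check on the figures above, the bounds on the native/emulated
size ratio can be worked out directly. This is only a sketch: the sizes
are the recalled ranges from the text, and the names native_mb,
emulated_mb and ratio_bounds are made up for illustration.

```python
# Sizes as recalled in the post, in megabytes, given as (low, high) ranges.
native_mb = (14.0, 15.0)
emulated_mb = (6.0, 7.0)


def ratio_bounds(big, small):
    """Return (min, max) of big/small when each varies over its range."""
    # Smallest ratio: smallest numerator over largest denominator, and vice versa.
    return big[0] / small[1], big[1] / small[0]


low, high = ratio_bounds(native_mb, emulated_mb)
print(f"native/emulated size ratio: {low:.1f} to {high:.1f}")
```

Either bound is well below the factor of 4 to 10 quoted above.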

About half the code in MTS is complex dcg rules and about half is the
lexicon (large sets of facts).  I'm not sure how the size and speed
ratios compare for these two rather different types of code.  It may be
important for optimization (e.g., it might be best to compile one to
native code and emulate the other).  Locality is another important
issue.  The working set of the lexicon code is related to the number of
different words in the sentence/text being processed.  This is probably
reasonably small, even though the fine grain locality is poor.  For the
grammar rules, fine grain locality is probably better, but the quantity
of code used overall is larger.
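
The point about the lexicon working set can be illustrated with a toy
model, assuming (purely for illustration) that each lexicon entry is one
unit of code and that processing a text touches exactly the entries for
the distinct words in it. The lexicon contents and the function name
lexicon_working_set are hypothetical, not taken from MTS.

```python
def lexicon_working_set(text, lexicon):
    """Return the set of lexicon entries touched while processing `text`.

    Toy assumption: one entry per word, looked up by exact lowercase match.
    """
    words = text.lower().split()
    return {w for w in words if w in lexicon}


# A tiny stand-in lexicon; a real one would have many thousands of entries.
lexicon = {"the", "cat", "sat", "on", "mat", "dog", "ran", "a", "park", "in"}

touched = lexicon_working_set("The cat sat on the mat", lexicon)
print(len(touched), "of", len(lexicon), "entries touched")
```

Repeated words add nothing to the set, so the working set grows with the
vocabulary of the text rather than with its length.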

In the longer term, parallelism may have a significant role to play.
Having lots of memory attached to a single processor means the memory is
not used efficiently.  Shared memory multiprocessors make better use of
memory, and parallelism can also absorb some of the memory latency due to
paging (while one bit of the computation is waiting for a page, another
bit can be done), so processor utilisation is increased as well.
You have to be careful to avoid thrashing, of course.
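
The latency-hiding argument can be put in a back-of-envelope form. The
sketch below assumes a deliberately simple model that is not from the
post: each task alternates a fixed burst of computation with a fixed
wait for a page, one processor switches between tasks at no cost, and
the paging device never becomes a bottleneck (so thrashing is ignored).
The function name and the numbers are illustrative only.

```python
def utilisation(n_tasks, compute, page_wait):
    """Fraction of time one processor is busy under the simple overlap model.

    Each task needs the processor for `compute` out of every
    `compute + page_wait` time units; n_tasks such tasks can overlap
    their waits, and utilisation is capped at 1.0.
    """
    return min(1.0, n_tasks * compute / (compute + page_wait))


# Example: 1 unit of computation per 3 units of page wait.
for n in (1, 2, 4, 8):
    print(n, "tasks ->", utilisation(n, 1.0, 3.0))
```

In this model utilisation rises linearly with the number of runnable
tasks until the processor saturates; in practice more tasks also mean a
larger combined working set, which is where thrashing comes in.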

	lee