Path: utzoo!mnetor!uunet!husc6!ut-sally!utah-cs!defun.utah.edu!shebs
From: shebs%defun.utah.edu.uucp@utah-cs.UUCP (Stanley T. Shebs)
Newsgroups: comp.lang.prolog
Subject: Re: Strings
Message-ID: <5348@utah-cs.UUCP>
Date: 15 Mar 88 02:35:02 GMT
References: <512@ecrcvax.UUCP> <768@cresswell.quintus.UUCP>
Sender: news@utah-cs.UUCP
Reply-To: shebs%defun.utah.edu.UUCP@utah-cs.UUCP (Stanley T. Shebs)
Organization: PASS Research Group
Lines: 56

In article <768@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:

>When I say that it is more efficient to use lists of character codes
>rather than packed byte vectors, I am reporting empirical measurements.
>I don't have to explain it:  I'm just saying truthfully that in the
>cases I have measured lists were faster.  There may be Prolog systems
>which implement lists so badly that this isn't true.  I have to admit
>that I've only really tested concatenation and searching, not accessing
>single characters.

This is a very interesting assertion, because it has a lot of implications
for *any* vector/array-type representation.  The broadest interpretation
is that any and all vector-like types are bad.  This would also extend to
types that one might not think of as vector-like, such as infinite-precision
integers (bignums).  So, if Prolog strings are better as lists, is that also
true for Prolog bignums?

The reason I wonder is that Lisp has not used lists to represent bignums
since the mid-60s, and (for instance) JonL White's bignum paper in the
86 Lisp conference assumes a vector-like representation.  Off the top of
my head, it seems that the Prolog evaluation model is the ultimate reason
for the advantage of lists.  If not, O'Keefe's empirical measurements have
some consequences reaching far beyond contemporary Prolog, and Lispers
should start rethinking their implementation techniques...

Incidentally, David Wise has made similar claims for the advantages of
list representation in functional languages, but (if I recall correctly)
he also assumes binary trees and some sort of hardware memory support.

>	frontchar(String0, Char1, String1)
>Let's assume a conventional Prolog, rather than Turbo Prolog, which no
>doubt does something clever.  String0 has to be dereferenced and type
>tested (just like a list cell!) then the first character has to be
>extracted and *boxed*, and a new string descriptor has to be created
>and *boxed* and assigned to String1.  Assuming that a boxed character
>costs one cell and a string descriptor costs two cells (not unreasonable),
>walking down a string of N characters using frontchar/3 ends up using
>3N cells for the characters and string tails *in addition* to the
>packed byte vector we started with, whereas the list version would have
>used 2N cells total.

Boxing characters is just as bad as boxing small integers, and just as
unnecessary.  Avoiding the creation of a new string descriptor is harder,
and requires compiler analysis, but it is possible under many circumstances.

I think the real underlying motivation for strings is that languages like C
can do string processing with essentially zero space and time overhead,
and one would like to have Prolog to have the same characteristics.
It's not so much an issue of doing one operation on German titles, but of
passing a million characters of text through a program without allocating
so much as one byte of storage dynamically.  If a Prolog implementor could
guarantee that, then no one would care whether there was a separate string
datatype or not, and the implementor could make a lot of money!

							stan shebs
							shebs@cs.utah.edu