Path: utzoo!mnetor!uunet!husc6!ut-sally!utah-cs!defun.utah.edu!shebs From: shebs%defun.utah.edu.uucp@utah-cs.UUCP (Stanley T. Shebs) Newsgroups: comp.lang.prolog Subject: Re: Strings Message-ID: <5348@utah-cs.UUCP> Date: 15 Mar 88 02:35:02 GMT References: <512@ecrcvax.UUCP> <768@cresswell.quintus.UUCP> Sender: news@utah-cs.UUCP Reply-To: shebs%defun.utah.edu.UUCP@utah-cs.UUCP (Stanley T. Shebs) Organization: PASS Research Group Lines: 56 In article <768@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: >When I say that it is more efficient to use lists of character codes >rather than packed byte vectors, I am reporting empirical measurements. >I don't have to explain it: I'm just saying truthfully that in the >cases I have measured lists were faster. There may be Prolog systems >which implement lists so badly that this isn't true. I have to admit >that I've only really tested concatenation and searching, not accessing >single characters. This is a very interesting assertion, because it has a lot of implications for *any* vector/array-type representation. The broadest interpretation is that any and all vector-like types are bad. This would also extend to types that one might not think of as vector-like, such as infinite-precision integers (bignums). So, if Prolog strings are better as lists, is that also true for Prolog bignums? The reason I wonder is that Lisp has not used lists to represent bignums since the mid-60s, and (for instance) JonL White's bignum paper in the 86 Lisp conference assumes a vector-like representation. Off the top of my head, it seems that the Prolog evaluation model is the ultimate reason for the advantage of lists. If not, O'Keefe's empirical measurements have some consequences reaching far beyond contemporary Prolog, and Lispers should start rethinking their implementation techniques... Incidentally, David Wise has made similar claims for the advantages of list representation in functional languages, but (if I recall correctly) he also assumes binary trees and some sort of hardware memory support. > frontchar(String0, Char1, String1) >Let's assume a conventional Prolog, rather than Turbo Prolog, which no >doubt does something clever. String0 has to be dereferenced and type >tested (just like a list cell!) then the first character has to be >extracted and *boxed*, and a new string descriptor has to be created >and *boxed* and assigned to String1. Assuming that a boxed character >costs one cell and a string descriptor costs two cells (not unreasonable), >walking down a string of N characters using frontchar/3 ends up using >3N cells for the characters and string tails *in addition* to the >packed byte vector we started with, whereas the list version would have >used 2N cells total. Boxing characters is just as bad as boxing small integers, and just as unnecessary. Avoiding the creation of a new string descriptor is harder, and requires compiler analysis, but it is possible under many circumstances. I think the real underlying motivation for strings is that languages like C can do string processing with essentially zero space and time overhead, and one would like to have Prolog to have the same characteristics. It's not so much an issue of doing one operation on German titles, but of passing a million characters of text through a program without allocating so much as one byte of storage dynamically. If a Prolog implementor could guarantee that, then no one would care whether there was a separate string datatype or not, and the implementor could make a lot of money! stan shebs shebs@cs.utah.edu