Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!njin!princeton!phoenix!levy
From: levy@phoenix.Princeton.EDU (Silvio Levy)
Newsgroups: comp.sys.next
Subject: NeXT's Digital Library
Summary: NeXT's Digital Library is a great disappointment.
Message-ID: <5037@phoenix.Princeton.EDU>
Date: 27 Dec 88 17:23:19 GMT
References: <19728@ames.arc.nasa.gov>
Reply-To: levy@Princeton.EDU (Silvio Levy)
Organization: Princeton University, NJ
Lines: 97

In article <19728@ames.arc.nasa.gov> mike@ames.arc.nasa.gov.UUCP (Mike Smithwick) writes:
>The Digital Librarian is impressive. We searched for the word "celestial"
>throughout the works of Shakespeare. It found all 3 entries in what appeared
>to be less than 5 seconds.

Huh? I'm mystified. What do you mean by ``all 3 entries''? Using the
UNIX utility `grep' I found 17.

In general I'm very disappointed with the Digital Library, for many
reasons detailed below. Notice that while some of the reasons appear
to be bugs, and hence have a chance of going away, others seem to
constitute ``features'', and so will probably stay. I start with the
``features''.

I'm thankful to Nick Katz (nmk@fine.princeton.edu) for pointing out
some of the facts below and motivating me to make a more complete
study.

UNDESIRABLE FEATURE #1: You can only search for words, not for strings
or phrases. This means that to find out where S. wrote ``To be or not
to be'', you'd have to wade through thousands of occurrences of ``to'',
``be'', ``or'' or ``not''. But read on.

UNDESIRABLE FEATURE #2: Apparently very common words cannot be used as
search keys at all -- you get a ``0 found'' response. This is the case
with the four words mentioned above. Together with feature #1, this
means that the Digital Librarian simply won't locate S.'s most famous
quotation.

UNDESIRABLE FEATURE #3: The display of occurrences is done in two windows.
The top window, a smaller one, consists of one line for each file where
the word was found (each file has a scene of a play, or a sonnet, etc.).
The line contains the file name (e.g. Coriolanus: 1.4) and the
beginning of the file. The latter is completely useless information,
as it usually consists of stage directions, etc. I would expect here a
context line instead, including the keyword.

To actually see the quotations you want, you select a line from the top
window; the bottom window shows the corresponding file, centered around
the first occurrence of the word in the file.

The upshot is that to find a particular quotation, you have to click on
every line of the first window to open the corresponding file, then
click on ``Find'' before leaving that line (just in case the file
contains more than one occurrence). Compare this with the system used
in printed and on-line concordances, where you're presented with a list
of context lines and can scan it visually for the quotation you're
looking for.

UNDESIRABLE FEATURE #4: The source text has very low-level formatting
commands embedded in it. (Though I guess I should be thankful it's in
ASCII files, not in binary files in some proprietary format...) For
example, the beginning of /Plays/Hamlet/1.1 is something like this:

    ...
    {\pard\f0\fs28{\fs48 Hamlet\
    }\
    \
    {\b\fs36 1.1} \
    {\i Enter Barnardo [...] }{\b \fs24 BARNARDO} Who's there?\
    ...

For this text to be used elsewhere than in an ``edit'' file, or even
within an ``edit'' file but in a different format, you have to strip
all this garbage. The markup should instead be done at a higher level,
so global changes are easy to make. For example, using a TeX-like
notation (that's what I'm most accustomed to; but SGML or any other
markup language would do equally well):

    \title Hamlet \endtitle
    \scene 1.1 \endscene
    \dir Enter Barnardo [...] \enddir
    \speak BARNARDO \endspeak Who's there?

Now for the bugs:

BUG #1: Not all occurrences of a word are found -- far from it.
And generally you have no clue of that. I've already mentioned the
``celestial'' fiasco (3 found out of 17). If you try ``horse'' the
situation is similar: 18 found out of 369.

Actually this is what started this whole thing: Nick Katz pointed out
that if you search for ``horse'' you get (among others) a line saying
``And our twelve thousand horse'' (Ant. and Cl.: 3.7), but if you
search for ``twelve'' you don't get this same line!

The most annoying thing is that the choice of quotations presented
doesn't seem to be based on any clear criterion: the 14 ``celestial''s
that didn't make it into the search seemed to have as much of a right
to be there as the three that did!

BUG #2: Treatment of plurals, etc. is inconsistent. E.g. searching for
``horse'' and ``horses'' brings up two disjoint sets of occurrences,
but one of the occurrences listed under ``horse'' actually says
``horses'' (1 Henry IV: 2.4). In general there doesn't seem to be any
way to search for words under a prefix (as there seems to be for
Webster's, although that doesn't work all the time either -- but
that's the subject of another message).

Silvio Levy (levy@princeton.edu)
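
P.S. To make concrete what is meant above by a list of context lines
(feature #3) and by searching for phrases rather than single words
(feature #1), here is a minimal sketch in Python. It is only an
illustration and has nothing to do with how the Digital Librarian works
internally. The function name `concordance', the `width' parameter and
the two-entry SAMPLE table are all made up for the example; the sketch
assumes the texts are already plain ASCII with the formatting commands
stripped.

```python
import re


def concordance(texts, phrase, width=30):
    """Return (label, context) pairs, one per occurrence of `phrase`.

    `texts` maps a label such as "Hamlet: 3.1" to plain text.
    Matching ignores case and any punctuation or line breaks
    between the words of the phrase.
    """
    words = phrase.split()
    if not words:
        raise ValueError("phrase must contain at least one word")
    # Whole-word match; \W+ lets "To be, or not" match "to be or not".
    pattern = re.compile(
        r"\b" + r"\W+".join(re.escape(w) for w in words) + r"\b",
        re.IGNORECASE,
    )
    hits = []
    for label, text in texts.items():
        flat = " ".join(text.split())  # collapse line breaks and runs of spaces
        for m in pattern.finditer(flat):
            start = max(0, m.start() - width)
            end = min(len(flat), m.end() + width)
            hits.append((label, flat[start:end]))
    return hits


# Hypothetical two-entry sample standing in for the real files.
SAMPLE = {
    "Hamlet: 3.1": "To be, or not to be,\nthat is the question:",
    "Ant. and Cl.: 3.7": "And our twelve thousand horse.",
}

if __name__ == "__main__":
    for query in ("to be or not to be", "twelve", "horse"):
        for label, context in concordance(SAMPLE, query):
            print(f"{query!r:24} {label:20} {context}")
```

With a display like this, every occurrence gets its own context line, so
a search for ``twelve'' and a search for ``horse'' both show the same
line from Ant. and Cl.: 3.7, and very common words pose no problem when
they are part of a longer phrase.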