Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site topaz.ARPA
Path: utzoo!watmath!clyde!cbosgd!cbdkc1!desoto!packard!topaz!@RUTGERS.ARPA:coraki!pratt@Navajo
From: pratt%Navajo@.ARPA
Newsgroups: net.works
Subject: PDP-8 Story
Message-ID: <1864@topaz.ARPA>
Date: Thu, 2-May-85 00:27:42 EDT
Article-I.D.: topaz.1864
Posted: Thu May  2 00:27:42 1985
Date-Received: Fri, 3-May-85 03:04:08 EDT
Sender: daemon@topaz.ARPA
Organization: Rutgers Univ., New Brunswick, N.J.
Lines: 76

From: coraki!pratt@Navajo (Vaughan Pratt)

A hearty hear-hear for Peter Barada's praise of the PDP-8.  Here's a
PDP-8 story I never got around to telling, but which in retrospect
was a fun project.

At Sydney University in 1969 I wanted a workstation (by concept, not by
name; the word "workstation" didn't appear until the 1980s) on which to
do my Master's thesis project.  The only candidate was a 4K PDP-8
purchased ostensibly as a $10K cog in a $50K wheel consisting of a 338
vector display system.  Simultaneously a psychology Ph.D. student and I
independently realized that it would be perfect for our respective
projects, his to run experiments monitoring interference between human
video and audio channels, mine to write an interactive solver of
syllogisms of the kind Lewis Carroll used to publish in newspapers.  We
agreed that he would have it for the mornings and I for the
afternoons.

When it became clear that my project would not fit, I reduced its scope
to mere translation of English syllogisms into conjunctive normal form,
ready for input into a ground-resolution theorem prover (which I would
have written to turn the project into a Ph.D. thesis had I stayed at
Sydney).  The final program ran entirely within the 4K PDP-8 (no
overlays), using a Model 33 10-cps teletype as its only peripheral.
Code and data storage was split 50/50.  Here's the breakdown of memory,
with rough sizes.

Modules:
	Operating System
		Bootstrap Loader (20)
		Load-and-go Assembler (homebrew) (160)
		Debugger (ODT) (400?)
		Interrupt handler (for parallel reading and printing) (90)
	Datatype runtimes
		String package (100)
		List package (cons/car/cdr + "manual" GC) (128)
		Sparse matrix package (for Younger's algorithm) (200)
	Natural language
		Lexical analysis (80)
		Parser (Younger's algorithm) (200)
		Syntax-directed translator (450)
	Logic
		AND-OR-NOT for CNF expressions (160)
Databases:
	Dictionary: ~150 "closed-class" words (prep., pron., conj., adv.) (500)
	Affixes: ~20 prefixes and suffixes
	Grammar: 156 productions (300)
	Semantics: one expression per production, in RPN (250)
Work Areas:
	I/O and string buffers (256)
	Younger matrix (500)
	Free list space (200)
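
The 50/50 split can be checked by tallying the rough sizes above.  What
follows is only a sketch in Python: it assumes the sizes are in 12-bit
words, takes the "400?" for ODT at face value, and leaves out the affix
table, for which no size is given.  The dictionary names are made up for
the illustration.

```python
# Rough sizes copied from the breakdown above, assumed to be 12-bit words.
# "400?" for ODT is taken as 400; the affix table has no size and is omitted.
MODULES = {
    "bootstrap loader": 20,
    "load-and-go assembler": 160,
    "debugger (ODT)": 400,
    "interrupt handler": 90,
    "string package": 100,
    "list package": 128,
    "sparse matrix package": 200,
    "lexical analysis": 80,
    "parser": 200,
    "syntax-directed translator": 450,
    "AND-OR-NOT for CNF": 160,
}
DATABASES = {"dictionary": 500, "grammar": 300, "semantics": 250}
WORK_AREAS = {"I/O and string buffers": 256, "Younger matrix": 500, "free list": 200}

MEMORY_WORDS = 4096  # a 4K PDP-8 has 4096 words

code_words = sum(MODULES.values())
data_words = sum(DATABASES.values()) + sum(WORK_AREAS.values())
total_words = code_words + data_words

print(f"code {code_words}, data {data_words}, total {total_words}, "
      f"spare {MEMORY_WORDS - total_words}")
```

On these figures code and data each come to about 2000 words, and the
total falls about a hundred words short of 4096, which is roughly the
room the unsized affix table would need.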

n-word sentences were parsed in .007n**2 seconds, always.  (The n**3
bound for Younger's algorithm is worst case; expect n**2 for typical
grammars of English.)  For each parse the output consisted of a list of
phrases from the sentence each labelled with a letter, and a
conjunctive normal form formula using those letters.  Its biggest
problem was that it would sometimes find half a dozen parses of a
sentence, not all yielding the same output, a problem I did not deal
with.  For about 3/4 of the 200 sentences in Carroll's collection of
syllogisms, at least one of the parses would yield the correct output.
The main problem here was lack of time for tuning the grammar.  The
program did much better than I expected in figuring out the parts of
speech of open class words (nouns, verbs, adjectives) it had not seen
before.
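
For readers who have not met Younger's algorithm (now usually called
CYK), here is a minimal recognizer sketch in Python.  It is not the
PDP-8 code: the toy lexicon and the three productions are invented for
the illustration, the grammar is assumed to be in Chomsky normal form,
and a dictionary keyed by (start, end) stands in for the sparse matrix,
so that empty cells cost nothing.

```python
from collections import defaultdict

# Hypothetical toy lexicon and grammar (Chomsky normal form), for illustration only.
LEXICON = {
    "all": {"Det"}, "some": {"Det"},
    "cats": {"N"}, "dogs": {"N"},
    "chase": {"V"},
}
BINARY = {  # (B, C) -> set of A such that A -> B C
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}

def cyk(words):
    """Return a sparse table: table[i, j] = nonterminals deriving words[i:j]."""
    n = len(words)
    table = defaultdict(set)
    for i, word in enumerate(words):
        table[i, i + 1] |= LEXICON.get(word, set())
    for span in range(2, n + 1):           # length of the substring
        for i in range(n - span + 1):      # where it starts
            j = i + span
            for k in range(i + 1, j):      # where it is split in two
                for b in table[i, k]:
                    for c in table[k, j]:
                        table[i, j] |= BINARY.get((b, c), set())
    return table

def recognizes(words, start="S"):
    """True if the whole word list derives from the start symbol."""
    return start in cyk(words)[0, len(words)]

print(recognizes("all cats chase some dogs".split()))
print(recognizes("cats all chase".split()))
```

The three nested loops over span, start and split point are where the
n**3 worst case comes from.  By the .007n**2 figure quoted above, a
10-word sentence would have taken about 0.7 seconds on the PDP-8.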

Near the end of the project I found myself squeezing in additional
instructions by patching in jumps to pages where there were still a few
empty words.  No way was this program ever going to get additional
functionality without getting another 4K!

That reminds me, I need to get another megabyte for my Sun; 2Mb is too
small for big Franz packages.

-v