Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 (Tek) 9/28/84 based on 9/17/84; site tekchips.UUCP
Path: utzoo!linus!decvax!tektronix!tekcrl!tekchips!stevev
From: stevev@tekchips.UUCP (Steve Vegdahl)
Newsgroups: net.lang
Subject: Re: Complex/Simple scanners - More hashing!
Message-ID: <308@tekchips.UUCP>
Date: Mon, 21-Oct-85 15:18:19 EDT
Article-I.D.: tekchips.308
Posted: Mon Oct 21 15:18:19 1985
Date-Received: Thu, 24-Oct-85 06:52:21 EDT
References: <2394@sjuvax.UUCP>
Distribution: net
Organization: Tektronix, Beaverton OR
Lines: 29

> 	Do you think it would be faster for a compiler to have a more
> 	complex scanner that could recognize key-words/reserved identifiers
> 	immediately (i.e., simply by keeping a record of each inputted 
> 	character) and not have to hash and search for those key-words
> 	each time one is encountered, or would it be better to keep the 
> 	scanner simple, recognizing each key-word as simply an identifier,
> 	and then search the symbol table to determined wether or not
> 	your identifier was reserved.
> 
> 	It is not immediately obvious to me which one would be quicker,
> 	although I have a tendency to think it may be the former.  
> 	Lex allows for one to design a scanner of the first type 
> 	very simply.

If you ignore things like virtual memory, I'd think that the complex scanner
would be faster if implemented in a straightforward way.  However, this
would increase the size of the state-transition table.  Instead of having
one or two character classes for letters, you'd have something like 26.

I prefer to use hashing here because the set of reserved words is known
at compiler-compile time.  Thus, you can analyze the reserved words in
advance and select a *perfect hash function* (or something close to it).
The size of your hash table will be small, and you will be guaranteed
to find any reserved word in a single probe of the table.

		Steve Vegdahl
		Computer Research Lab.
		Tektronix, Inc.
		Beaverton, Oregon