Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.2 (Tek) 9/28/84 based on 9/17/84; site tekchips.UUCP Path: utzoo!linus!decvax!tektronix!tekcrl!tekchips!stevev From: stevev@tekchips.UUCP (Steve Vegdahl) Newsgroups: net.lang Subject: Re: Complex/Simple scanners - More hashing! Message-ID: <308@tekchips.UUCP> Date: Mon, 21-Oct-85 15:18:19 EDT Article-I.D.: tekchips.308 Posted: Mon Oct 21 15:18:19 1985 Date-Received: Thu, 24-Oct-85 06:52:21 EDT References: <2394@sjuvax.UUCP> Distribution: net Organization: Tektronix, Beaverton OR Lines: 29 > Do you think it would be faster for a compiler to have a more > complex scanner that could recognize key-words/reserved identifiers > immediately (i.e., simply by keeping a record of each inputted > character) and not have to hash and search for those key-words > each time one is encountered, or would it be better to keep the > scanner simple, recognizing each key-word as simply an identifier, > and then search the symbol table to determined wether or not > your identifier was reserved. > > It is not immediately obvious to me which one would be quicker, > although I have a tendency to think it may be the former. > Lex allows for one to design a scanner of the first type > very simply. If you ignore things like virtual memory, I'd think that the complex scanner would be faster if implemented in a straightforward way. However, this would increase the size of the state-transition table. Instead of having one or two character classes for letters, you'd have something like 26. I prefer to use hashing here because the set of reserved words is known at compiler-compile time. Thus, you can analyze the reserved words in advance and select a *perfect hash function* (or something close to it). The size of your hash table will be small, and you will be guaranteed to find any reserved word in a single probe of the table. Steve Vegdahl Computer Research Lab. Tektronix, Inc. Beaverton, Oregon