Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!world!iecc!compilers-sender From: preston@ariel.rice.edu (Preston Briggs) Newsgroups: comp.compilers Subject: Re: Hash specifics Keywords: design Message-ID: <1990Dec17.155858.28854@rice.edu> Date: 17 Dec 90 15:58:58 GMT References: <1990Dec12.221425.569@zoo.toronto.edu> <2980@lupine.NCD.COM> Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: preston@ariel.rice.edu (Preston Briggs) Organization: Rice University, Houston Lines: 44 Approved: compilers@iecc.cambridge.ma.us plx!plxsun!evan@Sun.COM (Evan Bigall) writes: >The time when I always find myself looking for a perfect hash function, is >when I am parsing long keyword options (ie: -ansi etc...) or some other case >when I have a small set of possibilities but want to avoid using: > > if (!strcmp("hello", arg)) > {} > else if (!strcmp("there", arg)) > {} > else if (!strcmp("world", arg)) Another poster (Ira Baxter?) mentioned how to handle these efficiently in a typical compiler-like case. I'll expand a little. Since we're building a symbol table anyway, we're going to need to hash every identifier anyway. I use one hash table to hash keywords, identifiers, and strings as part of my scanner. Each time a new id or string is encountered, I allocate a new integer for it. If the integer is < some small number, an id is actually a keyword. The rest of the front-end simply looks at integers which serve as indices into a string table. Makes it easy to switch on "strings" when all are reduced to integers. This works especially well with Knuth's WEB system. WEB has a notion of preprocessed-strings. I.e. "a string" When Tangling, a string table is maintained and all the strings are replaced in the output text with integers. Further, the string table is written to a pool file, suitable for initializing your compiler's hash table (or TeX or whatever). Pre-processed strings are also a nice way to handle error messages and so forth, potentially saving much storage. LISPers will wonder what the fuss is about, as they use symbols as a matter of course. Preston Briggs -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.