Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!mnetor!seismo!think!mit-eddie!genrad!decvax!ima!johnl From: johnl@ima.UUCP (Compilers mailing list) Newsgroups: mod.compilers Subject: Query about token mapping Message-ID: <204@ima.UUCP> Date: Fri, 5-Sep-86 10:50:41 EDT Article-I.D.: ima.204 Posted: Fri Sep 5 10:50:41 1986 Date-Received: Fri, 5-Sep-86 21:46:50 EDT Reply-To: decvax!astrovax!sutin (Brian Sutin) Lines: 73 Approved: I would like to know if anyone has heard of the following algorithm, or knows how to write it, or has any ideas about how it could be written. Say you have N numbers, {X}, which you know ahead of time, and you want to assign them the integers {1-N} in any order, but in a unique fashion. Numbers not in {X} can be assigned anything, although it would be nice if they were assigned to, say 0, or N+1. This is nice for parse tables, or for case statements where the {X} cover a large spread and the compiler uses an if-else tree. What I want is a program which writes the above program. The subroutine: alg(Y) for( i = {1-N} ) if( Y = X[i] ) return i is of order N operations, probably 3 or 4 N. Better is a binary tree: alg(Y) if( Y = X[N/2] ) return N/2 else if( Y < X[N/2] ) if( Y = X[N/4] ) return N/4 else if( Y < X[N/4] ) ... else if( Y > X[N/2] ) if( Y = X[3N/4] ) return 3N/4 ... which is of order 2logN operations, but longer. Also, there is the lookup table of size max{X}. Yuk. None of these are optimal. Now we get to the meat of the problem. I bet that there is an algorithm which will find a set of binary operations, such as binary and, or, not, xor, etc, which, when handed a Y, will return {1-N}. This would be a black box with about logN operations, and would be optimal for both speed and space. For example: X = { 0, 3, 5 }. then alg(Y) return X & 3 will map ( 0 -> 0 ), ( 3 -> 3 ), ( 5 -> 1 ). 2 was skipped, but oh well. Anyway, any ideas? Brian Sutin Department of Astrophysical Sciences/Princeton University { princeton, ... }!astrovax!sutin [This problem is known as perfect hashing. The traditional application is taking a token from a program you're compiling and seeing if it's one of the reserved words. More formally, a perfect hash function for a set of tokens maps exactly one of the tokens into each of a set of hash buckets. Computing perfect hash functions is hard -- it takes minutes of computer time on something like a PDP-11. Once you have the hash function, using it is O(1) because you take a token, hash it, compare it to the token that belongs in the bucket hashed to, and either it's that token or it's not in your set. There has been a fair amount written on both the form of perfect hash functions and methods of computing them, typically by fitting some sort of polynomial. Do any of you readers know of good references? I can't find any right now. -John] -- Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request