Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Path: utzoo!watmath!clyde!ima!johnl From: johnl@ima.UUCP Newsgroups: mod.compilers Subject: Re: YACC for dynamic grammars Message-ID: <353@bms-at.UUCP> Date: Thu, 5-Feb-87 00:58:58 EST Article-I.D.: bms-at.353 Posted: Thu Feb 5 00:58:58 1987 Date-Received: Sat, 7-Feb-87 15:38:58 EST References: <458@ima.UUCP> Sender: johnl@ima.UUCP Organization: Business Management Systems, Inc., Fairfax, VA Lines: 71 Approved: compilers@ima.UUCP Summary: Yes! It is possible! For a simple dynamic grammar in YACC: Define operator tokens as follows: %left OL1 %right OR1 %left OL2 %right OR2 %left OL3 %right OR3 etc. The lexer for 'C' already has to lookup names in the symbol table to see if they are typedef_names or identifiers. You will now lookup operator names also to see what class (OL1, etc.) they are currently. The symbol table is loaded initially with standard operators and precedences. The currently defined class is returned as the lexical token for yacc, and the actual operator name is returned as the value (for semantic analysis). For even more generality, the lexer should recognize three character classes: alpha, whitespace and operator. Name tokens are alpha characters seperated by whitespace or operators. Operator strings are sequences of operators seperated by alpha. Operator strings are divided into tokens by finding the largest left substring which is currently in the symbol table (repeat as required). All tokens are looked up in the symbol table to decide if they are a) reserved names (e.g. register) or operators (e.g. '{'); return appropriate token. b) type names; return TYPENAME token. c) identifiers; return IDENTIFIER token. d) operators (both names and operators!); return class as token. The symbol table is preloaded (from a control file, perhaps) with the standard 'C' reserved words, types, and operator hierarchy. The user can #include custom grammars for various applications: complex arithmetic complex polynomial arithmetic string processing etc. Even reserved words can be redefined! The compiler need only hardwire a minimal subset of reserved words. Operators can be bound to functions or 'C-macros' based on operator name and operand types. Standard types are symbolic and used only for binding purposes. C-macros are machine dependent functions expanded inline by the code-generator, all other operators are function calls (or inline expansions). A function call can be replaced by a C-macro in a specific implementation for efficiency purposes. The first cut of a code generator can do everything except function calls with function calls! The various operators can then be added as C-macros as time permits. (Notice that argument passing for function calls needs to recognize types.) A similar approach will allow the user to define his own control structures. The key performance issue here is the symbol table - it has got to be fast, because it gets consulted by the lexer for every token! Unless the programmer excerises some restraint, a language of this nature will result in cryptic code! Although having the lexer check the symbol table for every token is more expensive than a fixed table driven lexer, notice that the preprocessor has to look up every token anyway. By combining the 'C' preprocessor with lexical analysis, you can provide a dynamic grammar and speed up compilation at the same time! Every token would be looked up by the lexer to determine if it is a) a preprocessor symbol b) a typedef name c) a reserved word d) an identifier e) an operator and apropriate tokens passed to YACC. -- Stuart D. Gathman <..!seismo!dgis!bms-at!stuart> [It's certainly true that you can set up a yacc-based parser quite easily that lets you add new tokens that behave syntactically just like other existing tokens using this scheme. I believe that the intent of the original question, though, was to add whole new productions to the grammar on the fly as is possible with a parser using Earley's algorithm. The consensus seems to be that since the yacc grammar tables are constructed only after an extensive global analysis of the grammar, incremental changes are unlikely to be possible without remaking the tables from scratch. -John] -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request