Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!uunet!lotus!esegue!compilers-sender
From: bart@videovax.tv.tek.com (Bart Massey)
Newsgroups: comp.compilers
Subject: Re: Hand-rolled parsers
Message-ID: <1989Nov11.161055.9938@esegue.segue.boston.ma.us>
Date: 11 Nov 89 16:10:55 GMT
References: <1989Nov10.025412.4014@esegue.segue.boston.ma.us>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: Bart Massey
Organization: Tektronix TV Measurement Systems, Beaverton OR
Lines: 92
Approved: compilers@esegue.segue.boston.ma.us

In article <1989Nov10.025412.4014@esegue.segue.boston.ma.us>
firth@sei.cmu.edu (Robert Firth) writes:
> This addresses the question 'Why do people write recursive descent
> parsers by hand, rather than using parser-generator tools such as lex
> and yacc?'.  It answers a slightly different question, namely, 'Why do
> I write RD parsers by hand...', and as such represents a personal and
> biased view.

I just thought it was important to correct what I believe are some
fundamental misconceptions of the author regarding existing tools.
These are, of course, *my* opinions and beliefs, so take them for what
they're worth.

> 1. Ease of construction.
>
> Given a decent grammar, I can write the basic RD parser in a couple of
> days, and do some solid component testing along the way.  Moreover, I
> can write it in a reasonable language.  The tools I've used, on the
> other hand, have made the job much harder, for these reasons ...
>
> . very unhelpful and unforgiving error messages, such as the message
>   "12 shift/reduce conflicts" as the first level error message; the
>   next level is a 50 page trace of the parser generator internal
>   states.

Yeah, but given a grammar suitable for RD parsing, you won't *see* any
of these error messages!  I agree -- once I have a grammar suitable for
RD parsing, it wouldn't take me any longer to write an RD parser than an
LR parser using a parser generator.  What this argument ignores is that
many artificial languages are either LR(1) or nearly so *before any
grammar transformations*, and that the parser generator may do some
simple transformations automatically and *virtually instantaneously*.
I've tried to transform a non-trivial grammar for RD by hand -- I didn't
enjoy it much.

> . serious bugs in the lexer generator or parser generator, leading to
>   any of: infinite loops, core dumps, silent generation of code
>   skeletons that won't compile.

Could happen.  Using FLEX+BISON (how about PD-YACC, anyone?) I've never
seen anything I would class as a serious bug.  And in fact I wrote a
500-line lexer and 2000-line parser recently, without running across any
bugs in either LEX or YACC (misfeatures, but no bugs :-)!

> 2. Flexibility
>
> The hand lexer and parser are not committed to a single inadequate
> technique.  For example, an RD parser has no trouble distinguishing
> between the character "'" as the start of an Ada character literal, and
> the character "'" as the Ada attribute symbol.  The average lexer
> generator cannot disambiguate these cases.  Another one I had trouble
> with was the difference between Pascal "1.10" and "1..10".

Note that one can always pass information from the parser back to the
lexer using global variables to guide the lex.  While this isn't
encouraged, IMHO it is just as powerful and clean as embedding the
pass-back in the RD parser.

As for your second example, I'm confused -- isn't ".." a separate token
in your Pascal compiler?  If so, you can give LEX and clones rules for
"." (structure member), ".." (subrange), and "[0-9]*\.[0-9]+" or some
such (floating constant), and it will disambiguate in what I believe to
be exactly the right way -- choose the longest matching prefix, or the
earlier one on identical lengths!

> As another example, when parsing an expression I can use good old
> operator precedence techniques, and gain a lot of efficiency for a
> small loss of robustness.

YACC and clones, at least, support operator-precedence parsing as a
natural part of the input -- it is traditional to use this for any
expression-like construct in the input!  Typing in a C expression lexer
and parser is literally a 1-hour exercise -- you should try it sometime!
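To make the last two points concrete, here are a couple of minimal
sketches.  They are illustrative only -- the token names (INTCONST,
DOTDOT, NUMBER, and so on) and the toy grammars are my inventions, not
anything from either post.

First, the Pascal "1.10" vs. "1..10" case as LEX rules.  Longest-match
sorts it out: scanning "1..10", the floating-constant rule can't get
past the "1" (no digit follows the first "."), so the lexer returns
INTCONST DOTDOT INTCONST, while "1.10" matches REALCONST whole:

    %{
    /* Hypothetical token codes; with YACC these come from y.tab.h. */
    #define INTCONST  257
    #define REALCONST 258
    #define DOT       259
    #define DOTDOT    260
    %}
    %%
    [0-9]+"."[0-9]+  { return REALCONST; /* "1.10" */          }
    [0-9]+           { return INTCONST;  /* "1" in "1..10" */  }
    ".."             { return DOTDOT;    /* subrange */        }
    "."              { return DOT;       /* record selector */ }

Second, operator precedence in YACC: declare the precedence and
associativity of the operator tokens, write the ambiguous grammar
directly, and let the generator resolve the shift/reduce conflicts:

    %token NUMBER
    %left '+' '-'           /* lowest precedence, left-associative */
    %left '*' '/'
    %right UMINUS           /* highest: unary minus */
    %%
    expr : expr '+' expr
         | expr '-' expr
         | expr '*' expr
         | expr '/' expr
         | '-' expr %prec UMINUS
         | '(' expr ')'
         | NUMBER
         ;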
...

> For the future - one day I'll give up these antique ways, and move to
> modern technology.  But that technology will have to be based on
> something at least as powerful as attribute grammars or van Wijngaarden
> grammars; as far as I can see, mechanical context-free parsing just
> doesn't hack it.

If you're writing a commercial compiler for a large compiled HLL, I can
understand where the extra investment of effort in a "custom" scanner
and/or parser might conceivably be worth it.  But given a finite
investment of effort, I personally prefer to spend as little as possible
of it on the best-understood part of formal language analysis --
scanning and parsing.  LEX and YACC have so far helped me to do so,
without incurring costs which were significant to me in my application.

                                Bart Massey
                                ..tektronix!videovax.tv.tek.com!bart
                                ..tektronix!reed.bitnet!bart
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.