Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!uunet!lotus!esegue!compilers-sender
From: bart@videovax.tv.tek.com (Bart Massey)
Newsgroups: comp.compilers
Subject: Re: Hand-rolled parsers
Message-ID: <1989Nov11.161055.9938@esegue.segue.boston.ma.us>
Date: 11 Nov 89 16:10:55 GMT
References: <1989Nov10.025412.4014@esegue.segue.boston.ma.us>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: Bart Massey
Organization: Tektronix TV Measurement Systems, Beaverton OR
Lines: 92
Approved: compilers@esegue.segue.boston.ma.us

In article <1989Nov10.025412.4014@esegue.segue.boston.ma.us>
firth@sei.cmu.edu (Robert Firth) writes:
> This addresses the question 'Why do people write recursive descent
> parsers by hand, rather than using parser-generator tools such as lex
> and yacc?'.  It answers a slightly different question, namely, 'Why do
> I write RD parsers by hand...', and as such represents a personal and
> biased view.

I just thought it was important to correct what I believe are some
fundamental misconceptions of the author regarding existing tools.
These are, of course, *my* opinions and beliefs, so take them for what
they're worth.

> 1. Ease of construction.
>
> Given a decent grammar, I can write the basic RD parser in a couple of
> days, and do some solid component testing along the way.  Moreover, I
> can write it in a reasonable language.  The tools I've used, on the
> other hand, have made the job much harder, for these reasons ...
>
> . very unhelpful and unforgiving error messages, such as the message
>   "12 shift/reduce conflicts" as the first level error message; the
>   next level is a 50 page trace of the parser generator internal
>   states.

Yeah, but given a grammar suitable for RD parsing, you won't *see* any
of these error messages!  I agree -- once I have a grammar suitable for
RD parsing, it wouldn't take me any longer to write an RD parser than an
LR parser using a parser generator.  What this argument ignores is that
many artificial languages are either LR(1) or nearly so *before any
grammar transformations*, and that the parser generator may do some
simple transformations automatically and *virtually instantaneously*.
I've tried to transform a non-trivial grammar for RD by hand -- I didn't
enjoy it much.

> . serious bugs in the lexer generator or parser generator, leading to
>   any of: infinite loops, core dumps, silent generation of code
>   skeletons that won't compile.

Could happen.  Using FLEX+BISON (how about PD-YACC, anyone?) I've never
seen anything I would class as a serious bug.  And in fact I wrote a
500-line lexer and 2000-line parser recently, without running across any
bugs in either LEX or YACC (misfeatures, but no bugs :-)!

> 2. Flexibility
>
> The hand lexer and parser are not committed to a single inadequate
> technique.  For example, an RD parser has no trouble distinguishing
> between the character "'" as the start of an Ada character literal, and
> the character "'" as the Ada attribute symbol.  The average lexer
> generator cannot disambiguate these cases.  Another one I had trouble
> with was the difference between Pascal "1.10" and "1..10".

Note that one can always pass information from the parser back to the
lexer using global variables to guide the lex.  While this isn't
encouraged, IMHO it is just as powerful and clean as embedding the
pass-back in the RD parser.

As for your second example, I'm confused -- isn't ".." a separate token
in your Pascal compiler?  If so, you can give LEX and clones rules for
"." (structure member), ".." (subrange), and "[0-9]*\.[0-9]+" or some
such (floating constant), and it will disambiguate in what I believe to
be exactly the right way -- choose the longest matching prefix, or the
earlier one on identical lengths!

> As another example, when parsing an expression I can use good old
> operator precedence techniques, and gain a lot of efficiency for a
> small loss of robustness.

YACC and clones, at least, support operator-precedence parsing as a
natural part of the input -- it is traditional to use this for any
expression-like construct in the input!  Typing in a C expression lexer
and parser is literally a 1-hour exercise -- you should try it sometime!
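To make the last two points concrete, here are a couple of minimal
sketches.  They are illustrative only -- the token names (INTCONST,
DOTDOT, NUMBER, and so on) and the toy grammars are my inventions, not
anything from either post.

First, the Pascal "1.10" vs. "1..10" case as LEX rules.  Longest-match
sorts it out: scanning "1..10", the floating-constant rule can't get
past the "1" (no digit follows the first "."), so the lexer returns
INTCONST DOTDOT INTCONST, while "1.10" matches REALCONST whole:

    %{
    /* Hypothetical token codes; with YACC these come from y.tab.h. */
    #define INTCONST  257
    #define REALCONST 258
    #define DOT       259
    #define DOTDOT    260
    %}
    %%
    [0-9]+"."[0-9]+  { return REALCONST; /* "1.10" */          }
    [0-9]+           { return INTCONST;  /* "1" in "1..10" */  }
    ".."             { return DOTDOT;    /* subrange */        }
    "."              { return DOT;       /* record selector */ }

Second, operator precedence in YACC: declare the precedence and
associativity of the operator tokens, write the ambiguous grammar
directly, and let the generator resolve the shift/reduce conflicts:

    %token NUMBER
    %left '+' '-'           /* lowest precedence, left-associative */
    %left '*' '/'
    %right UMINUS           /* highest: unary minus */
    %%
    expr : expr '+' expr
         | expr '-' expr
         | expr '*' expr
         | expr '/' expr
         | '-' expr %prec UMINUS
         | '(' expr ')'
         | NUMBER
         ;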
...

> For the future - one day I'll give up these antique ways, and move to
> modern technology.  But that technology will have to be based on
> something at least as powerful as attribute grammars or van Wijngaarden
> grammars; as far as I can see, mechanical context-free parsing just
> doesn't hack it.

If you're writing a commercial compiler for a large compiled HLL, I can
understand where the extra investment of effort in a "custom" scanner
and/or parser might conceivably be worth it.  But given a finite
investment of effort, I personally prefer to spend as little as possible
of it on the best-understood part of formal language analysis --
scanning and parsing.  LEX and YACC have so far helped me to do so,
without incurring costs which were significant to me in my application.

                                Bart Massey
                                ..tektronix!videovax.tv.tek.com!bart
                                ..tektronix!reed.bitnet!bart
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.