Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ima.UUCP
Path: utzoo!decvax!ima!johnl
From: johnl@ima.UUCP (Compilers mailing list)
Newsgroups: mod.compilers
Subject: Re: Generator of Lexical Analyzers, mini-review
Message-ID: <159@ima.UUCP>
Date: Sat, 12-Jul-86 19:05:37 EDT
Article-I.D.: ima.159
Posted: Sat Jul 12 19:05:37 1986
Date-Received: Sat, 12-Jul-86 22:11:03 EDT
References: <138@ima.UUCP>
Reply-To: decvax!utzoo!henry
Lines: 90
Approved:

On reflection, I think these issues deserve a bit more comment, and I
suspect that some aspects are of sufficiently general interest to make it
a followup rather than a private reply.

> ... one needs to pay close
> attention to lexing. Have you ever seen LEX used for a production compiler?
> Cc, pc, cpp don't use LEX, nor do most other frequently used compilers.

At no time was I defending the use of LEX for production compilers. LEX's
strong point is convenience and flexibility, which suits it well to things
like experimenting with notation. For example, I tend to use LEX when I'm
basically inventing a new language for some specialized job, and I have no
particularly good idea of what it's going to look like. In this situation,
efficiency is not my major concern. I would not use LEX for a production
compiler, but that wasn't the issue.

> "... lot of machinery ...". I like to look at time and space requirements
> of code. The GLA generated lexer uses a lot less of these than a LEX lexer.

This is an unrealistic comparison, since we have already agreed that LEX is
unsuited to this application. Actually, when I said "lots of machinery", I
wasn't referring so much to time and space as to the complexity of the human
interface of the scanner generator.

> I assume Henry's comment was really referring to the auxiliary GLA
> software...

Actually, the auxiliary software strikes me as the most valuable part of
GLA, since (as Bob points out) much of it has to be done anyway, and doing
it well is a hassle. My objection to GLA is that it's not clear to me that
one really needs to top off this useful software with an elaborate and quite
inflexible scanner generator.

> >I have written lexical analyzers, including two for C.)
>
> Why did you write several lexers by hand?
> Was it because LEX and regular expressions just did not fit the problem
> at hand? or Time/Space?

LEX could probably have handled the syntax, but since these were meant to
be production-quality scanners, LEX's efficiency problems made it
unacceptable. A contributing factor for the C lexers in particular was a
desire to avoid dependencies on non-trivial Unix-specific utilities.

> How long does it take to hand build a fast reliable lexer for an
> arbitrary programming language?

Not long, if you (a) bear in mind that programming languages use very
stereotyped lexical forms -- they are not "arbitrary", and (b) work from
an existing high-quality design rather than starting from scratch.

Note that I did not advocate starting from scratch each time; I advocated
re-using existing code, such as a "boilerplate" scanner. This is a
much-neglected approach in problems which are (1) stereotyped, (2) too
variable for a library function, and (3) too simple for a program
generator. It doesn't necessarily generate a grade-A scanner, but it
yields a B+ simply and quickly.

> >... It's not even
> >very versatile at handling programming languages; for example, it can't
> >handle C's hexadecimal numbers or string continuations.
>
> ... There is no inherent limitations in GLA
> that prevent recognition of C hex numbers, or strings that span lines.

My point here was not that there is some sort of intrinsic limit, but that
a piece of software which claims to be a generic lexer generator in fact
cannot generate a lexer for a common, important, not-too-messy language.
This actually is the heart of my objection to GLA: it makes me learn a
fairly elaborate piece of machinery which can't cope with a very wide range
of jobs. I strongly suspect that it was built for doing one particular
language and variants thereon, and nobody took the trouble to generalize it.

Despite negative implications earlier, I actually do believe that there is
a place for a good programming-language-oriented lexer generator. However,
the current GLA is not it. What's needed is something that is relatively
straightforward to use (GLA strikes me as marginal here), consistently
generates grade-A scanners (GLA probably does), and is flexible enough to
handle most programming languages without drastic measures like hand-editing
the resulting code (GLA falls down badly here). Anybody want to write one?

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry
--
Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers. Meta-mail to ima!compilers-request