Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.2 9/18/84; site ima.UUCP Path: utzoo!decvax!ima!johnl From: johnl@ima.UUCP (Compilers mailing list) Newsgroups: mod.compilers Subject: Re: Generator of Lexical Analyzers, mini-review Message-ID: <138@ima.UUCP> Date: Tue, 24-Jun-86 11:30:42 EDT Article-I.D.: ima.138 Posted: Tue Jun 24 11:30:42 1986 Date-Received: Tue, 24-Jun-86 17:55:57 EDT Reply-To: decvax!allegra!boulder!bob (Bob Gray) Lines: 94 Approved: Thank you Henry for the mini-review on GLA. There are a couple of points that I would like to bring out. I'm not sure there was a need for you to be "impressed". GLA merely generates a scanner for programming languages that runs fast. I think the main point is 1) why did we build GLA, and 2) why use it. You answered both points: 1> It may be fast, but it's a lot of machinery for rather modest results. Well, it is fast. Let me give some motivation. Last fall I was comparing and measuring the speed/size of GAG generated compilers with hand written compilers. Based on rumors, I had expected that the semantic processing was going to be terrible for the GAG generated compilers. Well surprise, lexing and parsing took 75% of the analysis time (no codegen). It was the "easy" stuff that slowed down the GAG generated compiler! We replaced the GAG provided lexer and parser with an earlier GLA, and and a fast parser we were working on. The resulting analyzer (for pascal) ran faster than 4.2BSD pc, but not as fast as VMS-PASCAL. There is plenty of additional evidence that shows that one needs to pay close attention to lexing. Have you ever seen LEX used for a production compiler? Cc, pc, cpp don't use LEX, nor do most other frequently used compilers. People write their own lexers because at this phase of the compilation process, EVERY character has to be examined and you want this examination to go very fast. (Parsing looks at tokens, which occur 5-10 times less frequently than characters of input). "... lot of machinery ...". I like to look at time and space requirements of code. The GLA generated lexer uses a lot less of these than a LEX lexer. I assume Henry's comment was really referring to the auxiliary GLA software for source input, error reporting in the source coordinates, string memory (you don't want to use malloc), identifier table and keywords, and denotation value management. Both LEX and the hand approach require that you provide all of this yourself AND work out the non-trivial interfacing details to the parser. 2>The problem of scanning conventional programming languages just >isn't complex enough to justify a program-generator system. >I have written lexical analyzers, including two for C.) Now here is the meat of the discussion. You imply that scanning is not very complex (true), but does this mean that lexers should be constructed by hand? I definitely don't believe that just because something is easy (arithmetic for example), that we should ignore tools (calculators). Why not use whatever tools help accomplish a quality product. Furthermore, how many people know about or care to deal with the subtle issues of lexing such as compact case labels? Your lexer will run up to 25% slower if you "switch" on the input character. (Anyone interested should see W.M. Waite 'Cost of Lexical Analysis', Software P&E 16 1986). And what about reliability? The code to properly handle tabs, comments and floating point numbers is tricky. How about efficient keyword recognition? There are so many pitfalls. How long does it take to hand build a fast reliable lexer for an arbitrary programming language? I would much rather give a specification to a tool so I will have time to work on hard, interesting problems. >It is very much >oriented towards fairly conventional programming languages, and is not >a general-purpose substitute for LEX (I don't think they ever claimed >it was, but the point should be made more explicitly). It's not even >very versatile at handling programming languages; for example, it can't >handle C's hexadecimal numbers or string continuations. Yes, GLA is not a regular expression tool, but please acknowledge that one often has to step outside of regular expressions to handle common lexing problems. (Every see the contortions necessary to process a C style comment in LEX?) Why did you write several lexers by hand? Was it because LEX and regular expressions just did not fit the problem at hand? or Time/Space? There is no inherent limitations in GLA that prevent recognition of C hex numbers, or strings that span lines. It is just that we did not implement that stuff in the current version. It is not hard to take the produced lexer (in C code) and add things we have not implemented. >To my mind, the most useful part of the package would be several libraries >provided with it, encapsulating things like string storage and target- >machine integer arithmetic. Given the limited domain of application of >GLA, I suspect that a better approach would be to invest a bit more effort >on the libraries and supply a "boilerplate" C scanner invoking them. This IS the approach used, just don't trivialize the "boilerplate". GLA dynamically has to produce the boilerplate or else the resulting scanner will suffer in speed. Yes, GLA is not a exciting, complex tool- but it saves lots of tedious time when one needs a lexer. -bob gray bob@boulder.csnet {seismo!hao, ucbvax!nbires}!boulder!bob -- Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request