Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!ucbvax!henry.berkeley.edu!jbuck
From: jbuck@henry.berkeley.edu (Joe Buck)
Newsgroups: comp.lang.c++
Subject: Re: zortech problem with lex
Message-ID: <34608@ucbvax.BERKELEY.EDU>
Date: 1 Mar 90 00:59:18 GMT
References: <6300008@ux1.cso.uiuc.edu> <24800002@sunb6> <25EC5CBF.26673@paris.ics.uci.edu>
Sender: usenet@ucbvax.BERKELEY.EDU
Reply-To: jbuck@henry.berkeley.edu.UUCP (Joe Buck)
Organization: U.C. Berkeley -- ERL
Lines: 46

In article <25EC5CBF.26673@paris.ics.uci.edu> schmidt@zola.ics.uci.edu (Doug Schmidt) writes:
>>	No, I don't think writing a lexer is hard.  I did it many times
>>before I even heard of lex (though I didn't know I was writing "lexers.")
>>However, I can now write and debug a lexical analyzer in half a day max
>>using flex, and anyone knowing lex/flex could easily maintain my code.

>Have you ever tried writing a f?lex scanner for full ANSI C?  There
>are some surprising subtleties.  Henry Spencer posted one to the net a
>while back.  Let me know if you'd like a copy to peruse.  It's got
>`easy to maintain' regular expressions like:

>L?\'([^'\\\n]|\\(['"?\\abfnrtv]|[0-7]{1,3}|[xX][0-9a-fA-F]+))+\'
>L?\"([^"\\\n]|\\(['"?\\abfnrtv\n]|[0-7]{1,3}|[xX][0-9a-fA-F]+))*\"

This is as ugly as it is because Henry failed to use definitions.
Any experienced user of flex would define a symbol to represent
letters, a symbol to represent numbers, a symbol to recognize
backslash escapes, etc, and build the above expressions out of the
symbols.  Only someone trying to slam the lex approach would write
the above monstrosities.

>>Therefore I doubt I will ever write another "lexer" by hand.

>Well, if you don't end up working with language processing tools then
>the point is moot... ;-) On the other hand, what if you write portable
>programs for systems that lack f?lex?

Do you have C?  Flex generates C code, and is written in C, and is
freely redistributable.
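
To make the point about definitions concrete, here is a minimal sketch
of how the two quoted patterns might be built from named pieces.  The
definition names (OCT, HEX, SIMPLE, ESC, CCHAR, SCHAR) are made up for
illustration and are not taken from Henry's scanner; the printf actions,
the catch-all rule, and main() are only there so the file builds on its
own, and "%option noyywrap" assumes a reasonably recent flex.

```lex
%{
#include <stdio.h>
%}
%option noyywrap

/* Illustrative names only; pick whatever reads best in your scanner. */
OCT      [0-7]
HEX      [0-9a-fA-F]
SIMPLE   ['"?\\abfnrtv]
ESC      \\({SIMPLE}|{OCT}{1,3}|[xX]{HEX}+)
CCHAR    [^'\\\n]|{ESC}
SCHAR    [^"\\\n]|{ESC}|\\\n

%%
L?\'({CCHAR})+\'    { printf("char constant: %s\n", yytext); }
L?\"({SCHAR})*\"    { printf("string literal: %s\n", yytext); }
.|\n                { /* skip everything else */ }
%%
int main(void)
{
    return yylex();
}
```

The extra parentheses around {CCHAR} and {SCHAR} in the rules are
deliberate: flex parenthesizes a definition when it expands it, but
original lex does not, so the explicit grouping keeps the + and *
applying to the whole alternation under either tool.  SCHAR adds the
backslash-newline case separately because the quoted string pattern
allows it while the character-constant pattern does not.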

>  What if you are trying to write
>a fast lexer in order to gain market share for your product?

You certainly wouldn't want to use lex then, but flex is far faster,
and other lexer generators are faster still.  Check out Van Jacobson's
paper from the 1986 Winter Usenix, where he produced a lexer generator
as fast (almost) as cat, and got the inner loop down to one 68000
instruction per character processed (anyone know if this version is
ever going to be public?).  Beat that with a hand-written lexer.
Granted, Vern Paxson's flex program is not this good, but many people
got turned off to lexer generators because lex is so horrible.

flex is far superior to lex.  The "f" stands for "fast", not free.
For that reason, writing "f?lex" is unfair to flex, lumping it in
with a very inefficient program.

--
-- Joe Buck
jbuck@janus.berkeley.edu	 {uunet,ucbvax}!janus.berkeley.edu!jbuck