Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!brutus.cs.uiuc.edu!jarthur!uci-ics!gateway
From: schmidt@zola.ics.uci.edu (Doug Schmidt)
Newsgroups: comp.lang.c++
Subject: Re: zortech problem with lex
Message-ID: <25EC5CBF.26673@paris.ics.uci.edu>
Date: 28 Feb 90 23:20:32 GMT
References: <6300008@ux1.cso.uiuc.edu> <24800002@sunb6>
Reply-To: schmidt@zola.ics.uci.edu (Doug Schmidt)
Organization: University of California, Irvine - Dept of ICS
Lines: 94
In-reply-to: voss@sunb6.cs.uiuc.edu

There are a number of confusions in your post.  Let me see if I can
shed some light on this.

In article <24800002@sunb6>, voss@sunb6 writes:
>I saw that comment about C++ not being LALR(1) in the g++ distribution,
>and at the time believed it.  However, my biggest thrill at OOPSLA '89
>was sitting at the same table as Bjarne Stroustrup for a few hours drinking
>free beer.  He said that he IS USING A LALR(1) GRAMMER for C++ 2.0.
>He appeared sober & honest, therefore I assume that C++ is LALR(1), but
>non-trivial.

Your statement confuses an LALR(1) language (which C++ is *not*) with
a particular set of tools used to recognize a language.  What Bjarne
probably meant (judging from the cfront 2.0 sources) is that he was
using the LALR(1) parser generated by yacc to recognize C++.  What
you probably don't know is that cfront 2.0 also utilizes an
infinite-lookahead lexical analyzer (hand-coded, BTW) that enables it
to handle the non-LALR(1) parts of the language, e.g., disambiguating
between certain type-decls and expressions.

Incidentally, Jim Roskind (ileaf!io!bronx!jar@EDDIE.MIT.EDU) has
released a beta version of his yacc-based grammar that accepts C++ 2.0
syntax *without* using an infinite-lookahead lexer.  As far as I know,
this grammar has not yet been incorporated in any production-quality
C++ compiler or translator.  Comments, Jim?

>	No, I don't think writing a lexer is hard.
>I did it many times
>before I even heard of lex (though I didn't know I was writing "lexers.")
>However, I can now write and debug a lexical analyzer in half a day max
>using flex, and anyone knowing lex/flex could easily maintain my code.

Have you ever tried writing a f?lex scanner for full ANSI C?  There
are some surprising subtleties.  Henry Spencer posted one to the net a
while back.  Let me know if you'd like a copy to peruse.  It's got
`easy to maintain' regular expressions like:

L?\'([^'\\\n]|\\(['"?\\abfnrtv]|[0-7]{1,3}|[xX][0-9a-fA-F]+))+\'
L?\"([^"\\\n]|\\(['"?\\abfnrtv\n]|[0-7]{1,3}|[xX][0-9a-fA-F]+))*\"

>Therefore I doubt I will ever write another "lexer" by hand.

Well, if you don't end up working with language processing tools then
the point is moot... ;-)  On the other hand, what if you write
portable programs for systems that lack f?lex?  What if you are trying
to write a fast lexer in order to gain market share for your product?
f?lex are extremely useful for certain purposes, but there are often
pragmatic reasons why they aren't used in every circumstance.

>	I think the surprise is that a "commercial compiler writer"
>would not be familiar with flex/lex.

Why is that surprising?  f?lex scanners are *not* generally used for
production (i.e., commercial) compilers.  And certainly not in a
highly competitive market like MS-DOS!

>	I am not familiar with GPERF.  What did you give up to
>get the speed advantage?

The trade-off is generality versus specificity.  GPERF is a great
solution for a particular (and limited) problem domain, i.e.,
implementing recognizers for static search sets.  It doesn't provide
anywhere *near* the flexibility of f?lex.  De gustibus non disputandum
est...

>Is GPERF available via anonymous ftp?  Where?

You bet! (glad you asked... ;-)).
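
For anyone who hasn't met the idea of a recognizer for a static search
set, here is a minimal hand-worked sketch in C++.  It is *not* GPERF
output and makes no claim about how GPERF picks its hash function.
The six-word keyword set, the per-character values, and the names
kWordlist, first_char_value, keyword_hash, and is_keyword are all
assumptions made up for illustration.  The point is only that a
collision-free hash over a fixed set lets you answer "is this a
keyword?" with one table probe and one string comparison:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical static search set, picked only for illustration.
// Slot i holds the keyword whose hash value is i + kMinHash.
const char* const kWordlist[] = {"if", "int", "do", "for", "else", "while"};
const std::size_t kMinHash = 2;  // smallest hash of any keyword ("if")
const std::size_t kMaxHash = 7;  // largest hash of any keyword ("while")

// Hand-picked value for the first character of a key (an assumption of
// this sketch).  Any other character maps past kMaxHash, so such keys
// are rejected without touching the table.
std::size_t first_char_value(char c) {
    switch (c) {
        case 'i': return 0;
        case 'd': case 'e': case 'f': case 'w': return 2;
        default:  return kMaxHash + 1;
    }
}

// hash(key) = length + value of first character.
// For the six keywords above this yields 2, 3, 4, 5, 6, 7: no collisions.
std::size_t keyword_hash(const char* s) {
    return std::strlen(s) + first_char_value(s[0]);
}

// Membership test: one hash, one range check, at most one strcmp.
bool is_keyword(const char* s) {
    if (s[0] == '\0') return false;
    const std::size_t h = keyword_hash(s);
    if (h < kMinHash || h > kMaxHash) return false;
    return std::strcmp(kWordlist[h - kMinHash], s) == 0;
}
```

With these definitions, is_keyword("while") is true, while
is_keyword("iffy") hashes to the slot for "do" and fails the final
comparison.  The catch is the one mentioned above: change the keyword
set and the hand-picked values may collide, so the function has to be
regenerated.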
The C++ and C versions of GPERF are available via anonymous ftp from
ics.uci.edu (128.195.1.1) in the ~ftp/pub directory:

----------------------------------------
-rw-r--r--  1 ftp        120483 Feb 15 23:30 gperf-2.3.tar.Z
-rw-r--r--  1 ftp         97187 Nov 11 14:15 cperf-2.1.tar.Z
----------------------------------------

I'm also presenting a paper on GPERF at the upcoming USENIX C++
Conference in San Francisco.  A preliminary draft is also available
via ftp from ics.uci.edu:

----------------------------------------
-rw-r--r--  1 ftp         43820 Feb 18 17:10 gperf.tex
----------------------------------------

I'd be interested in any comments people might have (there's still
time to make revisions!).

	Doug
--
The official language of San Marcos is Swedish.  All boys | schmidt@ics.uci.edu
under the age of sixteen years old are now sixteen years  | office (714) 856-4043
old.  Underwear must be changed every half hour.  It will +----------------------
be worn on the outside, so we can check.        -- `Bananas' by Woody Allen