Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site sequent.UUCP
Path: utzoo!watmath!clyde!akgua!mcnc!decvax!ittvax!dcdwest!sdcsvax!bmcg!cepu!trwrba!trwrb!sdcrdcf!hplabs!tektronix!ogcvax!sequent!merlyn
From: merlyn@sequent.UUCP
Newsgroups: net.unix-wizards
Subject: Re: Comment recognition in Lex, again
Message-ID: <480@sequent.UUCP>
Date: Fri, 4-May-84 10:30:16 EDT
Article-I.D.: sequent.480
Posted: Fri May 4 10:30:16 1984
Date-Received: Tue, 8-May-84 03:32:34 EDT
References: <245@uwvax.ARPA>
Organization: Sequent Computer Systems, Portland
Lines: 50

> From: anderson@uwvax.UUCP
> Subject: Comment recognition in Lex, again
> Message-ID: <245@uwvax.ARPA>
>
> I have received several replies to my request for a lex expression
> to recognize /* ... */ comments. The only one that works (sent in
> by Jim Hogue) is
>
>	"/*"([^*]*"*"*"*"[^/*])*[^*]*"*"*"*/"
>
> which I can't claim to fully understand. Nor do I understand why my
> original, "/*"([^*]|("*"/[^/]))*"*/", doesn't work. The idea is that
> each character in the string between /* and */ can either be something
> other than *, or * followed by something other than /.

I looked at this expression for a while (I translated it into railroad
tracks so I could study it as an FSM). It's sound, but utterly complex:
it matches everything that counts as a C comment, and nothing else.

My previous suggestion (deleting the "/" before the "[^/]") fails for
cases of /***/, because the second * and the third * are matched in the
middle of the parenthesized expression, leaving no * to pair with the
trailing /. (The "/" in your original is lex's trailing-context operator,
which is meant to apply to a rule as a whole, not to one alternative
inside a repeated group; that is why it had to go.)

Actually, all you have to do is document this :-). If you want
simplicity, here's the way I do it (with start states!):

#### %s INCOMMENT
#### %%
#### "/*"              {
####                     BEGIN INCOMMENT;
####                 }
#### <INCOMMENT>(.|\n) {
####                     /* ignore */;
####                 }
#### <INCOMMENT>"*/"   {
####                     BEGIN INITIAL;
####                 }

Works just fine, and is VERY clear.
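The two start states amount to a small state machine, and for comparison the same machine can be written by hand in plain C. This is only a sketch under stated assumptions, not part of the lex solution: strip_comments and its state names are hypothetical, comments are deleted outright (a real C front end replaces each one with a space), and string and character literals are not special-cased, any more than in the lex fragment. The extra STAR state remembers a pending *, which is what lets /***/ close correctly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* States of a hypothetical hand-written comment stripper.  CODE plays the
   part of lex's INITIAL and IN_COMMENT the part of INCOMMENT; SLASH and STAR
   each remember one pending character. */
enum strip_state { CODE, SLASH, IN_COMMENT, STAR };

/* Return a malloc'd copy of src with every C comment deleted, or NULL if
   allocation fails; the caller frees the result.  Assumptions: string and
   character literals are not special-cased, and an unterminated comment
   swallows the rest of the input. */
char *strip_comments(const char *src)
{
    assert(src != NULL);

    /* The output is never longer than the input. */
    char *out = malloc(strlen(src) + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    enum strip_state s = CODE;

    for (; *src != '\0'; src++) {
        char c = *src;
        switch (s) {
        case CODE:
            if (c == '/')
                s = SLASH;              /* may be the start of a comment */
            else
                *dst++ = c;
            break;
        case SLASH:
            if (c == '*') {
                s = IN_COMMENT;         /* comment opened: like BEGIN INCOMMENT */
            } else {
                *dst++ = '/';           /* the pending slash was ordinary text */
                if (c != '/') {         /* a second slash stays pending */
                    *dst++ = c;
                    s = CODE;
                }
            }
            break;
        case IN_COMMENT:
            if (c == '*')
                s = STAR;               /* anything else is ignored */
            break;
        case STAR:
            if (c == '/')
                s = CODE;               /* comment closed: like BEGIN INITIAL */
            else if (c != '*')
                s = IN_COMMENT;         /* a run of stars stays in STAR */
            break;
        }
    }
    if (s == SLASH)
        *dst++ = '/';                   /* input ended just after a slash */
    *dst = '\0';
    return out;
}
```

For instance, strip_comments("a/***/b") should yield "ab": the run of stars keeps the machine in STAR until the closing slash, which is exactly the case that trips up the simpler regular expression.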
"INITIAL" is an undocumented start state: the one the scanner is in when
it begins. If you have any other single-character matchers in your lex
script, make sure they come AFTER the middle pattern above (when two rules
match input of the same length, lex picks the one listed first), or
prefix them with another start state (even INITIAL, if you don't need any
other start states).

Randal L. ("(null)") Schwartz, esq. (merlyn@sequent.UUCP)
(Official legendary sorcerer of the 1984 Summer Olympics)
Sequent Computer Systems, Inc. (503)626-5700 (sequent = 1/quosine)
UUCP: ...!XXX!sequent!merlyn where XXX is one of:
	decwrl nsc ogcvax pur-ee rocks34 shell teneron unisoft vax135 verdix