Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site sequent.UUCP
Path: utzoo!watmath!clyde!akgua!sdcsvax!sdcrdcf!hplabs!tektronix!ogcvax!sequent!merlyn
From: merlyn@sequent.UUCP
Newsgroups: net.unix-wizards
Subject: Re: Comment recognition in Lex, again
Message-ID: <487@sequent.UUCP>
Date: Mon, 7-May-84 12:55:39 EDT
Article-I.D.: sequent.487
Posted: Mon May 7 12:55:39 1984
Date-Received: Sat, 12-May-84 07:15:16 EDT
References: <245@uwvax.ARPA> <7666@watmath.UUCP>
Organization: Sequent Computer Systems, Portland
Lines: 45

> From: rjhurdal@watmath.UUCP
> Message-ID: <7666@watmath.UUCP>
> Date: Fri, 4-May-84 14:58:05 PDT
>
> The puzzles in net.unix-wizards are better than the ones in net.puzzle.
> I blew two hours composing this:
>     "/*"[^*]*"*"+([^*/][^*]*"*"+)*"/"    printf("<<<%s>>>", yytext) ;
> and tested it on this input:
>     asdqwa/*/asaas*/werwer
>     sdf/**/sdfsdf/***/erwerwer
>     cvb/*tdfg*/*xcvcv*/werwe
>     ty/bcvb/*******/***/fssdf
> and it appears to work. Please let me know if you come up with cases
> where it doesn't work...

I can't seem to break this one (even spent 20 minutes with a little
"railroad-track" finite-state-machine model). However, it doesn't
appear as elegant as a solution sent in a private message to me from
Andrew Klossner @ tektronix:

    "/*"([^*]*"*"+[^/])*"*"*"/"

This one I can't break either. (No claims for lack of human error,
however.)

The solution I submitted with start states (mail me if you didn't get
that one) was preferred in another private communication for a reason
that I had failed to notice at the time... all lex regular expressions
which attempt to scarf up a comment in one fell swoop can overflow the
fixed-size "yytext[]" array EASILY. Take, for example, a typical
start-of-file log produced by the RCS $Log:$ stuff, or an in-source
manpage (yes, I've seen them). Ick. Start states avoid that hassle.

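The difference between the two approaches can be sketched outside lex. What follows is a rough Python model, not lex itself and not the start-state solution mentioned above. ONE_SHOT is a hand translation of the first quoted pattern into Python's re syntax; mark_one_shot, mark_with_states, NORMAL, and COMMENT are made-up names for illustration. The one-shot version has to hold each whole comment as a single match (the analogue of yytext), while the two-state version looks at no more than two characters per step, which is roughly what start states buy you.

```python
import re

# Hand translation of the first quoted lex pattern into Python's re syntax
# (an assumption of this sketch: lex's quoted "/*" and "*" become \-escapes).
ONE_SHOT = re.compile(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/")

NORMAL, COMMENT = "NORMAL", "COMMENT"   # hypothetical state names


def mark_one_shot(text):
    """Wrap each comment in <<< >>>. Each comment is matched as one string,
    so its full length must fit wherever the match is stored."""
    return ONE_SHOT.sub(lambda m: "<<<" + m.group(0) + ">>>", text)


def mark_with_states(text):
    """Same marking with a two-state scanner that inspects at most two
    characters per step, however long the comment is."""
    out = []
    state = NORMAL
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if state == NORMAL and pair == "/*":
            out.append("<<</*")       # comment opener: switch state
            state = COMMENT
            i += 2
        elif state == COMMENT and pair == "*/":
            out.append("*/>>>")       # comment closer: switch back
            state = NORMAL
            i += 2
        else:
            out.append(text[i])       # ordinary character in either state
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # The four test lines from the quoted article.
    for line in ["asdqwa/*/asaas*/werwer",
                 "sdf/**/sdfsdf/***/erwerwer",
                 "cvb/*tdfg*/*xcvcv*/werwe",
                 "ty/bcvb/*******/***/fssdf"]:
        print(mark_one_shot(line))
        print(mark_with_states(line))
```

On the four quoted test lines the two functions produce the same marked-up output. They differ on an unterminated comment: the one-shot pattern leaves it unmarked, while the two-state scanner emits the opening <<< and stays in COMMENT until the input runs out.
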
It was pointed out to me in that same private communication that any
lex rules that are not qualified by a start state are STILL active
inside the comments. Boo. I forgot about that. I've learned to prefix
ALL my rules by start states (even if it is just ). I had forgotten
that I do that regularly.

Randal L. ("no comment") Schwartz, esq. (merlyn@sequent.UUCP)
    (Official legendary sorcerer of the 1984 Summer Olympics)
Sequent Computer Systems, Inc. (503)626-5700 (sequent = 1/quosine)
UUCP: ...!XXX!sequent!merlyn where XXX is one of:
    decwrl nsc ogcvax pur-ee rocks34 shell teneron unisoft vax135 verdix