Path: utzoo!utgpu!water!watmath!clyde!rutgers!ucsd!sdcsvax!sdcc6!ix426
From: ix426@sdcc6.ucsd.EDU (Tom Stockfisch)
Newsgroups: comp.lang.c
Subject: Re: LEX
Keywords: LEX, comments, C, regular expression
Message-ID: <3609@sdcc6.ucsd.EDU>
Date: 4 Feb 88 04:58:23 GMT
References: <260@nyit.UUCP>
Reply-To: ix426@sdcc6.ucsd.edu.UUCP (Tom Stockfisch)
Organization: University of California, San Diego
Lines: 54

In article <260@nyit.UUCP> michael@nyit.UUCP (Michael Gwilliam) writes:
>
>.... When I
>was writting the tokenizer using LEX and I got intrigued by a little
>problem.  Is it possible to write a regular expression that will
>transform a /* comment */ into nothing? ....
>So my question is, to all you experienced lex
>users and compiler writers, can this be done?  Or do I need to
>use input() and other lex functions.

[sorry for not emailing, I can't seem to get mail to Michael]

I can't believe how hard this task is in regular expressions, when it is
trivial to code by hand.  I have found a solution which I think is correct,
but it took several tries (see end of this posting).

To convince yourself that a pattern is correct, I think you have to show
two things:

	1.  That the body between the "/*" and "*/" cannot possibly
	    contain a "*/",
	2.  That the body can contain any other sequence of characters.

If you come up with your own solution, be sure it works properly on the
following input.

	1.	/*****//hello world */
	2.	/* hello /* /* world */
	3.	/* */ hello /* */
	4.	/**// /* this input should produce "/ \n" for output */
	5.	/* */ hello */

The following lex source should "elide" all legal comments, and pass all
the rest thru to stdout.  As requested, it does not use input().

--cut----
okslash		([^*/]"/"+)
%%
"/*""/"*([^/]|{okslash})*"*/"	;
--cut----

Compile using

	lex comment.l; cc lex.yy.c -ll

-- 

||  Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu
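
For comparison with the remark above that the task is trivial to code by
hand, here is a minimal sketch of a hand-coded version in C.  It is not part
of the lex solution; the function name strip_comments and its string-to-string
interface are assumptions made for illustration.  It is meant to behave like
the lex rule plus the default echo: a comment ends at the first terminator
found after the opener, and an opener with no terminator is copied unchanged.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy src to dst, eliding every terminated C comment.
 * dst must have room for strlen(src) + 1 bytes.
 * (Hypothetical helper, written for illustration only.)
 */
static void strip_comments(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {
            /*
             * Search for the terminator starting after the two-character
             * opener, so the star of the opener cannot also serve as the
             * star of the terminator.
             */
            const char *end = strstr(src + 2, "*/");
            if (end != NULL) {
                src = end + 2;      /* skip the whole comment */
                continue;
            }
            /* No terminator: fall through and echo the text as is. */
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

Called on the five test inputs listed above, this sketch should leave
"/hello world */" for the first, nothing for the second, " hello " for the
third, "/ " plus the newline for the fourth, and " hello */" for the fifth.
It does not know about string or character literals, which is also true of
the lex rule.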