Path: utzoo!mnetor!uunet!wyse!weitek!pyramid!prls!philabs!sbcs!nyit!michael
From: michael@nyit.UUCP (Michael Gwilliam)
Newsgroups: comp.lang.c
Subject: Re: lex grammer for C comments
Message-ID: <262@nyit.UUCP>
Date: 4 Apr 88 16:49:45 GMT
Organization: NYIT Computer Graphics Lab., Old Westbury, N.Y.
Lines: 109
Summary: summary of replies regarding lex grammar for C comments

NOTE: Sorry this reply took so long, but our phone line was out for a
long time.
-----

Well, the replies are in and I've summarized them.  In case you forgot,
the question was, "Can C comments be filtered out with LEX as regular
expressions?"

The answer is, "Yes, but it may not be a good idea."  The reasons are...

    o It's nearly impossible to read.
    o A long comment could overflow lex's fixed-size yytext buffer,
      since the whole comment is matched as a single token.

The correct way of doing this seems to be:

You could use start states, something like this (I might have the
syntax a bit wrong):

%Start comment
%%
"/*"            { BEGIN comment; }
<comment>.      ;
<comment>\n     ;
<comment>"*/"   { BEGIN 0; }

The problem is that rules without a start state stay active inside the
comment state too, so every other rule needs a state of its own, which
is a pain.  Here's what I did: I built my own little automaton inside
the action for the "/*" pattern.  This is stripped out of working code.

"/*"    {
            /* Comment. */
            register enum { S_STAR, S_NORMAL, S_END } S;

            for (S = S_NORMAL; S != S_END; )
                switch (input()) {
                case EOF:   /* flex's input() returns EOF at end of file */
                case '\0':  /* AT&T lex's input() returns 0 */
                    /* Complain about premature EOF? */
                    S = S_END;
                    break;
                case '*':
                    S = S_STAR;
                    break;
                case '/':
                    if (S == S_STAR) {
                        S = S_END;
                        break;
                    }
                    /* FALLTHROUGH */
                default:
                    S = S_NORMAL;
                    break;
                }
        }

(credit goes to rsalz)

Another method uses states.  This one does the opposite job: it echoes
the comments and throws away everything else.

%START Normal Comment
%%
        { BEGIN Normal; }
<Normal>"/*"            { ECHO; BEGIN Comment; }
<Comment>"*/"           { ECHO; printf("\n"); BEGIN Normal; }
<Comment>\              |
<Comment>[^ \t\n*]+     |
<Comment>"*"/[^/]       |
<Comment>.              |
<Comment>\n             { ECHO; }
<Normal>.               |
<Normal>\n              { }

(credit goes to Tony Hansen)

If you're hard set on doing this with one regular expression, a good
reference seems to be _Introduction_to_Compiler_Construction_with_Unix_,
by Axel T. Schreiner and H. George Friedman, Jr., Prentice-Hall, 1985,
which on page 25 gives:

    "/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/"
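For anyone who wants the same idea outside lex, here is a minimal sketch in plain C (not from any of the replies) that runs rsalz's S_NORMAL/S_STAR automaton over a string instead of over input().  The function name strip_c_comments, the helper emit, the extra S_CODE/S_SLASH states for text outside a comment, and the caller-supplied output buffer are all my own assumptions.  Like the lex action, it drops each comment entirely (ANSI C actually replaces a comment with one space), silently drops an unterminated comment, and knows nothing about string or character literals, so a "/*" inside quotes will fool it.

```c
#include <assert.h>
#include <stddef.h>

/* States.  S_NORMAL and S_STAR play the same roles as in the lex
   action (inside a comment, and inside a comment just after a '*').
   S_CODE and S_SLASH are extra states for text outside a comment,
   since here no lex pattern has matched the opening for us. */
enum strip_state { S_CODE, S_SLASH, S_NORMAL, S_STAR };

/* Append c to dst if there is room, always leaving space for the NUL. */
static void emit(char *dst, size_t dstlen, size_t *n, char c)
{
    if (*n + 1 < dstlen)
        dst[(*n)++] = c;
}

/* Copy src into dst (dstlen bytes) with every slash-star comment
   removed.  Returns the number of characters stored, not counting
   the terminating NUL.  An unterminated comment is silently dropped. */
size_t strip_c_comments(const char *src, char *dst, size_t dstlen)
{
    enum strip_state s = S_CODE;
    size_t n = 0;

    assert(src != NULL && dst != NULL && dstlen > 0);

    for (; *src != '\0'; src++) {
        char c = *src;

        switch (s) {
        case S_SLASH:                     /* previous char was a lone '/' */
            if (c == '*') {               /* slash-star: a comment opens */
                s = S_NORMAL;
                break;
            }
            emit(dst, dstlen, &n, '/');   /* it was just a slash; now treat c as code */
            s = S_CODE;
            /* FALLTHROUGH */
        case S_CODE:
            if (c == '/')
                s = S_SLASH;              /* wait to see what follows */
            else
                emit(dst, dstlen, &n, c);
            break;
        case S_NORMAL:                    /* inside a comment */
            if (c == '*')
                s = S_STAR;
            break;
        case S_STAR:                      /* inside a comment, after '*' */
            if (c == '/')
                s = S_CODE;               /* star-slash: the comment ends */
            else if (c != '*')
                s = S_NORMAL;             /* runs of '*' stay in S_STAR */
            break;
        }
    }
    if (s == S_SLASH)                     /* input ended with a lone '/' */
        emit(dst, dstlen, &n, '/');
    dst[n] = '\0';
    return n;
}
```

For example, strip_c_comments("a/* x */b", out, sizeof out) leaves "ab" in out and returns 2, while "x = a / b;" comes through unchanged because a slash not followed by a star is emitted as soon as the next character shows it is not a comment opener.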
The reason the expression I used was accepting nested comments is that
lex always takes the longest possible match.  Nested comments cannot be
described by a regular expression at all (matching them needs a
counter), so they are hopeless without writing a little C code.  I
never really wanted to handle them anyway; I guess I just didn't make
myself clear.  (Besides, I'm told they're not ANSI.)

Thanks for all the help from...

    Erik Baalbergen
    Kjell Post
    MH Cox
    R. Nigel Horspool
    cmcl2!gondor!psuvax1!gondor!schmidt@uiucdcs (David E. Schmidt)
    cmcl2!harvard!pineapple.bbn.com!rsalz
    harvard!gsg!gsgpyr!lew@linus (Paul Lew)
    harvard!ll-xn!ames!sdcsvax!sdcc6.UCSD.EDU!ix426@linus (Tom Stockfisch)
    sbcs!mmintl!franka@pwa-b
    sbcs!pegasus!hansen@cbosgd

and I hope to goodness I gave proper credit to everyone.

michael