Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site whuxle.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxl!ihnp4!whuxle!mp
From: mp@whuxle.UUCP (Mark Plotnick)
Newsgroups: net.unix-wizards
Subject: Re: Comment recognition in Lex, again
Message-ID: <340@whuxle.UUCP>
Date: Sat, 5-May-84 13:54:38 EDT
Article-I.D.: whuxle.340
Posted: Sat May 5 13:54:38 1984
Date-Received: Sun, 6-May-84 01:31:24 EDT
References: <245@uwvax.ARPA> <1876@utcsstat.UUCP>
Organization: Bell Labs, Whippany
Lines: 42

The problem with

    "/*"([^*]|("*"/[^/]))*"*/"

is that lex's handling of right context (the ``/'' operator) inside a
nested regular expression is a little nonintuitive.  After lex
recognizes the complete expression, it backs up one character because
of the ``/[^/]'' expression.

In case you still don't see the problem, run this lex program:

    %%
    a(b/c)c     { printf("I saw this: %s\n", yytext); }
    .           { printf("char: '%c'\n", yytext[0]); }

The first rule will NOT match ``abc'', but it will match ``abcc'',
sort of.  It prints out ``I saw this: ab''.

To be safe, use right context only at the very end of your regular
expression.

Yet Another Way To Recognize Comments:

I really don't enjoy beating my head against a wall playing with
regular expressions and start conditions.  When we had to write a
compiler a couple of years ago (any other AM295 survivors out there?),
we did something like:

    "/*"    {
    #define LEXEOF 0
                int c, last_c = '\0';

                /* read until the closing delimiter or end of file */
                while ((c = input()) != LEXEOF) {
                    if (last_c == '*' && c == '/')
                        break;
                    else
                        last_c = c;
                }
                printf("comment seen\n");
                if (c == LEXEOF)
                    printf("EOF within comment\n");
            }

Moving some of the effort into the action routine allows you to easily
add more context-dependent features, such as printing a warning
message if there's a ';' within the comment, supporting nested
comments, etc.

	Mark Plotnick
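
The hand-written loop in the "/*" action can also be tried outside lex.
What follows is only a rough sketch under stated assumptions, not the
code from the compiler mentioned above: reader_input() is a
hypothetical stand-in for lex's input() that reads from a string and
returns 0 at the end, and skip_comment() and struct comment_info are
names made up for illustration.  It also counts ';' characters, one of
the context-dependent features the action approach makes easy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lex's input(): reads from a string and
 * returns 0 at the end, playing the role of LEXEOF. */
struct reader {
    const char *p;
};

static int reader_input(struct reader *r)
{
    if (*r->p == '\0')
        return 0;
    return (unsigned char)*r->p++;
}

/* What the scan found; all three fields are illustrative. */
struct comment_info {
    int terminated;   /* 1 if the closing star-slash was seen */
    int semicolons;   /* number of ';' characters inside the comment */
    size_t consumed;  /* characters read after the opening delimiter */
};

/* Scan the text that follows an already-matched opening delimiter,
 * remembering the previous character just as last_c does in the
 * lex action. */
static struct comment_info skip_comment(const char *after_open)
{
    struct reader r;
    struct comment_info info = {0, 0, 0};
    int c, last_c = '\0';

    assert(after_open != NULL);
    r.p = after_open;
    while ((c = reader_input(&r)) != 0) {
        info.consumed++;
        if (last_c == '*' && c == '/') {
            info.terminated = 1;
            break;
        }
        if (c == ';')
            info.semicolons++;
        last_c = c;
    }
    return info;
}
```

Calling skip_comment(" a; b */ rest") reports a terminated comment
containing one ';'.  Inside a real lex action you would call input()
directly and print the warning where the sketch increments the counter.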