Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83 (MC830713); site hwcs.UUCP
Path: utzoo!linus!vaxine!wjh12!genrad!decvax!harpo!seismo!cmcl2!floyd!vax135!ukc!edcaad!hwcs!chris
From: chris@hwcs.UUCP (Chris Miller)
Newsgroups: net.unix-wizards
Subject: Re: Comment recognition in Lex, again
Message-ID: <94@hwcs.UUCP>
Date: Fri, 18-May-84 06:13:45 EDT
Article-I.D.: hwcs.94
Posted: Fri May 18 06:13:45 1984
Date-Received: Tue, 15-May-84 02:12:56 EDT
References: <245@uwvax.ARPA>
Organization: Computer Sci., Heriot-Watt U., Scotland
Lines: 42

The following is a fully general comment recogniser for /* ... */
comments in 'lex' - I have used definitions to make it a little more
readable (I just can't cope with things like ("*"[^*]*)!).

It should be pointed out that I don't believe that this is the RIGHT
way to handle comments unless it is essential to retain their text;
comments can be very long, and trying to match them with 'lex' can
easily overflow buffers.  I prefer solutions which match the opening
/* and then throw away the rest of the comment in the action routine,
using a bit of ad hoccery.
____________________________________________________________________
STAR            "*"
SLASH           "/"
CSTART          ({SLASH}{STAR})
CUNIT           ([^*]|{STAR}+[^/*])
CBODY           ({CUNIT}*)
CEND            ({STAR}+{SLASH})
COMMENT         ({CSTART}{CBODY}{CEND})
%%
{COMMENT}       printf("COMMENT '%s'\n", yytext);
%%
yywrap()
{
        exit(0);
}

main()
{
        for (;;)
                yylex();
}
____________________________________________________________________

One problem with the original non-working version is that it fails
for comments terminated by an EVEN number of asterisks and a /.  This
seems to be a common bug in distributed compilers, etc, even when
they don't use 'lex' for token generation.
I have encountered this bug in several C compilers and their
corresponding lints (of course, since lint usually uses cpp), and
also in the original distribution of CProlog - you may find it
entertaining to try out

        /** This is a legal comment **/

on any language systems which OUGHT to accept it.  The fix is almost
always trivial - the problem comes from reading the character
following an asterisk without subsequently putting it back in the
input if it happens to be another asterisk.

        Chris Miller
        Heriot-Watt Computer Science
        ...ukc!edcaad!hwcs
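
The bug and its fix described above can be sketched in C roughly as
follows.  This is only an illustration under stated assumptions: the
names skip_comment_buggy and skip_comment are hypothetical, the code
scans a string rather than an input stream, and it is not taken from
any of the compilers or from CProlog.  Each function is handed a
pointer just past the opening slash-star and returns a pointer just
past the closing star-slash, or NULL if the comment never ends.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical illustration of the bug: after seeing a '*', the next
 * character is consumed even when it is not '/'.  If that character
 * was itself a '*', the chance to pair it with a following '/' is
 * lost, so an even run of asterisks before the slash is missed.
 */
const char *skip_comment_buggy(const char *p)
{
    assert(p != NULL);
    while (*p != '\0') {
        if (*p++ == '*') {
            if (*p == '\0')
                return NULL;        /* input ended inside the comment */
            if (*p++ == '/')
                return p;           /* just past the terminator */
            /* character after the '*' has been thrown away here */
        }
    }
    return NULL;
}

/*
 * Hypothetical illustration of the fix: the character after a '*' is
 * only inspected.  It is consumed solely when it is the closing '/';
 * anything else, including another '*', is seen again by the loop.
 */
const char *skip_comment(const char *p)
{
    assert(p != NULL);
    while (*p != '\0') {
        if (*p++ == '*') {
            if (*p == '/')
                return p + 1;       /* just past the terminator */
        }
    }
    return NULL;
}
```

Given the text that follows the opening slash-star of the example
comment above, skip_comment finds the end of the comment, whereas
skip_comment_buggy runs off the end of the input and returns NULL.
When reading from a stream, the same effect is usually obtained by
pushing the character back with ungetc(), or unput() in a 'lex'
action, rather than by peeking.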