Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.csd.uwm.edu!bionet!agate!usenet
From: hughes@math.berkeley.edu (Eric Hughes)
Newsgroups: comp.software-eng
Subject: Re: C source lines in file
Summary: Using flex to recognize comments
Message-ID: <1989Aug21.171017.27042@agate.berkeley.edu>
Date: 21 Aug 89 17:10:17 GMT
References: <6500@pdn.paradyne.com> <1658@naucse.UUCP>
Sender: usenet@agate.berkeley.edu (USENET Administrator;;;;ZU44)
Reply-To: hughes@math.berkeley.edu (Eric Hughes)
Organization: UC Berkeley Math Dept
Lines: 35
In-reply-to: jdc@naucse.UUCP (John Campbell)

In article <1658@naucse.UUCP>, jdc@naucse (John Campbell) writes:
>Anyway, here's a lex goodie I use to count comments, *exactly* what he
>wanted, right?  Note that the output is in lines of 'C' code, so you could
>look very productive if you counted those lines of code instead!
>
>OBTW, this comment recognizer works well enough for my style of commenting.
>It does not solve the general problem of recognizing ANSI 'C' comments with a
>regular expression.  A solution to that problem was posted a while back, but
>it's pretty ugly...

Flex, the lex replacement by Vern Paxson, has a wonderful capability
to recognize comments that does not require a large ugly regexp and will
not overflow the input buffer.  One makes an exclusive start condition
which represents the predicate "the input pointer is inside a comment."
Then the start and end of comment markers can be recognized separately.

This technique can also be used to recognize string and character
constants, and a general-purpose program should do so, to eliminate
the possibility that a comment start marker inside a string is
mistaken for the start of a comment.
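
For readers without flex at hand, here is a rough sketch of the same
idea as a hand-written state machine in C, extended to string and
character constants.  It is not flex output and not part of the
program below.  The names strip_comments, STRING and CHARLIT are made
up for illustration.  Each enum value plays the role of one start
condition, and the sketch assumes the caller supplies an output buffer
at least as large as the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each state plays the role of one flex start condition. */
enum state { CODE, COMMENT, STRING, CHARLIT };

/*
 * Copy src to dst, dropping comments and leaving everything else alone.
 * Assumption: dst has room for strlen(src) + 1 bytes.
 */
void strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    while (src[i] != '\0') {
        char c = src[i];

        switch (st) {
        case CODE:
            if (c == '/' && src[i + 1] == '*') {
                st = COMMENT;           /* like BEGIN( COMMENT ) */
                i += 2;
                continue;               /* the marker is not copied */
            }
            if (c == '"')
                st = STRING;
            else if (c == '\'')
                st = CHARLIT;
            break;                      /* copy c below */

        case COMMENT:
            if (c == '*' && src[i + 1] == '/') {
                st = CODE;              /* like BEGIN( 0 ) */
                i += 2;
            } else {
                i++;                    /* discard comment text */
            }
            continue;

        case STRING:
        case CHARLIT:
            if (c == '\\' && src[i + 1] != '\0') {
                dst[o++] = c;           /* copy the backslash ... */
                c = src[++i];           /* ... and the escaped char */
            } else if (c == (st == STRING ? '"' : '\'')) {
                st = CODE;              /* closing quote */
            }
            break;                      /* copy c below */
        }

        dst[o++] = c;
        i++;
    }
    dst[o] = '\0';
}
```

Given the input  x = "/* kept */"; /* dropped */  the sketch leaves the
string constant untouched and removes only the second, real comment.
In flex the same effect would come from adding further exclusive start
conditions alongside COMMENT.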

Eric Hughes
hughes@math.berkeley.edu   ucbvax!math!hughes

------------cut here-------------
/* Small flex program to recognize C-style comments in text.  */

%x COMMENT 
%%
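"/*"			BEGIN( COMMENT ) ;	/* enter the comment state */
.			ECHO ;			/* copy non-comment text through */
<COMMENT>"*/"		BEGIN( INITIAL ) ;	/* leave the comment state */
<COMMENT>"*"		|
<COMMENT>[^*\n]+	|
<COMMENT>\n		;			/* discard comment text */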
%%