Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!ig!arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Want a way to strip comments from a
Summary: Lex is a way; test file attached
Keywords: comment stripping Lex test file
Message-ID: <9778@megaron.arizona.edu>
Date: 18 Mar 89 01:38:19 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <978@philmds.UUCP> <2131@mister-curious.sw.mcc.com>
Reply-To: rupley@arizona.edu (John Rupley)
Followup-To: comp.lang.c
Distribution: usa
Organization: U of Arizona CS Dept, Tucson
Lines: 61

In article <2131@mister-curious.sw.mcc.com>, loo@mister-curious.sw.mcc.com (Joel Loo) writes:
> In article <978@philmds.UUCP>, leo@philmds.UUCP (Leo de Wit) writes:
> > And how about:
> >     puts(" A comment /* in here */");
> > And you can give more examples showing it isn't that trivial; a challenge
> > for the sed adept, perhaps ...
> >     Leo.
> [And a lot of previous articles on the same topic]
>
> The problem is: sed and vi do not understand C syntax.
>
> Solution: write a lex program to strip comments. The program must
> understand C syntax enough to know what is a comment and what is not.
>
> Encouragement: it should not be too difficult.
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It isn't. Six lines of Lex source (not counting initialization) are
enough. A Lex source for ``uncomment'' has been posted in
comp.sources.unix, as part of:

    Subject: Volume 16 (Ends January 17, 1989)
    identlist    List identifiers and declarations for C sources

Attached is a minimum test for an uncommenting algorithm, including
tests for quotes inside and outside comments.

John Rupley
 uucp: ..{uunet | ucbvax | cmcl2 | hao!ncar!noao}!arizona!rupley!local
 internet: rupley!local@megaron.arizona.edu
 (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929
----------------------------------------------------------------------------
/*
 * tests for ``uncomment''
 * assume C-code conventions:
 *     strings start and end on one line
 *     comments can be multi-line
 * no tests for varieties of: '"' \'"\' etc
 * no tests for strings with newline escaped
 */

string4 "hi /*\"hi there*/there\""
comment1 /*one"*/"*/
comment2 /*\"hi there"*/"*/"
comment3 /*\"hi there*/
comment4 /* hello/*hello/*hello/*hello*/
comment5 /*******/
comment6 /*/*/ a /**/ b /***/ c /****/ d /*////*/
comment7 /*/*// a /**// b /***// c /****// d /*////*//

1. /*****//"hello world */" ok /"hello world */"
2. /* hello /* /* world */ ok
3. /* */ hello /* */ ok hello
4. /**// /* this should produce "/ \n" for output */ ok /
5. /* */ hello */ ok hello */
6. /*/*/ hello ok hello
7. /*////*/ ok
8. /*//*/ ok
9. abc = "/* fake comment"; /* got who ? */ ok abc = "STRING";
10. /* "start quote
"then next line end quote, after more characters than on line 1" more more more */ " ok "
----------------------------------------------------------------------------
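
For readers without the comp.sources.unix posting at hand, the idea behind
such an uncommenter can be sketched in plain C. This is not the Lex source
referred to above, which is not reproduced in this article; it is a
hand-written state machine under stated assumptions. The function name
strip_comments and its malloc-based interface are hypothetical. It deletes
/* ... */ comments and copies string and character literals through
unchanged, honoring backslash escapes. It does not handle backslash-newline
splicing or trigraphs, and it does not rewrite string contents.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the posted Lex program: return a malloc'd
 * copy of `in` with C comments deleted.  The caller frees the result.
 * Returns NULL if memory cannot be allocated.
 */
char *strip_comments(const char *in)
{
    enum { CODE, COMMENT, STRING, CHARLIT } state = CODE;
    char *out, *p;

    assert(in != NULL);
    out = malloc(strlen(in) + 1);   /* output is never longer than input */
    if (out == NULL)
        return NULL;
    p = out;

    while (*in != '\0') {
        switch (state) {
        case CODE:
            if (in[0] == '/' && in[1] == '*') {
                state = COMMENT;    /* drop the comment opener */
                in += 2;
                continue;
            }
            if (*in == '"')
                state = STRING;
            else if (*in == '\'')
                state = CHARLIT;
            *p++ = *in++;
            break;
        case COMMENT:
            if (in[0] == '*' && in[1] == '/') {
                state = CODE;       /* drop the comment closer */
                in += 2;
            } else {
                in++;               /* quotes in a comment mean nothing */
            }
            break;
        case STRING:
        case CHARLIT:
            if (in[0] == '\\' && in[1] != '\0') {
                *p++ = *in++;       /* backslash; escaped char copied below */
            } else if ((state == STRING && *in == '"') ||
                       (state == CHARLIT && *in == '\'')) {
                state = CODE;       /* closing quote ends the literal */
            }
            *p++ = *in++;
            break;
        }
    }
    *p = '\0';
    return out;
}
```

Given the input of case 1 in the test file, this sketch returns
/"hello world */", the output listed there. It leaves the string in case 9
as written rather than producing "STRING". Because comments are deleted
rather than replaced by a space, a/**/b comes out as ab, which differs from
what a C preprocessor does.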