Path: utzoo!utgpu!utstat!jarvis.csri.toronto.edu!mailrus!ames!ig!arizona!rupley
From: rupley@arizona.edu (John Rupley)
Newsgroups: comp.lang.c
Subject: Re: Want a way to strip comments from a
Summary: Use Lex, if you only want to strip
Message-ID: <9797@megaron.arizona.edu>
Date: 20 Mar 89 10:34:57 GMT
References: <7150@siemens.UUCP> <9900010@bradley> <4896@cbnews.ATT.COM> <3145@nunki.usc.edu>
Organization: U of Arizona CS Dept, Tucson
Lines: 65

In article <3145@nunki.usc.edu>, jeenglis@nunki.usc.edu (Joe English) writes:
> I made a mistake in the comment-eating program I
> posted yesterday -- it won't handle
> /* something like *//* this. */
> Change the line in the '/' case from:
>     if ((ch = getchar()) == '*') { eatcomment(); ch=getchar(); }
> to:
>     if ((ch = getchar()) == '*') { eatcomment(); ch=getchar(); continue; }
> and it will work.  If anyone's interested.

It still doesn't work.  It won't uncomment itself.  Or the following line:

	'"' /* hi there */ '"'

Or distinguish a correct string, with escaped newlines,

	"hi\
	/*\*/ /**/\
	there"

from an incorrect string without the escapes.

The point is not _whether_ one can write an ``uncomment'' in C, but how,
and in what language, one can do it most simply.  It is certainly right
to use C if uncommenting is part of a larger design, as in cpp or ctags.
But if the whole aim is to uncomment, then a pattern-handling language,
such as Lex, is more appropriate.  A few lines of Lex source do the job,
and assuming familiarity with regular expression syntax, it is easy to
write and understand, and hard to get the logic wrong.  It should be
doable with sed or awk, but probably not as easily, because they see a
file as a stream of lines rather than characters.  In C, the proper
setting up of the switch and flags is not trivial, as the previous
posting witnesses.

A Lex source for uncommenting is attached (which I hope does not belie
the remark above about hard to get the logic wrong :-).

John Rupley
 uucp: ..{uunet | ucbvax | cmcl2 | hao!ncar!noao}!arizona!rupley!local
 internet: rupley!local@megaron.arizona.edu
--------------------------------------------------------------------
%{
/* UNCOMMENT- */
/* regexp for comment recognition based on usenet posting by: */
/* Chris Thewalt; thewalt@ritz.cive.cmu.edu */
%}
STRING		\"(\\\n|\\\"|[^"\n])*\"
COMMENTBODY	([^*\n]|"*"+[^*/\n])*
COMMENTEND	([^*\n]|"*"+[^*/\n])*"*"*"*/"
QUOTECHAR	\'[^\\]\'|\'\\.\'|\'\\[x0-9][0-9]*\'
ESCAPEDCHAR	\\.
%START	COMMENT
%%
<COMMENT>{COMMENTBODY}	;
<COMMENT>{COMMENTEND}	BEGIN 0;
<COMMENT>.|\n		;
"/*"			BEGIN COMMENT;
{STRING}		ECHO;
{QUOTECHAR}		ECHO;
{ESCAPEDCHAR}		ECHO;
.|\n			ECHO;
---------------------------------------------------------------------------
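
For comparison, here is a rough sketch of what the "switch and flags" of a hand-coded C version might look like. It is only an illustration under stated assumptions: the function name `uncomment`, the state names, and the string-in/string-out interface are hypothetical and not taken from either posting. It also differs from the Lex rules above in that it does not end a string at an unescaped newline, and it ignores trigraphs and backslash-newline splices between `/` and `*`.

```c
#include <stddef.h>

/* Hypothetical state names, chosen for this sketch only. */
enum state {
    CODE, SLASH, COMMENT, STAR,
    STRING, STRING_ESC, CHARLIT, CHARLIT_ESC
};

/*
 * Copy src to dst with C comments removed.  dst must have room for
 * strlen(src) + 1 bytes.  Returns the number of bytes written, not
 * counting the terminating NUL.
 */
size_t uncomment(const char *src, char *dst)
{
    enum state st = CODE;
    size_t n = 0;

    for (; *src != '\0'; src++) {
        char c = *src;

        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;             /* hold the '/' until we see what follows */
            } else {
                if (c == '"')
                    st = STRING;
                else if (c == '\'')
                    st = CHARLIT;
                dst[n++] = c;
            }
            break;
        case SLASH:
            if (c == '*') {
                st = COMMENT;           /* it was "/" + "*": drop the held '/' */
            } else if (c == '/') {
                dst[n++] = '/';         /* emit the first '/', keep holding the second */
            } else {
                dst[n++] = '/';         /* ordinary '/': emit it and the current char */
                dst[n++] = c;
                st = (c == '"') ? STRING : (c == '\'') ? CHARLIT : CODE;
            }
            break;
        case COMMENT:
            if (c == '*')
                st = STAR;              /* possible start of the closing delimiter */
            break;
        case STAR:
            if (c == '/')
                st = CODE;              /* comment closed */
            else if (c != '*')
                st = COMMENT;           /* false alarm; a run of '*' stays in STAR */
            break;
        case STRING:
            dst[n++] = c;
            if (c == '\\')
                st = STRING_ESC;
            else if (c == '"')
                st = CODE;
            break;
        case STRING_ESC:
            dst[n++] = c;               /* escaped char, including an escaped newline */
            st = STRING;
            break;
        case CHARLIT:
            dst[n++] = c;
            if (c == '\\')
                st = CHARLIT_ESC;
            else if (c == '\'')
                st = CODE;
            break;
        case CHARLIT_ESC:
            dst[n++] = c;
            st = CHARLIT;
            break;
        }
    }
    if (st == SLASH)
        dst[n++] = '/';                 /* input ended on a lone '/' */
    dst[n] = '\0';
    return n;
}
```

Eight states and a held character to cover what the Lex version says in eight rules; each of the troublesome inputs quoted earlier (back-to-back comments, `'"'` around a comment, comment-like text inside a string with escaped newlines) exercises a different branch, which is where a hand-written version tends to go wrong.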