Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!cbnewsh!gls
From: gls@corona.ATT.COM (Col. G. L. Sicherman)
Newsgroups: comp.lang.c
Subject: Unixworld competition - SOLUTIONS
Message-ID: <1991Apr2.012639.25454@cbnewsh.att.com>
Date: 2 Apr 91 01:26:39 GMT
Sender: gls@cbnewsh.att.com (Col. G. L. Sicherman)
Organization: Save the Dodoes Foundation
Lines: 142

As I promised, here are the bugs in the prize-winning comment-stripping
programs.  First the C program:

#include <stdio.h>

char *sccsID="@(#) cstrip.c 1.1 Bart J. Besseling, 8/90";

int m[9][8] = {                                       /* finite-state machine */
/* events:  /    *    "    '    \    \n   sp   ch     states: */
        { 0x01,0x80,0x85,0x87,0x80,0x80,0x80,0x80 },  /* 0: hunt */
        { 0x02,0x33,0xc0,0xc0,0xc0,0xc0,0xc0,0xc0 },  /* 1: maybe */
        { 0x02,0x02,0x02,0x02,0x02,0x80,0x02,0x02 },  /* 2: c++ */
        { 0x13,0x14,0x13,0x13,0x13,0x83,0x83,0x13 },  /* 3: c */
        { 0x10,0x13,0x13,0x13,0x13,0x83,0x83,0x13 },  /* 4: end c */
        { 0x85,0x85,0x80,0x85,0x86,0x80,0x85,0x85 },  /* 5: string */
        { 0x85,0x85,0x85,0x85,0x85,0x85,0x85,0x85 },  /* 6: \ in str */
        { 0x87,0x87,0x87,0x80,0x88,0x80,0x87,0x87 },  /* 7: char */
        { 0x87,0x87,0x87,0x87,0x87,0x87,0x87,0x87 },  /* 8: \ in char */
};

int main()      /* Input parser and output generator */
{
        register int ch, event, state;

        for (state = 0; (ch = getchar()) != EOF;) {

                /* translate character into event */
                switch (ch) {
                case '/':  event = 0; break;
                case '*':  event = 1; break;
                case '"':  event = 2; break;
                case '\'': event = 3; break;
                case '\\': event = 4; break;
                case '\n': event = 5; break;
                case '\t':
                case ' ':  event = 6; break;
                default:   event = 7; break;
                }

                /* obtain next state and operation from machine */
                state = m[state & 0x0f][event];

                /* perform operation */
                if (state & 0x10) putchar(' ');
                if (state & 0x20) putchar(' ');
                if (state & 0x40) putchar('/');
                if (state & 0x80) putchar(ch);
        }
        return 0;
}

The transition matrix has an erroneous entry that resets the automaton
after two asterisks.
The program will fail to terminate any comment that ends
in "**/", such as

/* This compiles, though it shouldn't. **/
        IDENTIFICATION DIVISION.
/* What's a COBOL statement doing here? */
main() {printf("hello, world\n");}

Ian Collier found the bug and told me so.  If you found the bug and
didn't tell me, that's all right too.

Now the lex program:

%Start CODE CCOM STRING CHAR CPLUS
%%
%{
char *sccsID = "@(#) sc 1.0 Andre van Dalen, 6/90";
        BEGIN CODE;
%}
<STRING>([^\\]\")|(\\\\\")      |
<CHAR>([^.\\]\')|(\\\\\')       |
<CPLUS>\n                       { ECHO; BEGIN CODE; }
<CCOM>"*/"                      { two_space(); BEGIN CODE; }
<CCOM,CPLUS>.                   { output(*yytext=='\t'?'\t':' ');}
<CODE>"/*"                      { two_space(); BEGIN CCOM ; }
<CODE>"//"                      { two_space(); BEGIN CPLUS ;}
<CODE>\"                        { ECHO; BEGIN STRING; }
<CODE>\'                        { ECHO; BEGIN CHAR; }
.                               { ECHO; }
%%
two_space()
{
        output(' ');
        output(' ');
}

main(argc, argv)
int argc;
char **argv;
{
        if (argc==1)
                yylex();
        else while (*++argv) {
                fclose(yyin);
                if (!(yyin=fopen(*argv,"r"))) {
                        perror(*argv);
                        exit(1);
                }
                yylex();
        }
        exit(0);
}

This one doesn't handle multiple backslashes, though lex has the power
to do so easily.  A program like this will break it:

main()
{
        char *str = "This string has everything \\\" /* and more!\n";
        printf(str);
}

Finally, the shell program:

# @(#) sc       Strip comments from a C/C++ source file
# Author: Carl Bergerson, August 1990

# set -x        # Uncomment for debugging

# Define correct usage message:
USAGE="Usage: $0 [sourcefile]"

case $# in
        0) sed -e 's/^#/a#/' | /lib/cpp | sed -e '/^#/d' -e 's/^a#/#/';;
        1) sed -e 's/^#/a#/' $1 | /lib/cpp | sed -e '/^#/d' -e 's/^a#/#/';;
        *) echo $USAGE >&2
           exit 1
           ;;
esac

Even assuming that /lib/cpp is a C++ preprocessor that strips //
comments, this can be broken with a little ingenuity:

/*
 * play music on your home computer
 */
main()
{
        printf("press the return key to hear Mozart's sonata in \
a# ");
        getchar();
        play();
}

The script uses "a#" as a flag, but it is not a safe flag.
-- 
G. L. Sicherman
gls@corona.att.COM