Xref: utzoo comp.unix.questions:28853 comp.unix.programmer:1132 comp.compilers:1724
Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!wuarchive!uunet!world!iecc!compilers-sender
From: cl@lgc.com (Cameron Laird)
Newsgroups: comp.unix.questions,comp.unix.programmer,comp.compilers
Subject: You, too, can look at strings.
Keywords: C, lex
Message-ID: <1991Feb20.150204.3815@lgc.com>
Date: 20 Feb 91 15:02:04 GMT
References: <1991Feb12.144738.11530@lgc.com>
Sender: compilers-sender@iecc.cambridge.ma.us
Reply-To: cl@lgc.com (Cameron Laird)
Organization: Landmark Graphics Corp., Houston, Tx
Lines: 90
Approved: compilers@iecc.cambridge.ma.us

I asked for help extracting string constants from source
code.  I summarize the responses I received:

1.  my own was to write (approximately)

	echo 's/"[^"]*$/"/
	s/[^"]*"/"/' >/tmp/string_script
	grep '".*"' | tee /tmp/string_list | \
		sed -f /tmp/string_script | ...
	rm /tmp/string_script

    as part of a filter.  The filter does these things:
    a.  puts a grep-listing (not egrep, not fgrep, but
	grep) of all lines with at least two "-s into
	/tmp/string_list, for my later convenience in
	examining the contexts where the strings occur;
	and
    b.  copies what's left of those lines after throwing
	away everything before the first " and after the
	last " to stdout.
    This was something I knew how to write in a few
    minutes, and it works well enough, although it knows
    nothing about the syntax of C beyond looking for a
    pair of "-s.  A line holding two separate strings
    therefore comes out as one piece, with the code
    between them included.

2.  various folks suggested combinations of

    {m,}xstr--available on uunet:bsd-sources/pgrm/{m,}xstr/*
	I thought this had possibilities, but didn't work
	with it much.
    cxref--I didn't find any quick way to make this do
	something useful to me.
    strings--this was definitely not what I had in mind
	(I'm thinking about source code, and, as far as
	I'm concerned, strings is for working with object
	files), but I've invoked strings hundreds of times
	for other chores, and I'm happy to give it a bit
	of publicity.

3.  a few folks wrote to say that perl could do it in one
    line; no one delivered such a line, but I didn't ask.
    Does perl remind anyone else of APL?  That's not
    entirely a bad thing ...

4.  comp.compilers publishes each month sites for
    distribution of lexical analyzers and such.  I haven't
    checked this list.  I also received the advice that,
	"At site primost.cs.wisc.edu (128.105.2.115) in
	directory /pub/comp.compilers are files called
	*grammar.Z  They contain grammars for lex/yacc
	for c, c++ ftn and pascal. . . ."

5.  a Swedish HPUX user reported that he relies on findstr,
    in the NLS (Natural Language Support) package that is
    part of HPUX.

6.  William A. Hoffman posted the kind of lapidary answer
    I expected from the net:  a couple dozen lines,
    definitive (in some sense), no-nonsense, functional,
    and a starting-point for yet more refinements (or
    arguments).

... string.lex --------------------------------------------------------
string	\"([^"\\\n]|\\(.|\n))*\"
%%
{string}	printf("%s\n", yytext); return(1);
\n	;
.	;
%%
/* Call yylex() until it returns 0 at end of input;
   each nonzero return means one string was printed. */
int main()
{
	while (yylex())
		;
	return 0;
}

/* No further input files: tell lex to stop. */
int yywrap()
{
	return 1;
}
------------------------------------------------------------
to run just:
lex string.lex
cc lex.yy.c -o string
cat *.c | ./string

    The moderator noted that this deserved to be beefed
    up "... to handle character constants and comments
    ..."

7.  One reader wrote that he'd send a finite-state machine
    which models C syntax as soon as he found his copy.  I
    haven't heard from him since.  I'll pass it along when
    it arrives.

My apologies to Henry Spencer for misremembering his name
as "Harry".

Thanks, all.
--
Cameron Laird				USA 713-579-4613
cl@lgc.com				USA 713-996-8546
--
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.
Meta-mail to compilers-request.
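
Addendum, on the moderator's note under item 6 and the
finite-state machine mentioned in item 7:  a hand-written
scanner that skips comments and character constants might
look like the sketch below.  It is not anything a respondent
sent.  The function name extract_strings, the state names,
and the fixed-size output buffer are all assumptions made
for illustration, and it assumes /* ... */ comments only and
does not diagnose unterminated strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Scanner states: a small finite-state machine over C source text. */
enum state {
    CODE, SLASH, COMMENT, STAR,
    STRING, STRING_ESC, CHARLIT, CHARLIT_ESC
};

/*
 * Hypothetical helper: copy every string literal found in src into out,
 * quotes included, one literal per line.  Returns the number of literals
 * seen.  Output that does not fit in outsz bytes is silently dropped.
 */
size_t extract_strings(const char *src, char *out, size_t outsz)
{
    enum state st = CODE;
    size_t count = 0, used = 0;

    assert(src != NULL && out != NULL && outsz > 0);

/* Append one byte, always leaving room for the terminating NUL. */
#define PUT(ch) do { if (used + 1 < outsz) out[used++] = (ch); } while (0)

    for (; *src != '\0'; src++) {
        char c = *src;

        switch (st) {
        case CODE:
        case SLASH:
            if (c == '*' && st == SLASH) st = COMMENT;
            else if (c == '/') st = SLASH;
            else if (c == '"') { PUT(c); st = STRING; }
            else if (c == '\'') st = CHARLIT;
            else st = CODE;
            break;
        case COMMENT:               /* inside a comment */
            if (c == '*') st = STAR;
            break;
        case STAR:                  /* saw '*' inside a comment */
            if (c == '/') st = CODE;
            else if (c != '*') st = COMMENT;
            break;
        case STRING:                /* inside "...", copying as we go */
            PUT(c);
            if (c == '\\') st = STRING_ESC;
            else if (c == '"') { PUT('\n'); count++; st = CODE; }
            break;
        case STRING_ESC:            /* byte after a backslash is taken literally */
            PUT(c);
            st = STRING;
            break;
        case CHARLIT:               /* inside '...', nothing is copied */
            if (c == '\\') st = CHARLIT_ESC;
            else if (c == '\'') st = CODE;
            break;
        case CHARLIT_ESC:
            st = CHARLIT;
            break;
        }
    }
#undef PUT

    out[used] = '\0';
    return count;
}
```

To use it as a filter, read stdin into a buffer, call
extract_strings() on it, and write the result with fputs().
The escape states are what keep a \" inside a string, or a
'"' character constant, from being mistaken for the start
or end of a string.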