Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!bbn!apple!vsi1!wyse!mips!prls!philabs!ttidca!hollombe
From: hollombe@ttidca.TTI.COM (The Polymath)
Newsgroups: comp.lang.c
Subject: Re: Want a way to strip comments from a C file
Message-ID: <4060@ttidca.TTI.COM>
Date: 15 Mar 89 23:15:59 GMT
References: <7150@siemens.UUCP> <880@m10ux.UUCP>
Reply-To: hollombe@ttidcb.tti.com (The Polymath)
Distribution: usa
Organization: The Cat Factory
Lines: 33

In article <880@m10ux.UUCP> mnc@m10ux.UUCP (Michael Condict) writes:
}I recently posted to this group a shell script that calls three sed scripts
}to extract function prototypes from (practically) any C source.  One of the
}three sed scripts consisted of little more than comment removal -- exactly
}what you are looking for.  Here is the relevant portion:
}
}[...]
}The disclaimers are that (1) it only works with BSD-derived sed, unless you
}get rid of all the comments; and (2) it will fail for programs that contain
}the extremely unlikely "Used#to%be..." strings used as markers in the script.

If I understood the original posting correctly, it will also fail if it
encounters a /* or */ within a quoted string constant.  E.g.:

     char *msg1 = "The symbol \"/*\" begins a comment in C. \n";
     char *msg2 = "The symbol \"*/\" ends a comment in C. \n";

I deliberately added the escaped double-quotes to show that true, safe
comment detection and removal isn't a trivial problem.  There are probably
a number of other "special" cases that can cause a simple scan-for-/*,
scan-for-*/ algorithm to fail.

}This has been tested on thousands of lines of source code from various sources,
}but no guarantees.  You get what you pay for.

Sound advice.

-- 
The Polymath (aka: Jerry Hollombe, hollombe@ttidca.tti.com)  Illegitimati Nil
Citicorp(+)TTI                                                 Carborundum
3100 Ocean Park Blvd.   (213) 452-9191, x2483
Santa Monica, CA  90405 {csun|philabs|psivax}!ttidca!hollombe
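
The string-constant cases in the post suggest that a comment stripper has to track lexical state rather than just search for /* and */.  What follows is only a minimal sketch of that idea in C, not the sed script under discussion.  The function name strip_comments and the enum state are hypothetical, chosen for illustration.  It assumes the whole source is already in memory, replaces each comment with a single space, and makes no attempt to handle trigraphs, backslash-newline splicing inside a comment delimiter, or preprocessor oddities such as #include <a/*b>.

```c
#include <stddef.h>

/* Lexical states for the scan (names are illustrative only). */
enum state { CODE, SLASH, COMMENT, STAR, STRING, CHARLIT };

/*
 * strip_comments (hypothetical helper): copy src to dst, replacing each
 * comment with one space.  dst needs room for strlen(src) + 1 bytes;
 * the output is never longer than the input.
 * Returns the number of characters written, not counting the NUL.
 */
size_t strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    char *out = dst;
    int c;

    while ((c = (unsigned char)*src++) != '\0') {
        switch (st) {
        case CODE:
            if (c == '/') { st = SLASH; break; }  /* hold the slash for now */
            if (c == '"') st = STRING;
            else if (c == '\'') st = CHARLIT;
            *out++ = (char)c;
            break;
        case SLASH:
            if (c == '*') { st = COMMENT; break; } /* held slash began a comment */
            *out++ = '/';                          /* held slash was ordinary */
            if (c == '/') break;                   /* now hold this new slash */
            if (c == '"') st = STRING;
            else if (c == '\'') st = CHARLIT;
            else st = CODE;
            *out++ = (char)c;
            break;
        case COMMENT:
            if (c == '*') st = STAR;               /* possible comment end */
            break;
        case STAR:
            if (c == '/') { *out++ = ' '; st = CODE; }
            else if (c != '*') st = COMMENT;
            break;
        case STRING:
        case CHARLIT:
            *out++ = (char)c;
            if (c == '\\' && *src != '\0')
                *out++ = *src++;                   /* copy the escaped character as-is */
            else if (c == (st == STRING ? '"' : '\''))
                st = CODE;                         /* matching close quote */
            break;
        }
    }
    if (st == SLASH)
        *out++ = '/';                              /* input ended on a held slash */
    *out = '\0';
    return (size_t)(out - dst);
}
```

Called on a buffer holding the msg1 declaration from the post, this sketch should copy the line through unchanged, because the /* is seen while in the STRING state and the \" escapes are copied without ending the string.  Replacing a comment with a space rather than nothing keeps a/**/b from fusing into the single token ab.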