Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!bionet!apple!oliveb!mipos3!omepd!merlyn
From: merlyn@iwarp.intel.com (Randal Schwartz)
Newsgroups: comp.lang.c
Subject: the answer (I hope!) (was Re: regex for C comments)
Message-ID: <4644@omepd.UUCP>
Date: 13 Jul 89 15:50:55 GMT
References: <19365@paris.ics.uci.edu> <502@chem.ucsd.EDU>
Sender: news@omepd.UUCP
Reply-To: merlyn@iwarp.intel.com (Randal Schwartz)
Distribution: na
Organization: Stonehenge; netaccess via Intel, Hillsboro, Oregon, USA
Lines: 27
In-reply-to: tps@chem.ucsd.edu (Tom Stockfisch)

In article <502@chem.ucsd.EDU>, tps@chem (Tom Stockfisch) writes:
| So, who has the shortest single LEX expression that correctly
| matches C comments --
| ignoring string and character constants,
| and disallowing start conditions?
|
| Mine is
|
|	"/*"\/*([^/]|{[^*/]\/+})*"*/"

What are these curly brace things?  They're not used in accordance
with V7 (the One True Unix :-) Lex.

OK, I'll toss my submission back in the ring, which I have tested (for
once... :-).

"/*"("*"*[^*/]+|"/"+)*"*"+"/"

You could make it shorter by leaving off the two plus signs inside the
parens, but this is probably more efficient.

Ducking to avoid the onslaught of test cases,
-- 
/== Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ====\
| on contract to Intel, Hillsboro, Oregon, USA                           |
| merlyn@iwarp.intel.com    ...!uunet!iwarp.intel.com!merlyn             |
\== Cute Quote: "Welcome to Oregon... Home of the California Raisins!" ==/
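
For anyone who wants to throw test cases at the pattern without running
lex, here is a minimal sketch in Python.  The translation from lex syntax
to Python's re syntax is done by hand and is an assumption of this sketch;
the names COMMENT, COMMENT_SHORT, is_comment and CASES are made up for
illustration.  It checks both the pattern as posted and the shorter
variant with the two inner plus signs dropped.

```python
import re

# Pattern as posted, in lex syntax:  "/*"("*"*[^*/]+|"/"+)*"*"+"/"
# Hand-translated to Python's re syntax (assumption: not produced by lex).
COMMENT = re.compile(r"/\*(?:\**[^*/]+|/+)*\*+/")

# Shorter variant mentioned in the post: both inner plus signs dropped.
COMMENT_SHORT = re.compile(r"/\*(?:\**[^*/]|/)*\*+/")


def is_comment(text, pattern=COMMENT):
    """Return True if the whole of text is exactly one C comment."""
    return pattern.fullmatch(text) is not None


# Hypothetical test inputs, mapped to whether each is one complete comment.
CASES = {
    "/**/": True,
    "/***/": True,
    "/* a */": True,
    "/*/*/": True,             # the '/' right after the opener is content
    "/* a * b / c */": True,   # lone stars and slashes inside are fine
    "/* a **/": True,          # extra stars before the closing slash
    "/* a\n * b\n */": True,   # comments may span lines
    "/*/": False,              # the opener's star cannot also close
    "/* a */ b */": False,     # must stop at the first "*/"
    "/* abc": False,           # unterminated
}

if __name__ == "__main__":
    # Python's re is a backtracking matcher, unlike the DFA that lex builds,
    # so the nested plus signs could be slow on long unterminated input.
    # The inputs here are kept short.
    for text, expected in CASES.items():
        for name, pat in (("posted", COMMENT), ("short", COMMENT_SHORT)):
            status = "ok" if is_comment(text, pat) == expected else "FAIL"
            print(f"{status:4} {name:6} {text!r}")
```

The sketch uses fullmatch so that a string containing more than one
comment terminator is rejected rather than matched in part; a scanner
would instead take the match starting at the current position.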