Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!samsung!uakari.primate.wisc.edu!ames!ig!arizona!mike
From: mike@cs.arizona.edu (Mike Coffin)
Newsgroups: comp.lang.c
Subject: Re: Re^2: Why nested comments not allowed?
Message-ID: <18069@megaron.cs.arizona.edu>
Date: 19 Feb 90 20:29:50 GMT
References: <4320@daffy.cs.wisc.edu>
Organization: U of Arizona CS Dept, Tucson
Lines: 26

From article <4320@daffy.cs.wisc.edu>, by schaut@cat9.cs.wisc.edu (Rick Schaut):
> I think you've missed the point. In compilers for languages that do not
> allow nested comments the parser never sees the comment at all. The comments
> are eaten by the scanner (which is a much simpler part of the compiler than
> is a parser). Essentially, any language that requires balancing characters
> (e.g. the language of balanced parens) cannot be represented using regular
> expressions, and regular expressions are the construct upon which scanners
> are based. In short, a compiler for a language that doesn't allow nested
> comments is _much_ faster than a compiler for a language that allows them.

The last sentence doesn't follow from the rest of the paragraph.
Scanners may be *based* on regular expressions, but the popular
scanner generators (Lex, Flex, and friends) are not *restricted* to
regular expressions. In fact, as people have often pointed out,
matching whole comments with a regular expression can be dangerous
with some scanners, because a long comment will overflow a
fixed-size token buffer.

A common work-around is to detect the beginning of a comment with a
regular expression and call a function (in C, perhaps) to eat the
rest of the comment. This avoids the buffer-overflow problem and
makes it trivial to handle nested comments: just count the opening
comment tokens and match each one against a closing token. Nothing
slow about that.

-- 
Mike Coffin                          mike@arizona.edu
Univ. of Ariz. Dept. of Comp. Sci.   {allegra,cmcl2}!arizona!mike
Tucson, AZ 85721                     (602)621-2858