Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!samsung!uakari.primate.wisc.edu!ames!ig!arizona!mike
From: mike@cs.arizona.edu (Mike Coffin)
Newsgroups: comp.lang.c
Subject: Re: Re^2: Why nested comments not allowed?
Message-ID: <18069@megaron.cs.arizona.edu>
Date: 19 Feb 90 20:29:50 GMT
References: <4320@daffy.cs.wisc.edu>
Organization: U of Arizona CS Dept, Tucson
Lines: 26

From article <4320@daffy.cs.wisc.edu>, by schaut@cat9.cs.wisc.edu (Rick Schaut):
> I think you've missed the point.  In compilers for languages that do not
> allow nested comments the parser never sees the comment at all.  The comments
> are eaten by the scanner (which is a much simpler part of the compiler than
> is a parser).  Essentially, any language that requires balancing characters
> (e.g. the language of balanced parens) cannot be represented using regular
> expressions, and regular expressions are the construct upon which scanners
> are based.  In short, a compiler for a language that doesn't allow nested
> comments is _much_ faster than a compiler for a language that allows them.

The last sentence doesn't follow from the rest of the paragraph.
Scanners may be *based* on regular expressions, but the popular
scanners (Lex, Flex, and friends) are not *restricted* to regular
expressions.  In fact, as people have often pointed out, parsing
comments with regular expressions can be dangerous with some scanners
because long comments will overflow fixed-size buffers.  A common
work-around is to detect the beginning of a comment by a regular
expression and call a function (in C, perhaps) to eat the rest of the
comment.  This avoids the buffer-overflow problems and makes it
trivial to parse nested comments---just count the number of
<begin-comment> tokens and match them with <end-comment> tokens.
Nothing slow about that.
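
For concreteness, here is a minimal sketch of such a comment-eating
function.  It assumes the scanner has already matched the opening
delimiter with a regular expression and that the remaining input is
available as a NUL-terminated string; the name skip_nested_comment and
the string-based interface are hypothetical, chosen for illustration,
and are not part of Lex or Flex.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, called after the scanner has already consumed
 * one begin-comment token.  `p` points just past that token.
 * Returns a pointer just past the matching end-comment token, or NULL
 * if the input ends while a comment is still open.
 */
const char *skip_nested_comment(const char *p)
{
    int depth = 1;      /* the opener the scanner already matched */

    assert(p != NULL);

    while (*p != '\0') {
        if (p[0] == '/' && p[1] == '*') {
            /* another begin-comment token: go one level deeper */
            depth++;
            p += 2;
        } else if (p[0] == '*' && p[1] == '/') {
            /* an end-comment token: close the innermost level */
            depth--;
            p += 2;
            if (depth == 0)
                return p;       /* outermost comment is closed */
        } else {
            p++;                /* ordinary comment text */
        }
    }
    return NULL;                /* unterminated comment */
}
```

The loop looks at each character once and keeps a single counter, so
the cost is linear in the length of the comment, the same as for a
non-nested comment.  A real scanner would more likely pull characters
from its own input routine than from a string, but the counting is the
same.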
-- 
Mike Coffin				mike@arizona.edu
Univ. of Ariz. Dept. of Comp. Sci.	{allegra,cmcl2}!arizona!mike
Tucson, AZ  85721			(602)621-2858