Path: utzoo!attcan!uunet!pmafire!uudell!sequoia!execu!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!caen!uflorida!travis!tom
From: tom@ssd.csd.harris.com (Tom Horsley)
Newsgroups: comp.std.c++
Subject: Re: design by committee (was: templates and exceptions in g++?)
Message-ID:
Date: 2 Dec 90 02:47:29 GMT
References: <1016@zinn.MV.COM> <1990Nov23.211727.2802@zoo.toronto.edu> <1990Nov25.161506.9659@tsa.co.uk> <533@taumet.com>
Sender: news@travis.csd.harris.com
Organization: Harris Computer Systems Division
Lines: 49
In-reply-to: steve@taumet.com's message of 1 Dec 90 23:36:23 GMT

>>>>> Regarding Re: design by committee (was: templates and exceptions in g++?); steve@taumet.com (Stephen Clamage) adds:

steve> Our original straightforward implementation of trigraphs
steve> caused a 15% slowdown of the compiler front end. We spent quite a bit
steve> of time finding an efficient way to handle them, and reduced the
steve> overhead to about 5%. Please note this affects every program ever
steve> compiled, even ones which contain no trigraphs.

I don't want to sound too insulting here, but I would say you have a
seriously flawed design. I worked on an ANSI C scanner as a sort of
academic exercise while trying to fully understand the way the macro
processor works, and my scanner has no additional overhead to speak of
even if you do use trigraphs.

The key to making this work fast is recognizing that you have to examine
each character in the buffer to classify it as you go along anyway. I
used a <ctype.h>-like array that marked "interesting" characters and
embedded the check in a getc()-like macro. The macro normally returns the
next character using inline code, but if an interesting character shows
up it calls a subroutine to do additional processing.

A '\0' character is interesting because I might have to re-fill the
buffer, and a '\\' character is interesting because it might be followed
by a newline, in which case both of them have to be squeezed out.
(Remember that a backslash followed by a newline has always been a
special sequence you had to check for, even before question-mark
question-mark came along; the overhead for trigraphs is no worse than
this.) With trigraphs, '?' is now also an interesting character.

Sticking an extra check for the ?? trigraph sequence in the subroutine
that is only invoked when an interesting character comes along does not
cost that much extra (unless you have a LOT of question marks in your
source code). The tricky parts are making sure you go ahead and fill the
buffer if you are within 4 characters of the end, and handling the case
of a line terminated by ??/ followed by a newline.

When I do find something like a trigraph or a \ newline, I squeeze it out
and replace it with what really belongs there. The routine knows where
the current token starts in the buffer, so it just shifts the token right
to take up the slack, then it returns the proper character and scanning
continues normally.

This allows me to handle the phases of translation which process
trigraphs and backslash newlines transparently in the GetNextCharacter
macro while I am also busting up the source into tokens. I can also leave
the tokens in the input buffer without wasting time copying them around
unless I have to do something like squeeze out a trigraph.
--
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL  33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)               |
+====================================================================+