Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!wuarchive!uunet!gistdev!flint From: flint@gistdev.gist.com (Flint Pellett) Newsgroups: comp.lang.c Subject: A way to significantly speed up compiles? Keywords: cpp Message-ID: <1037@gistdev.gist.com> Date: 15 Dec 90 00:14:45 GMT Distribution: comp Organization: Global Information Systems Technology Inc., Savoy, IL Lines: 88 This is a proposal on how to make C compiles a lot faster, but it may not be a useful enough case to be worth it for typical C programs. I thought I'd throw the idea up for discussion though. Consider the following sample of code: #define STARTLOC 0 <..10 things..> #define ENTRY1 STARTLOC + 10 <..5 things..> #define ENTRY2 ENTRY1 + 5 <..7 things..> #define ENTRY3 ENTRY2 + 7 ... int pointers[] = {STARTLOC,ENTRY1,ENTRY2,ENTRY3... I expected the output of cpp to look like this: int pointers[] = {0 , 10 , 15 , 22 , ... cpp, however, is only doing simple text substitution here, and produced: int pointers[] = {0 , 0+10 , 0+10+5 , 0+10+5+7 , ... If you're trying to use MSC5.1 it doesn't even work: after you get somewhere around ENTRY60, it quits, telling you you've tried to nest your constants too deeply. The questions is, why not have cpp combine those numbers at pre-processor time? Wouldn't it make a lot more sense to have cpp add 0+10+5+7 together once when ENTRY3 is #defined in cpp rather than making the compiler add those numbers together 100 times in the 100 places that ENTRY3 is used? I can toss some real-world experience in to help answer that question, since my company has it's own specialized language that we develop and market: I made a similar improvement to our pre-processing so that it combined everything it could when it was defined rather than when it was used, and the result was that compile times were reduced to as little as 50% of previous times in some cases, and generally to 80% to 90% in all others. However, that language is one in which you tend to want to build up definitions of one thing based on a previous one, and I haven't seen that same tendancy nearly so much in C. Consider these factors: 1. What is worst case? The constant is defined and never used, so you end up doing the evaluation for nothing. How often do you realistically expect to see that happen? The second worst case: the constant is defined and then only used once, in which case you have merely moved the job of evaluating it to cpp from the compiler, and the slow-down of having cpp look at is going to be relatively minor. (cpp only needs to evaluate it if it finds an operator in it somewhere. The cpp evaluation can quit immediately upon finding anything like a variable that it can't handle at compile time, and fall back to simple text replacement.) 2. What is the best case? If you have an app that is chaining constants like those above, the difference snowballs, because cpp can even take advantage of it's own previous results. Assume 100 constants, each defined in terms of addition of a modifier to the one before. (The reason I'm bringing this up now is because I just bumped into a real-world case where this was done.) Those 100 constants could be computed with 100 additions by cpp. Done the way they are done now, the compiler has to perform 5,050 additions, not to mention parsing a lot more text, assuming it is even willing to nest things 100 levels deep. (It was trying to port it to MSC that I found out that not a compilers will.) 3. By combining the values, cpp ought to be able to significantly reduce the memory it consumes: it won't have to save all those long strings, it can save the reduced results. The obvious question is, how much will this really save in a typical C program? I don't know the answer to that one: if real programs don't use #define to define anything except simple integers in the first place, then not much. However, if you're build up defined masks by combining other defined masks, or are are building a table of pointers where each one is relative to the one before, potentially quite a bit. Of course, I've got one other question to the ANSI experts: I've got a program on my hands that isn't portable, because the place where it was written let you nest constants > 150 levels deep, and MSC won't. I don't remember ever seeing anything about stuff like that in a portability guide. Does ANSI conformance put any requirements on what cpp has to be able to handle on things like this? -- Flint Pellett, Global Information Systems Technology, Inc. 1800 Woodfield Drive, Savoy, IL 61874 (217) 352-1165 uunet!gistdev!flint or flint@gistdev.gist.com