Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rochester!pt.cs.cmu.edu!andrew.cmu.edu!bob#
From: bob#@andrew.cmu.edu (Bob Sidebotham)
Newsgroups: sci.lang,comp.lang.misc
Subject: Re: Using C to describe itself
Message-ID: <4UbsMNy00Vs6cnk0-Z@andrew.cmu.edu>
Date: Wed, 6-May-87 14:51:05 EDT
Article-I.D.: andrew.4UbsMNy00Vs6cnk0-Z
Posted: Wed May 6 14:51:05 1987
Date-Received: Sat, 9-May-87 01:11:15 EDT
Organization: Carnegie-Mellon University
Lines: 53
Xref: mnetor sci.lang:593 comp.lang.misc:381
Return-path:
X-Andrew-Authenticated-as: 9
X-Trace: MS Version 3.24 on ibm032 host carnot, by bob (9).
To: outnews#ext.nn.sci.lang@andrew.cmu.edu, outnews#ext.nn.comp.lang.misc@andrew.cmu.edu
In-Reply-To: <1336@frog.UUCP>

>>>> Yes indeed C can describe itself.
>>>Actually, I'm not completely convinced of this...
>>>Evidence of this is in Dennis Ritchie's Turing award lecture: he had built a
>>>trojan horse into the C compiler.

If anyone is unconvinced by this argument, here are two real examples
that I ran into years ago in a Pascal compiler and another compiler
derived from Pascal. Both of these compilers compiled themselves, and
both had hidden aspects to them which were completely invisible in the
source code:

Pascal had a concept of an "alfa", which was a string object holding a
fixed maximum number of characters, typically corresponding to the
number of bytes available in a computer word. On different machines,
this was defined differently. The predefined constant ALFALENG defined
this maximum length. The compiler created a symbol table entry for this
constant by something equivalent to:

    AddNumericConstant("ALFALENG", ALFALENG);

where the first parameter is the name of the constant, and the second is
the value. Nothing in the source ever said what that value was: the
compiler simply inherited it from the compiler that compiled it. The
only way to determine the actual value of ALFALENG was to examine the
object code or to compile a program which printed it out. On different
machines you got different values.
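A rough C sketch of the ALFALENG situation, for anyone who wants to see it
spelled out. Everything here other than the AddNumericConstant call is an
assumption made for illustration: the table layout, the names
LookupNumericConstant and InitPredefinedConstants, and above all the #define,
which did not exist in the Pascal source. It is only there because a C
compiler does not predeclare ALFALENG, so sizeof(long) stands in for "whatever
the compiling compiler says".

```c
#include <assert.h>
#include <string.h>

/*
 * ASSUMPTION: stand-in for the predeclared constant. In the Pascal
 * compilers described above no such definition appeared in the source;
 * the value came from whichever compiler built this one. Here the host
 * C compiler plays that role through sizeof(long).
 */
#ifndef ALFALENG
#define ALFALENG ((long)sizeof(long))
#endif

#define MAX_CONSTANTS 16

/* Hypothetical symbol table entry for a predefined numeric constant. */
struct numeric_constant {
    const char *name;
    long value;
};

static struct numeric_constant constants[MAX_CONSTANTS];
static int constant_count = 0;

/* Record a predefined constant under the given name. */
void AddNumericConstant(const char *name, long value)
{
    assert(constant_count < MAX_CONSTANTS);
    constants[constant_count].name = name;
    constants[constant_count].value = value;
    constant_count++;
}

/* Return the value recorded for name, or -1 if it was never added. */
long LookupNumericConstant(const char *name)
{
    int i;

    for (i = 0; i < constant_count; i++) {
        if (strcmp(constants[i].name, name) == 0)
            return constants[i].value;
    }
    return -1;
}

/* Compiler start-up: the only place the source mentions ALFALENG. */
void InitPredefinedConstants(void)
{
    AddNumericConstant("ALFALENG", ALFALENG);
}
```

After InitPredefinedConstants() runs, LookupNumericConstant("ALFALENG")
returns whatever the host compiler chose for sizeof(long). Reading the
start-up routine alone tells you the name of the constant and nothing about
its value, which is the point of the example.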
Another, considerably more interesting example was a bug that we managed
to introduce into the floating point constant recognition. Originally,
there were no floating point constants in the compiler itself: the
routine did a divide by 10 every time it came across another digit to
the right of the decimal point. Since the divide on our machine was
relatively slow, someone replaced the divide by 10 with a multiply by
0.1. This appeared to work fine until we discovered, much later, that
the compiler's representation for 0.1 changed, by 1 bit, for each
generation of the compiler. That is, if you compiled the compiler, used
the result to compile a new compiler, and then did a byte for byte
comparison of the two generated programs, they differed by one bit! It
turned out that every second generation was identical.

There's also the well known C-preprocessor hack: every C preprocessor
defines a symbol which declares which machine type is being compiled
for. On a VAX, for example, you can test the symbol "vax" and
conditionally compile code for it. It turns out that the preprocessor
itself has identical source code for the various machine types, along
the lines of:

    #ifdef vax
    Define the vax symbol
    #endif
    #ifdef ibm032
    Define the ibm032 symbol
    #endif
    etc.

Again, it's impossible to determine, purely from the source, which
symbol will actually be defined.
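The preprocessor hack can be made concrete with a small C sketch. This is not
taken from any real preprocessor source: the table and the names
InstallSymbol, IsPredefined and InitMachineSymbols are hypothetical, chosen
only to show the shape of the trick.

```c
#include <assert.h>
#include <string.h>

#define MAX_PREDEFINED 8

/* Hypothetical table of symbols this preprocessor will predefine. */
static const char *predefined[MAX_PREDEFINED];
static int predefined_count = 0;

/* Enter one symbol name into the predefined table. */
void InstallSymbol(const char *name)
{
    assert(predefined_count < MAX_PREDEFINED);
    predefined[predefined_count++] = name;
}

/* Return 1 if name is in the predefined table, 0 otherwise. */
int IsPredefined(const char *name)
{
    int i;

    for (i = 0; i < predefined_count; i++) {
        if (strcmp(predefined[i], name) == 0)
            return 1;
    }
    return 0;
}

/*
 * Identical source on every machine. Which call survives depends only
 * on what the preprocessor that processed this file had predefined.
 */
void InitMachineSymbols(void)
{
#ifdef vax
    InstallSymbol("vax");
#endif
#ifdef ibm032
    InstallSymbol("ibm032");
#endif
}
```

Built by a preprocessor that predefines vax, the result predefines vax in
turn; built by one that predefines neither symbol, the table stays empty. The
source is the same in every case, so the machine identity is handed down from
one generation to the next rather than written down anywhere.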