Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!mips!decwrl!sun-barr!lll-winken!ames!dftsrv!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: stupid compilers
Message-ID: <26336@mimsy.umd.edu>
Date: 3 Sep 90 03:04:54 GMT
References: <163@prodix.liu.se> <24700008@sunb0.cs.uiuc.edu>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 130

In article <24700008@sunb0.cs.uiuc.edu> mccaugh@sunb0.cs.uiuc.edu writes:
>... the question orignally posed did not address correctness of either
>program, but rather why the second version precipitates a segmentation
>fault where the first one does not. So what is the key difference in
>the two programs? It would appear to be in the declaration of variable
>'line' ...

This much is correct.  Quick recap: the program that caused a
`segmentation fault, core dumped' result was of the form

prog2>  main() { char *line; strcpy(line, "foo"); }

while the program that appeared to work was of the form

prog1>  main() { char line[]; strcpy(line, "foo"); }

>which is a null (length = 0) string in the first declaration (char [])

This is a slightly peculiar definition for `null string'.  The program
labelled `prog1' has a constraint violation: the subscript brackets in
the declaration must not be empty.  A buggy compiler allowed the empty
declaration, and---since I happen to know the internal implementation
of this compiler, I know what it did---treated it as `char line[0];',
reserving zero bytes for the array `line'.

>and a char-ptr in the second. We are not informed as to whether the
>assignment (via 'strcpy') caused the problem or the subsequent 'printf'
>but I would suspect the latter.

You would suspect incorrectly.

>(If the former caused the problem in [prog2], why not in [prog1]?)

The actual generated code on a VAX for prog1 is (unoptimized but
slightly simplified):

        _main:  .word   0               # save no registers
                subl2   $0,sp           # allocate 0 bytes of stack for line[]
                pushab  L1              # push &"foo"[0]
                pushab  (fp)            # push &line[0]
                calls   $2,_strcpy      # call strcpy()
                ret                     # return from main, no value
        L1:     .ascii  "foo\0"         # C string {f,o,o,\0}

Compare this with a correct program in which line[] is declared as
`char line[4];':

        _main:  .word   0               # save no registers
                subl2   $4,sp           # allocate 4 bytes of stack for line[]
                pushab  L1              # push &"foo"[0]
                pushab  -4(fp)          # push &line[0]
                calls   $2,_strcpy      # call strcpy()
                ret                     # return from main, no value
        L1:     .ascii  "foo\0"         # C string {f,o,o,\0}

The only difference between these two programs at run time is what
goes on the stack.  Assume that the stack pointer sp in main() is
0x7fffeb80.  (At the entry to a subroutine, the VAX makes sp==fp; fp
is later used to mean `sp before we adjusted it with a subl2 or push
instruction'.)  In the first program, the `subl2 $0' does not affect
sp at all; then we have a pushab that pushes, say, 0x1000 on the
stack, and in the process changes sp to 0x7fffeb7c.  Then we have a
`pushab (fp)'; this pushes 0x7fffeb80 on the stack, in the process
changing it to 0x7fffeb78.  The `calls $2' then pushes 2, and then a
register save mask, and some other stuff.  strcpy() then copies
{foo\0} (four bytes) to locations 0x7fffeb80 through 0x7fffeb83, and
returns to main().

At this point the word `foo\0' has overwritten whatever used to be at
0x7fffeb80.  The only question then is: what was there, and was it any
use?  As it happens, on the VAX, what was there was a 0, and it does
not get used.

In the corrected program, strcpy() copies into four bytes at
0x7fffeb7c..0x7fffeb7f, which were set aside for that purpose by the
`subl2 $4,sp'.

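For comparison at the C level, here is a minimal sketch of what
corrected versions of both programs might look like.  The function
names (copy_checked, fixed_array_version, fixed_pointer_version) are
made up for illustration and do not come from the original programs.
The point is only that strcpy() needs four real bytes to write into,
whether they come from the array itself or from storage that the
pointer has been aimed at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if it fits, '\0' included.
 * Returns 1 on success, 0 if dst is too small (dst is left untouched). */
int copy_checked(char *dst, size_t dstsize, const char *src)
{
    size_t need;

    assert(dst != NULL && src != NULL);
    need = strlen(src) + 1;          /* +1 for the terminating '\0' */
    if (need > dstsize)
        return 0;
    memcpy(dst, src, need);
    return 1;
}

/* Corrected prog1: the array itself supplies the four bytes. */
int fixed_array_version(void)
{
    char line[4];                    /* room for 'f', 'o', 'o', '\0' */

    strcpy(line, "foo");
    return strcmp(line, "foo") == 0;
}

/* Corrected prog2: the pointer is aimed at real storage before use. */
int fixed_pointer_version(void)
{
    char buf[4];
    char *line = buf;                /* line now holds &buf[0] */

    strcpy(line, "foo");
    return strcmp(buf, "foo") == 0;
}
```

Called from main(), both fixed_* functions should return 1.  Asking
copy_checked() to copy "foo" into a destination of size 0 returns 0,
which is the C-level view of the `char line[0];' case: there is no
room even for the terminating \0.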
Program 2, on the other hand, compiles to code something like this:

        _main:  .word   0               # save no registers
                subl2   $4,sp           # make space for `line'
                pushab  L1              # push &"foo"[0]
                pushl   -4(fp)          # push value of `line'
                calls   $2,_strcpy
                ret
        L1:     .ascii  "foo\0"

Again, on the VAX, this might start with sp=fp=0x7fffeb80.  The subl2
would then set sp=0x7fffeb7c.  The `pushl' instruction would then push
the contents of locations 0x7fffeb7c..0x7fffeb7f.  This is, on the
VAX, normally preset to 0 (newly allocated stack pages are cleared so
that programs cannot search memory for passwords, or unencrypted files
from the last invocation of the editor, or whatever).  Thus, this asks
strcpy to copy {foo\0} into location 0.  Location 0 is not writable,
and the program gets a segmentation violation signal and crashes.

>My point is that certain compilers MAY draw some
>serious distinction between char-ptrs and "true" strings (char [*]) even
>when the string is null.

*Every* C compiler *must* draw a serious distinction between a pointer
and an array.  See the Frequently Asked Questions list for some of the
differences.  The two are not and never have been equivalent, and an
array is never a pointer.  An array *object* is *converted to* a
pointer *value* in some places, and in one very special place an array
*declaration* is *rewritten as* a pointer declaration.  But an array
never `is' a pointer.

> Since un-initialized ptrs so often lead to segmentation faults, here is my
>guess as to what happened. The first declaration (char line [];) must have
>initialized variable 'line' as a char-ptr to some 0-length area, while the
>second declaration (char *line;) left 'line' un-initialized. Hence the value
>of 'line' in the first case was legitimate -- even if it addressed 0-length
>space -- while the un-initialized "value" of 'line' in the second case could
>not even be considered legitimate.

Not quite.
The empty brackets incorrectly passed through the compiler, leaving
line[] as a zero-length area but *NOT* a `pointer to' a zero-length
area.  The pointer aspect only comes in when the array name (`line')
is used in a value context (argument to strcpy); then the compiler
changes the object to a value by computing the address of element 0 of
the array.

In other words, the analysis above derives one correct conclusion from
false premises.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain: chris@cs.umd.edu        Path:   uunet!mimsy!chris