Path: utzoo!attcan!uunet!nems!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.lang.c
Subject: Re: main() arguments, was Re: typedef-ing an array
Message-ID: <25273@mimsy.umd.edu>
Date: 3 Jul 90 18:32:34 GMT
References: <78627@srcsip.UUCP> <78633@srcsip.UUCP> <25247@mimsy.umd.edu> <12433@sun.udel.edu> <4238@jato.Jpl.Nasa.Gov>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 179

In article <25247@mimsy.umd.edu> I wrote:
>>[main()] must (yes must) be `int main', even if it never returns.
>>It may have either 0 arguments or two (int argc, char **argv).

In article <12433@sun.udel.edu> toor@sun.udel.edu (Kartik S Subbarao)
writes:
>I beg to differ.  [void main(dumdum)] works fine with gcc, and ...
>works fine with plain 'ol cc.
>So you CAN have a) void main if you desire,
>		b) only one argument to main.

Your universe is too small.  If I said

	int main() { if (*(char *)1024 == 0) return 1; return 0; }

works fine with both gcc and cc (hint: it does in fact work fine with
both ... on SOME machines), would you say that you are allowed to read
location 1024?

In article <4238@jato.Jpl.Nasa.Gov> kaleb@mars.jpl.nasa.gov (Kaleb Keithley)
writes:
>K&R 2nd Ed. states (p. 26):
>A function need not return a value. [...] Since main is a function like any
>other, it may return a value to its caller...
>
>Furthermore, on p. 164 (Ibid.) it is stated:
>Within main, return expr is equivalent to exit(expr).  exit has the
>advantage...

However, you might also note that it does NOT say that you may make
main a void function if main uses exit() rather than return.  There is
a reason for this.

>If exit() is used rather than return, I submit that declaring main as 
>returning type void is not only legal, but correct, as lint plus ANSI
>compilers will complain that there is no return statement.

They may indeed complain, but they will be incorrect in so doing.  The
ANSI C standard X3.159-1989 is very carefully designed to allow machines
to use a different call/return mechanism for functions that return values
versus functions that do not return values.  For instance, on a machine
with no registers, the code for a function `int f(x) { return x+1; }'
might be, e.g.,

		.export	f_
	f_:
		sub	#2,sp		| create local stack space
		| stack layout (2 byte `int's):
		|	4(sp)	arg 1
		|	2(sp)	pointer to return value location
		|	0(sp)	return pc
		|	-2(sp)	scratch space
		mov	4(sp),-2(sp)	| copy value of x
		add	#1,-2(sp)	| compute x+1
		mov	-2(sp),@2(sp)	| `return' it
		add	#2,sp		| undo stack
		ret
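
For those who would rather read C than made-up assembly, here is a rough
sketch of the same convention with the hidden return-value pointer made
explicit.  The names `f_lowered' and `call_f' are invented for this
illustration; no real compiler is claimed to emit exactly this.

```c
/* Hypothetical lowering of `int f(x) { return x+1; }': the caller passes
 * the address of the return-value slot as a hidden first argument. */
void f_lowered(int *ret_slot, int x)
{
	int scratch = x;	/* mov 4(sp),-2(sp)  : copy value of x */
	scratch += 1;		/* add #1,-2(sp)     : compute x+1     */
	*ret_slot = scratch;	/* mov -2(sp),@2(sp) : `return' it     */
}

/* What a caller of f(x) has to do under this convention: make room for
 * the result and hand its address to the callee. */
int call_f(int x)
{
	int result;

	f_lowered(&result, x);
	return result;
}
```

A function declared void would get no `ret_slot' at all, which is the whole
difference between the two kinds of call on such a machine.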

Note what happens on this machine if we say

	extern void exit(int);
	extern int foo(int);
	int main(int argc, char **argv) {
		(void) foo(argc == 1 ? 0 : 1);
		exit(0);
	}

This compiles to, e.g.,

	main_:	.export	main_
		sub	#2,sp		| create stack space, as before
		|	6(sp)	argv
		|	4(sp)	argc
		|	2(sp)	place to store return value from main
		|	0(sp)	return address (C library startup code)
		|	-2(sp)	scratch space
		mov	4(sp),-2(sp)	| copy argc to temp space
		sub	#1,-2(sp)	| subtract 1
		jnz	-2(sp),L1	| branch if -2(sp)!=0, i.e., argc!=1
		mov	#0,-2(sp)	| argument to foo is 0
		jmp	L2		| merge
	L1:	mov	#1,-2(sp)	| argument to foo is 1
	L2:	sub	#4,sp		| foo has a return value
		mova	2(sp),0(sp)	| foo's return value will be stored in
					| the location we used for the argument
		call	foo_		| call foo()
		add	#4,sp		| fix stack
		mov	#0,-2(sp)	| argument to exit is 0
		sub	#2,sp		| exit has no return value
		call	exit_
		add	#2,sp		| (compiler thinks exit returns)
		ret			| ... without returning a value.

Now watch what happens if we declare main() as void:

	void main(int argc, char **argv) { foo(argc == 1 ? 0 : 1); ... }

compiles to:

	main_:	.export	main_
		sub	#2,sp		| create stack space, as before
		|	4(sp)	argv
		|	2(sp)	argc
		|	0(sp)	return address (C library startup code)
		|	-2(sp)	scratch space
		mov	2(sp),-2(sp)	| copy argc to temp space
		sub	#1,-2(sp)	| subtract 1
		jnz	-2(sp),L1	| branch if -2(sp)!=0, i.e., argc!=1
		mov	#0,-2(sp)	| argument to foo is 0
		jmp	L2		| merge
	L1:	mov	#1,-2(sp)	| argument to foo is 1
	L2:	sub	#4,sp		| foo has a return value
		mova	2(sp),0(sp)	| foo's return value will be stored in
					| the location we used for the argument
		call	foo_		| call foo()
		add	#4,sp		| fix stack
		mov	#0,-2(sp)	| argument to exit is 0
		sub	#2,sp		| exit has no return value
		call	exit_
		add	#2,sp		| (compiler thinks exit returns)
		ret			| ... without returning a value.

Unfortunately for us, the thing that calls main() with argc and argv
put a return value pointer into 2(sp), so that we called foo() with
the result of `return value pointer == 1 ? 0 : 1' rather than with
the result of `argc == 1 ? 0 : 1'.  If we try to examine the value of
argv, we find instead the value of argc.  Indeed, we could write our
main as

	void main(int *ret, int argc, char **argv) {
		foo(argc == 1 ? 0 : 1);
		exit(0);
	}

and it would work---even though it would ALSO work when written with
two arguments as `int main(int argc, char **argv)'.
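
The slot mix-up can be modelled in ordinary C.  This is only a sketch of
the hypothetical machine above: `struct frame' and the helper names are
invented for the illustration, and it assumes <stdint.h> provides
uintptr_t (optional in the standard, present nearly everywhere).

```c
#include <stdint.h>

/* The words above the return pc, lowest address first, as the startup
 * code lays them out when it expects `int main(int, char **)'. */
struct frame {
	uintptr_t word[3];	/* 2(sp), 4(sp), 6(sp) */
};

struct frame startup_frame(int *ret_slot, int argc, char **argv)
{
	struct frame f;

	f.word[0] = (uintptr_t)(void *)ret_slot;	/* 2(sp) */
	f.word[1] = (uintptr_t)argc;			/* 4(sp) */
	f.word[2] = (uintptr_t)(void *)argv;		/* 6(sp) */
	return f;
}

/* `int main' knows about the return slot, so it finds argc at 4(sp). */
uintptr_t argc_as_int_main(const struct frame *f)  { return f->word[1]; }

/* `void main' expects no return slot: it looks for argc at 2(sp) and
 * argv at 4(sp), and so reads the wrong words. */
uintptr_t argc_as_void_main(const struct frame *f) { return f->word[0]; }
uintptr_t argv_as_void_main(const struct frame *f) { return f->word[1]; }
```

With argc equal to 1, argc_as_void_main() yields the return-slot pointer
and argv_as_void_main() yields 1, the failure described above.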
		
>Second bone to pick is the assertion that main() has two arguments (???)
>Since when?  What about the third allowable argument; envp?

It is not allowed (see your .signature quote :-) ).  If you are trying
to write a portable program, you must not use this invisible third argument.
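
If all you need is the value of a particular variable, the standard
library's getenv() (declared in <stdlib.h>) does the job without any
third argument.  A minimal sketch; `env_or' is a helper name made up for
this example, and note that getenv() gives you no portable way to walk
the whole environment the way envp does.

```c
#include <stdlib.h>

/* Return the value of the environment variable `name', or `fallback'
 * if it is not set.  getenv() is the ISO C function; env_or() is just
 * a hypothetical convenience wrapper for illustration. */
const char *env_or(const char *name, const char *fallback)
{
	const char *value = getenv(name);

	return value != NULL ? value : fallback;
}
```

For example, env_or("HOME", "/") could be called from a plain
`int main(int argc, char **argv)'.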

>I know that both UNIX and DOS (M'soft C compilers anyway) support
>char **envp ... as the third parameter to main.

If you are building a machine on which two-argument functions are very
different from three-argument functions (think along lines similar to the
weird machine I described above, where no-value functions are very
different from value-functions), and you want to support UNIX, you will
have to write a compiler that `knows' how main() really works, and
internally converts the (standard, portable, correct) main

	int main(int argc, char **argv) { ...

into object code that appropriately understands and ignores the invisible
third argument.

>"So that's what an invisible barrier looks like"

Indeed: compiler magic, or `all done with mirrors'.


There is an `exercise for the reader' hidden above as well.  Given
the sample code for main() above, can the compiler avoid using a
separate temporary for `ret' in

	int sum(int n, int *vec) {
		int i, ret = 0;
		for (i = 0; i < n; i++)
			ret += *vec++;
		return ret;
	}

by accumulating the return value in @2(sp)?  Why or why not?  If not,
how would you change the code for main so that it could?  Under what
conditions would so doing actually be advantageous?
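
As a starting point for the exercise (not a full answer), here are both
strategies written with the return slot as an explicit pointer.  The
function names are invented, and the sketch assumes a caller is free to
point the slot at memory the function also reads.

```c
/* Lowered form that keeps a separate temporary, as the source does. */
void sum_with_temp(int *ret_slot, int n, int *vec)
{
	int i, ret = 0;

	for (i = 0; i < n; i++)
		ret += *vec++;
	*ret_slot = ret;	/* slot is written once, at the end */
}

/* Lowered form that accumulates directly in the return slot. */
void sum_in_place(int *ret_slot, int n, int *vec)
{
	int i;

	*ret_slot = 0;		/* slot is written before vec is read */
	for (i = 0; i < n; i++)
		*ret_slot += *vec++;
}
```

Try both with the slot aimed at vec[0], as a caller compiling
`v[0] = sum(n, v)' might arrange, and compare the results.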
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris