Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!sdd.hp.com!wuarchive!uunet!mcsun!hp4nl!star.cs.vu.nl!kjb
From: kjb@cs.vu.nl (Kees J. Bot)
Newsgroups: comp.os.minix
Subject: Re: [source] #! in MM -- take 2
Message-ID: <10033@star.cs.vu.nl>
Date: 23 May 91 09:25:46 GMT
References: <klamer.674467423@mi.eltn.utwente.nl>
Sender: news@cs.vu.nl
Lines: 74

I'm posting my comments on Klamer's second try at a #! implementation in
MM partly to remind you of my own implementation of #!, which I posted on
May 13.  So far, the only comments I have received on my version are from
Klamer, telling me that it is slower than his because it makes two more
calls to FS.  It may be a little slower, but using my version is still the
easiest way to fix the bugs in Klamer's version.

klamer@mi.eltn.utwente.nl (Klamer Schutte) writes:
>Here is the second version of my #!interpreter patch for mm/exec.c.
>This version has all known bugs fixed.

Except for not doing setuid and this other "feature".

>One feature (bug ???) remains: I keep alignment from the data argv[] and
>envp[] point to intact. There (might ???) be a tradition of having this
>data in the form of strings with only 1 \0 in between.
>Where is the manual page for execve(2) ???? Or does POSIX(*) say anything
>about this?

I know of three places to look for the proper format of the initial stack:
- The old V7 manuals under exec(2), written when users were not considered
  too stupid to know such things.
- The source code of execve(2).
- The source code of ps(1).
The ps(1) source contains this interesting comment:
/*
 * Get_args inspects /dev/mem, using bufp, and tries to locate the initial
 * stack frame pointer, i.e. the place where the stack started at exec time.
 * It is assumed that the end of the stack frame looks as follows:
 *      argc    <-- initial stack frame starts here
 *      argv[0]
 *      ...
 *      NULL    (*)
 *      envp[0]
 *      ...
 *      NULL    (**)
 *      argv[0][0] ... '\0'
 *      ...
 *      argv[argc - 1][0] ... '\0'
 *      envp[0][0] ... '\0'
 *      ...
 *      [trailing '\0']
 * Where the total space occupied by this original stack frame <= ARG_MAX.
 * Get_args reads in the last ARG_MAX bytes of the process' data, and
 * searches back for two NULL ptrs (hopefully the (*) & (**) above).
 * If it finds such a portion, it continues backwards, counting ptrs until:
 * a) either a word is found that has as its value the count (supposedly argc),
 * b) another NULL word is found, in which case the algorithm is reiterated, or
 * c) we wind up before the start of the buffer and fail.
 * Upon success, get_args returns a pointer to the concatenated arg list.
 * Warning: this routine is inherently unreliable and probably doesn't work if
 * ptrs and ints have different sizes.
 */
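
To make the layout in that comment concrete, here is a minimal C sketch
that packs argc, the two pointer tables and the strings into a flat buffer
in that order.  It is not the Minix execve(2) code: build_frame, put_word,
get_word and word_t are names made up for this illustration, pointers are
stored as byte offsets from the start of the buffer, and a word is assumed
to be the size of a size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: one stack word is as wide as a size_t (and as a pointer). */
typedef size_t word_t;

/* Store one word at the given word index; memcpy avoids alignment traps. */
void put_word(char *buf, size_t index, word_t value)
{
    memcpy(buf + index * sizeof(word_t), &value, sizeof value);
}

/* Fetch one word from the given word index. */
word_t get_word(const char *buf, size_t index)
{
    word_t value;

    memcpy(&value, buf + index * sizeof(word_t), sizeof value);
    return value;
}

/* Hypothetical helper: build the frame described in the ps(1) comment.
 * Returns the frame size in bytes, or 0 if it does not fit in buf.
 */
size_t build_frame(char *buf, size_t bufsize,
                   char *const argv[], char *const envp[])
{
    size_t argc = 0, envc = 0, i, slot, off, len;

    assert(buf != NULL && argv != NULL && envp != NULL);
    while (argv[argc] != NULL) argc++;
    while (envp[envc] != NULL) envc++;

    /* argc word, argv pointers, NULL, envp pointers, NULL. */
    off = (argc + envc + 3) * sizeof(word_t);
    if (off > bufsize) return 0;

    put_word(buf, 0, (word_t) argc);
    slot = 1;
    for (i = 0; i < argc; i++) {
        len = strlen(argv[i]) + 1;      /* string plus its single '\0' */
        if (len > bufsize - off) return 0;
        put_word(buf, slot++, (word_t) off);
        memcpy(buf + off, argv[i], len);
        off += len;
    }
    put_word(buf, slot++, 0);           /* NULL (*) */
    for (i = 0; i < envc; i++) {
        len = strlen(envp[i]) + 1;
        if (len > bufsize - off) return 0;
        put_word(buf, slot++, (word_t) off);
        memcpy(buf + off, envp[i], len);
        off += len;
    }
    put_word(buf, slot++, 0);           /* NULL (**) */

    /* Trailing '\0' bytes pad the frame to a whole number of words. */
    while (off % sizeof(word_t) != 0) {
        if (off >= bufsize) return 0;
        buf[off++] = '\0';
    }
    return off;
}
```

Because the strings are packed with exactly one '\0' between them and the
padding comes only at the very end, a reader such as get_args can walk back
from the end of the frame to the two NULL words.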

I decided to go over Klamer's patch with a fine-tooth comb this time.  (I
wish someone would do that with my patch, with a mental -pedantic flag on.)

- ALIGN aligns to a multiple of 2, but execve aligns to a multiple of
  sizeof(char *).
- The interpreter is found relative to '/'.  (Move the first
  tell_fs(CHDIR, ...) inside the do loop.)
- Setuid bits on the script are still ignored.  (Wouldn't it be nice to
  allow people to explore the security risks of a setuid script?)
- Change 'know' to 'now' in patch_stack.  (-pedantic)
- The old argv[0][] is not removed from the initial stack.
- The ALIGN(len) is still at the wrong place.  Try moving only the strings
  by disp bytes, then move the pointers by argc*sizeof(char *) bytes.  Do
  an ALIGN(disp) just before the return.
- If stk_bytes is close to ARG_MAX then the last few environment variables
  may be truncated.
- Read_header returns 0 when there is nothing after #!.
- The size_ok function may return something other than -100.
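
The first point can be illustrated with a small rounding helper.  This is
a sketch, not the ALIGN macro from the patch: round_up is a hypothetical
name, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of align.
 * Assumption: align is a nonzero power of two (2, 4, 8, ...).
 */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```

On a machine with 4-byte pointers, round_up(6, 2) is 6 while
round_up(6, sizeof(char *)) is 8, which is the kind of disagreement between
the two paddings that the first point is about.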
--
	                        Kees J. Bot  (kjb@cs.vu.nl)
	              Systems Programmer, Vrije Universiteit Amsterdam