Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!seismo!munnari!moncskermit!goanna!yabbie!rcodi
From: rcodi@yabbie.rmit.oz (Ian Donaldson)
Newsgroups: comp.unix.wizards
Subject: Re: brk's zero-fill behavior on VAXen (useful undefined checks)
Message-ID: <363@yabbie.rmit.oz>
Date: Sun, 9-Nov-86 22:48:02 EST
Article-I.D.: yabbie.363
Posted: Sun Nov  9 22:48:02 1986
Date-Received: Tue, 11-Nov-86 01:01:23 EST
References: <7208@elsie.UUCP> <5142@brl-smoke.ARPA> <2447@hcr.UUCP>
Organization: RMIT Comm & Elec Eng, Melbourne, Australia.
Lines: 56
Summary: initializing to other than zero is more useful

In article <2447@hcr.UUCP>, mike@hcr.UUCP (Mike Tilson) writes:
> I'd like to point out that there is another very good reason to
> set newly allocated memory to a fixed value:  buggy programs are much
> less likely to exhibit non-deterministic behavior, which makes it
> much easier to fix problems.  If newly allocated memory were initialized
> with random values, then tracking down wild pointers, etc., would be much
> harder.

I might point out that initializing such memory with zero is less likely
to reveal bugs in a program than initializing it with a constant
garbage value (e.g. 0x3e).  If a pointer stored in such memory were then
used, its value would be 0x3e3e3e3e, which will make most CPUs take a
bus error or seg fault, because (1) if it is a pointer to an int, the
address is not a multiple of 4, so CPUs that insist on aligned access will
crap out (the 68000 only traps word accesses at odd addresses, so catching
it there needs an odd fill byte); and (2) very few programs have data
segments that reach that high, or stack segments that reach that low.
Initializing to zero shows up the error only on machines that disallow
references to low memory (e.g. Suns).
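A minimal C sketch of the constant-fill idea, assuming a hypothetical wrapper named garbage_malloc() around the standard malloc() (it is not part of any real library). It fills each new block with the 0x3e byte suggested above, adds a helper that reports whether a region was never written, and defines a small struct whose pointer field a buggy caller forgets to set:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Fill byte taken from the 0x3e suggested above; any byte that gives an
   implausible address when repeated would do. */
#define GARBAGE_BYTE 0x3e

/* Hypothetical debugging allocator: like malloc, but every byte of the
   new block is set to GARBAGE_BYTE instead of zero or leftover data. */
void *garbage_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        memset(p, GARBAGE_BYTE, n);
    return p;
}

/* Returns 1 if all n bytes at p still hold GARBAGE_BYTE, i.e. the
   program apparently never stored anything there. */
int still_garbage(const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        if (b[i] != GARBAGE_BYTE)
            return 0;
    return 1;
}

/* A list node whose 'next' field the buggy caller forgets to set. */
struct node {
    int value;
    struct node *next;
};
```

For example, after `struct node *n = garbage_malloc(sizeof *n); n->value = 42;` the never-assigned n->next still consists entirely of 0x3e bytes (0x3e3e3e3e on a 32-bit machine), so following it is likely to fault instead of quietly ending the list the way a zeroed (NULL) pointer would.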

The CDC Cyber 170 series uses this concept to advantage with most languages.
Since it has 60-bit words (a silly number, I agree), it sets all 'bss' storage
to 0600000000000004nnnnn, where the n's are the address of the storage.  Since
pointers on the Cyber cannot exceed 131071 (0377777), any reference
to the data as a pointer will fail.  The leading 06 is there so that the
hardware traps any arithmetic operation on such data as an overflow.

Fortran, Pascal and several other languages use this to give sensible
post-mortem dumps: since an undefined variable contains its own address,
the dump can tell with reasonable confidence which variables were never
assigned.
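The preset-and-dump scheme can be sketched in C on 64-bit words. This is an illustration only: the tag value, the use of XOR to combine it with the address, and the names preset_undefined(), is_undefined() and dump_undefined() are assumptions, not the Cyber's actual loader or run-time. An untouched word holds its own address mixed with the tag, so a post-mortem dump can flag it:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical tag standing in for the Cyber's leading 06.  It is XORed
   with the word's own address, so an untouched word is unlikely to look
   like either a valid pointer or an ordinary value. */
#define UNDEF_TAG UINT64_C(0x0600000000000000)

/* The pattern an untouched word holds: tag combined with its address. */
static uint64_t undef_pattern(const uint64_t *w)
{
    return UNDEF_TAG ^ (uint64_t)(uintptr_t)w;
}

/* "Preset" a storage area, as the loader does for bss on the Cyber. */
void preset_undefined(uint64_t *area, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        area[i] = undef_pattern(&area[i]);
}

/* Nonzero if *w still holds its own undefined pattern. */
int is_undefined(const uint64_t *w)
{
    return *w == undef_pattern(w);
}

/* Post-mortem dump: print each variable's name and either its value or
   UNDEFINED; returns how many are still undefined. */
size_t dump_undefined(const uint64_t *area, const char *const names[],
                      size_t nwords)
{
    size_t count = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (is_undefined(&area[i])) {
            printf("%-8s UNDEFINED\n", names[i]);
            count++;
        } else {
            printf("%-8s = %llu\n", names[i], (unsigned long long)area[i]);
        }
    }
    return count;
}
```

Because each word's pattern depends on its own address, a legitimately stored value is only mistaken for "undefined" if it happens to equal that exact pattern, which is why such a dump is right with high probability rather than always.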

Minnesota Pascal-4 uses all this to great advantage: when run-time
tests are on, even stack frames are initialized this way, which makes it
very easy to debug programs that use uninitialized variables.

Pointers declared in parts of the program where run-time checks
are switched on are also physically larger than normal, to accommodate
extra information (the key) so that the pointer can be checked for
validity.  When a new() is done, a unique key is stored with the allocated
object, and it must match the key in any pointer referencing it; otherwise
a "pointer-invalid" run-time error occurs.
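A rough C analogue of the key-checked pointers, under stated assumptions: the "fat" pointer struct, the 32-bit key counter and the names checked_new(), checked_dispose() and checked_deref() are hypothetical and are not how Pascal-4 actually lays out its pointers. The key travels both in the pointer and in a header in front of the object, and every dereference compares them:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Header stored in front of each checked object.  The union with
   max_align_t keeps the object that follows suitably aligned. */
typedef union {
    uint32_t key;
    max_align_t align;
} checked_hdr;

/* A "fat" pointer: object address plus the key it expects to find. */
struct checked_ptr {
    checked_hdr *hdr;
    uint32_t key;
};

/* Key 0 is reserved to mark disposed objects; live keys start at 1. */
static uint32_t next_key = 1;

/* new(): allocate header + object and stamp both with a fresh key. */
struct checked_ptr checked_new(size_t size)
{
    struct checked_ptr p = { NULL, 0 };
    checked_hdr *h = malloc(sizeof *h + size);
    if (h != NULL) {
        h->key = next_key++;
        p.hdr = h;
        p.key = h->key;
    }
    return p;
}

/* dispose(): clear the key so every existing pointer stops matching.
   The block is deliberately not passed to free(), so stale pointers can
   still read the header safely; a real run-time system would recycle it
   under a new key instead. */
void checked_dispose(struct checked_ptr p)
{
    if (p.hdr != NULL && p.hdr->key == p.key)
        p.hdr->key = 0;
}

/* Dereference: return the object, or report pointer-invalid and return
   NULL (a real run-time system would abort the program here). */
void *checked_deref(struct checked_ptr p)
{
    if (p.hdr == NULL || p.hdr->key != p.key) {
        fprintf(stderr, "pointer-invalid\n");
        return NULL;
    }
    return p.hdr + 1;
}
```

A copy of the pointer kept after checked_dispose() then fails the check: its key no longer matches the header, so checked_deref() reports pointer-invalid and returns NULL instead of handing back a dead object.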

On the Cyber this is easy, since there are so many bits available in a word.

Perhaps, for the sake of the run-time checking available in languages such
as Pascal, you could sacrifice one of the 4G states of a word on a 32-bit
machine and classify it as 'undefined'.  An obvious candidate comes from
the asymmetric range of signed numbers on 2's complement machines:
16-bit numbers go from -32768 to 32767, so you could probably steal
-32768 for such checking without affecting too many programs.
Similarly for 32-bit ints, where the value is -2147483648 (0x80000000).
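A sketch of the stolen-value idea for 32-bit ints in C, assuming a hypothetical checked_int type and checked_add() helper. INT32_MIN (bit pattern 0x80000000) marks "undefined", and arithmetic refuses undefined operands and any result that would land on the reserved value:

```c
#include <stdint.h>

/* Steal the one value with no positive counterpart: -2^31 (0x80000000). */
#define UNDEF INT32_MIN

typedef int32_t checked_int;

/* Reports whether v holds the reserved 'undefined' value. */
static int is_undef(checked_int v)
{
    return v == UNDEF;
}

/* Checked addition: fails if either operand is undefined, or if the true
   sum falls outside the remaining range -2^31+1 .. 2^31-1 (so a result
   can never collide with UNDEF).  Sets *ok to 1 on success, 0 on failure. */
checked_int checked_add(checked_int a, checked_int b, int *ok)
{
    if (is_undef(a) || is_undef(b)) {
        *ok = 0;
        return UNDEF;
    }
    int64_t sum = (int64_t)a + b;   /* widen so the sum cannot overflow */
    if (sum <= INT32_MIN || sum > INT32_MAX) {
        *ok = 0;
        return UNDEF;
    }
    *ok = 1;
    return (checked_int)sum;
}
```

Excluding INT32_MIN from results leaves a symmetric range of -2147483647 to 2147483647, which is the price of the check.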

Pity you can't do a lot of this checking in C without breaking huge
amounts of code.  Therefore, Pascal++ :-)

Ian Donaldson.