Path: utzoo!attcan!uunet!munnari.oz.au!bruce!goanna!ok
From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
Newsgroups: comp.arch
Subject: Re: Bloat costs
Message-ID: <3160@goanna.cs.rmit.oz.au>
Date: 5 Jun 90 12:12:54 GMT
References: <26798@eerie.acsu.Buffalo.EDU> <266576A7.6D17@tct.uucp> <266A93A8.528F@tct.uucp>
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 49

In article <266A93A8.528F@tct.uucp>, chip@tct.uucp (Chip Salzenberg) writes:
> According to merriman@ccavax.camb.com:
> >In fact, a lot of our maintenance headaches are caused by scrimping
> >on resources ("Why use a longword here instead of a word?

> Such "scrimping" is often done in the name of saving memory.  However,
> there are smarter ways to save memory, such as keeping only one record
> of a file in memory instead of the whole file, etc.

Just on the subject of bloat, tradeoffs, &c, there is an interesting
tradeoff in the way COFF format stores relocation information.  Precisely
in the interests of keeping small amounts of data in memory, it more than
doubles the size of a data structure held on disc.

Each address in a segment that needs to be relocated has a triple
	[address:long, symindex:long, type:short]
stored for it in that segment's relocation table.  If you examine a
typical object file, you find that most of the references are to
_repeated_ symindex values (e.g. every time you call printf() you get a
relocation triple pointing to printf).  This is an obvious candidate for
compression:  store
	[symindex:long, 0type:short, address:long] -- for unique references
	[symindex:long, 1type:short, count:short   -- for repeated references
		{,address:long}...]
a change that would shrink the relocation table by nearly 60% when most
references are repeated (10 bytes per reference become about 4),
and would save repeated references to the symbol table.  Looks like a win
all around.  (Alternatively, we might forget about singleton references,
and make everything [symindex,type,count,address*] and benefit from having
everything longword aligned.  Another tradeoff.)
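
To put rough numbers on that, here is a minimal sketch in Python that compares
the two layouts.  It assumes 4-byte longs and 2-byte shorts, and it groups
references by (symindex, type), which is my reading of the proposal rather
than anything COFF defines.  The function names are made up for illustration.

```python
from collections import defaultdict

# Field widths are assumptions: long = 4 bytes, short = 2 bytes.
FLAT_ENTRY = 4 + 4 + 2      # [address:long, symindex:long, type:short]
SINGLETON = 4 + 2 + 4       # [symindex:long, 0type:short, address:long]
GROUP_HEADER = 4 + 2 + 2    # [symindex:long, 1type:short, count:short ...
ADDRESS = 4                 #   ... {,address:long}...]


def flat_size(relocs):
    """Bytes used by the existing one-triple-per-address layout."""
    return FLAT_ENTRY * len(relocs)


def compressed_size(relocs):
    """Bytes used by the proposed layout (singletons kept as plain triples).

    relocs is a list of (address, symindex, type) tuples.  The 16-bit
    count limit is ignored here to keep the sketch short.
    """
    groups = defaultdict(list)
    for address, symindex, rtype in relocs:
        groups[(symindex, rtype)].append(address)
    total = 0
    for addresses in groups.values():
        if len(addresses) == 1:
            total += SINGLETON
        else:
            total += GROUP_HEADER + ADDRESS * len(addresses)
    return total


# Fifty references to one symbol (say printf) plus one singleton reference.
example = [(4 * i, 7, 1) for i in range(50)] + [(400, 9, 1)]
```

For that example the flat table takes 510 bytes and the compressed one 218,
a saving of about 57%; the figure approaches 60% as the groups get longer.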

The relocation information is actually stored in increasing order of
address, so fixing up the addresses requires one sequential pass over
the segment and one sequential pass over the relocation table.  There
are accesses all over the symbol table, but the symbol table had to be
read into memory anyway.  COFF's data structure means that the linker
doesn't have to hold the whole segment it is relocating in memory at once,
which the "transposed" structure would require.  That was clearly a Good Idea on
PDP-11s.  Maybe with a virtual memory machine it isn't a good idea any more.
The tradeoff here was that with a small memory (64k) it was quite likely
that the program and data wouldn't all fit into memory at once; the data
were made _bigger_ so that they could be got at easily.  With a large
memory it's unlikely that the linker and an object segment can't fit
together, so it would make sense to save the disc space.
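
The sequential-pass property can be sketched as follows.  This is only an
illustration in Python, not how any real linker is written: it assumes a
relocation means "add the symbol's value to a 4-byte little-endian word",
that addresses are 4-byte aligned, and that the chunk size is a multiple of
4 so no word straddles two chunks.  relocate_stream and its parameters are
hypothetical names.

```python
import io
import struct


def relocate_stream(segment, relocs, symbol_values, chunk_size=512):
    """Patch a segment while holding only one chunk of it in memory.

    segment       -- binary file-like object positioned at the segment start
    relocs        -- iterable of (address, symindex), in increasing address
                     order, as in the flat relocation table
    symbol_values -- list indexed by symindex (the symbol table is in memory)
    """
    out = io.BytesIO()
    relocs = iter(relocs)
    pending = next(relocs, None)
    base = 0                        # segment address of the current chunk
    while True:
        chunk = bytearray(segment.read(chunk_size))
        if not chunk:
            break
        # Apply every relocation whose address falls inside this chunk.
        while pending is not None and pending[0] < base + len(chunk):
            address, symindex = pending
            offset = address - base
            (word,) = struct.unpack_from("<I", chunk, offset)
            word = (word + symbol_values[symindex]) & 0xFFFFFFFF
            struct.pack_into("<I", chunk, offset, word)
            pending = next(relocs, None)
        out.write(chunk)
        base += len(chunk)
    return out.getvalue()


# A 16-byte segment of zeros, relocated 8 bytes at a time.
patched = relocate_stream(io.BytesIO(bytes(16)), [(4, 0), (12, 1)],
                          [0x1000, 0x2000], chunk_size=8)
```

With the table grouped by symbol instead, the addresses for one symbol can
land anywhere in the segment, so this one-chunk-at-a-time loop no longer
works without either holding the whole segment or seeking about in it.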

So, while I heartily agree that you can get 80% of the power of Emacs in
50k of code, let's not forget that _small_ memories can warp things too.
-- 
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists.  However, no example could be found to include here."