Path: utzoo!attcan!uunet!munnari.oz.au!bruce!goanna!ok
From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
Newsgroups: comp.arch
Subject: Re: Bloat costs
Message-ID: <3160@goanna.cs.rmit.oz.au>
Date: 5 Jun 90 12:12:54 GMT
References: <26798@eerie.acsu.Buffalo.EDU> <266576A7.6D17@tct.uucp> <266A93A8.528F@tct.uucp>
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 49

In article <266A93A8.528F@tct.uucp>, chip@tct.uucp (Chip Salzenberg) writes:
> According to merriman@ccavax.camb.com:
> >In fact, a lot of our maintenance headaches are caused by scrimping
> >on resources ("Why use a longword here instead of a word?

> Such "scrimping" is often done in the name of saving memory. However,
> there are smarter ways to save memory, such as keeping only one record
> of a file in memory instead of the whole file, etc.

Just on the subject of bloat, tradeoffs, &c, there is an interesting
tradeoff in the way COFF format stores relocation information.
Precisely in the interests of keeping small amounts of data in memory,
it more than doubles the size of a data structure held on disc.

Each address in a segment that needs to be relocated has a triple
    [address:long, symindex:long, type:short]
stored for it in that segment's relocation table. If you examine a
typical object file, you find that most of the references are to
_repeated_ symindex values (e.g. every time you call printf() you get
a relocation triple pointing to printf). This is an obvious candidate
for compression: store
    [symindex:long, 0type:short, address:long]  -- for unique references
    [symindex:long, 1type:short, count:short    -- for repeated references
     {,address:long}...]
which change would reduce the size of the relocation table by nearly 60%,
and would save repeated references to the symbol table. Looks like a
win all around. (Alternatively, we might forget about singleton
references, and make everything [symindex,type,count,address*] and
benefit from having everything longword aligned. Another tradeoff.)
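
To put rough numbers on the two layouts above, here is a minimal sketch
that counts bytes for each. It assumes long = 4 bytes and short = 2 bytes,
and the names flat_size, grouped_size and the sample relocs list are
made up for illustration; none of this is actual COFF tooling.

```python
from collections import defaultdict

LONG, SHORT = 4, 2  # assumed field widths in bytes


def flat_size(relocs):
    """Bytes used by one [address, symindex, type] triple per relocation."""
    return len(relocs) * (LONG + LONG + SHORT)


def grouped_size(relocs):
    """Bytes used when addresses are grouped under each (symindex, type)."""
    groups = defaultdict(list)
    for address, symindex, rtype in relocs:
        groups[(symindex, rtype)].append(address)
    total = 0
    for addresses in groups.values():
        if len(addresses) == 1:
            # unique reference: [symindex, 0type, address]
            total += LONG + SHORT + LONG
        else:
            # repeated reference: [symindex, 1type, count, address...]
            total += LONG + SHORT + SHORT + LONG * len(addresses)
    return total


# Hypothetical table: symbol 7 referenced ten times, symbols 3 and 5 once each.
relocs = [(0x10 * i, 7, 1) for i in range(10)]
relocs += [(0x200, 3, 1), (0x204, 5, 1)]

print(flat_size(relocs), grouped_size(relocs))  # prints "120 68"
```

On this sample the grouped form saves about 43%. Each extra repeat of a
symbol costs 4 bytes instead of 10, so the saving approaches 60% only
when repeated references dominate the table.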

The relocation information is actually stored in increasing order of
address, so fixing up the addresses requires one sequential pass over
the segment and one sequential pass over the relocation table. There
are accesses all over the symbol table, but the symbol table had to be
read into memory anyway. COFF's data structure means that the linker
doesn't have to hold the whole segment it is relocating all in memory,
as the "transposed" structure would. That was clearly a Good Idea on
PDP-11s. Maybe with a virtual memory machine it isn't a good idea any
more.

The tradeoff here was that with a small memory (64k) it was quite likely
that the program and data wouldn't all fit into memory at once; the data
were made _bigger_ so that they could be got at easily. With a large
memory it's unlikely that the linker and an object segment can't fit
together, so it would make sense to save the disc space.

So, while I heartily agree that you can get 80% of the power of Emacs
in 50k of code, let's not forget that _small_ memories can warp things
too.
--
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists. However, no example could be found to include here."