Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!mcsun!ukc!acorn!mark From: mark@acorn.co.uk (Mark Taunton) Newsgroups: comp.unix.internals Subject: Re: Compressed executables Message-ID: <4743@acorn.co.uk> Date: 22 Jan 91 12:04:04 GMT References: <1991Jan15.204849@IASTATE.EDU> <118868@uunet.UU.NET> <3977@skye.ed.ac.uk> Reply-To: mark@acorn.co.uk (Mark Taunton) Organization: Acorn Computers Ltd, Cambridge, UK Lines: 93 In article <3977@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes: >In article <118868@uunet.UU.NET> rbj@uunet.UU.NET (Root Boy Jim) writes: >>I once suggested to Chris Torek that the kernel should execute >>compressed programs. He groaned. > >This has been done by Acorn in their RISC iX port of 4.3. The compression >is done in such a way that the file can still be demand paged. > >Perhaps someone from Acorn could give more details? Yes. Here they are: The machines on which RISC iX is currently implemented have "interesting" memory management hardware. Each bank of 4Mbytes of physical memory is handled by a MEMC (Memory Controller) chip in such a way that every physical page is always mapped to some logical page (yes, it *is* that way round). This is done entirely on-chip, using a CAM in which each entry controls one physical page: if the entry's contents (logical page number) matches the logical page address put out by the ARM CPU, then the associated physical page (identified by the index of the entry in the CAM table) is used for the data transfer, provided that the associated protection bits allow. In the technology used in 1986 when MEMC was first made the practical limit on CAM entries was 128. Hence the page size for a 4Mb physical memory chunk comes out at a whopping 32Kb. If you are prepared to use more MEMCs and have each control less memory, the page size goes down (to a minimum of 4Kb, for 512Kb/MEMC); of course this adds to system cost for several reasons so Acorn chose not to. Now the consequences of a 32Kb page size are in many respects unpleasant, but it is possible to turn some of these to advantage. In particular since the first RISC iX machine (the Acorn R140, launched in early 1989 for approx. 3.5k pounds) had only 50Mb of disc and we wanted to put as much as possible of 4.3 BSD onto it, we needed to do some shoehorning. The problem is exacerbated because ARM, in common with most RISC architectures, has fairly poor code density (though not as bad as some). The solution was to use a code compression technique which is tuned to the particular typical bit patterns of ARM instruction sequences and which achieves a reduction of around 50%. The saving in disc space comes by applying this compression technique on a per-page basis to demand paged executable files. Each page as stored on disc would normally occupy 32Kb or four disc blocks on an 8Kb blocksize filesystem, but compression reduces this to around 16Kb on average, taking up two or perhaps three 8Kb blocks. (With the newer R260 machine we moved to 4Kb blocks in the shipped filesystem, which further improves the average disc space saving.) The unused blocks in a page prototype simply become "holes". In conjunction with a simple shared library scheme, the actual disc occupancy of executables becomes quite tolerable. The compression operation - we call it "squeezing" - is applied to a normal demand-load format program as a separate operation (the linker produces normal uncompressed output, but may immediately invoke the squeeze program if desired). An unsqueeze program performs the inverse operation if required (e.g. for debugging purposes). Compressed executables comprise a header, the text, the data and an optional symbol table. The header includes a magic number to identify the compressed format and two tables of values for the decompression algorithm, and is zero padded to the next page boundary. Each page of text or data is compressed "in place", i.e. the compressed data starts at the same offset in the file that the uncompressed data did, and the symbol table if present also starts at a page-size offset. The kernel recognises compressed executables by the magic number at exec time, and reads in the decompression tables in the header at this point, via the buffer cache. The tables are typically not very big, and are held on disc in a compacted format which is expanded by the kernel. To save memory, since most programs use much less than 32Kb of stack, the expanded tables are kept in user space just before the actual process's user stack. To demand load a page, the kernel reads in the blocks containing the compressed code or data directly into the new page frame (not through the buffer cache) then applies the decompression algorithm on the data. Since the data starts at the same file address regardless of compression, there is no problem finding it. A checksum kept with the compressed data helps protect against problems such as the possibility, practically never seen, of a wayward program corrupting the decompression tables in its own stack area. The page read operation takes advantage of any adjacency of disc blocks (which with careful tuning of the filesystem as shipped happens most of the time) and of course holes take no time to load! The decompression algorithm is extremely quick, and the net result is that a page can be ready for use in less wall-clock time than it would have taken to read in a whole 32Kb of uncompressed data directly. So the technique saves both disc space and time, although of course if the page size were smaller these benefits would probably be outweighed by other considerations. ------------------------------------------------------------------------ Mark Taunton Email: mark@acorn.co.uk RISC iX Development Group Acorn Computers Ltd Cambridge, England.