Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!decwrl!ucbvax!ICAEN.UIOWA.EDU!dbfunk
From: dbfunk@ICAEN.UIOWA.EDU (David B Funk)
Newsgroups: comp.sys.apollo
Subject: Re: Why is a cmpexe file so big?
Message-ID: <9010120905.AA03175@icaen.uiowa.edu>
Date: 12 Oct 90 08:22:04 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: Iowa Computer Aided Engineering Network, University of Iowa
Lines: 130

In an earlier posting, Harald Hanche-Olsen asks:

> I just tried building a cmpexe file using the `xar' command.  The
> result is interesting:
>
> % xar -cv pw pw.a88k pw.m68k
> Added element tagged m68k from 'pw.m68k'
> Added element tagged a88k from 'pw.a88k'
> % xar -tv pw
>   type    offset   size  alignment  last-modified   tag
>   coff         0   5311     524288  90/09/16.10:27  a88k
>   coff    262144   5066     262144  90/09/16.10:40  m68k
> % ls -l pw*
> -rwxr-xr-x  1 hanche     270420 Oct  5 13:48 pw*
> -rwxr-xr-x  1 hanche       5311 Sep 16 10:27 pw.a88k*
> -rwxr-xr-x  1 hanche       5066 Sep 16 10:40 pw.m68k*
>
> If my arithmetic is not way off, we are talking a 2500% overhead here!
> All right, I will admit I have been unfair here -- a test with a
> larger program reveals that the overhead seems to be fairly constant,
> i.e. about 250K.  So for large programs the overhead isn't too bad
> percentagewise, but it's obviously a killer if you try to build up a
> large library of small cmpexe'd programs.  Does anyone know why it
> has to be this way?  Or does it?  Ought I to submit an APR on it?

Actually things aren't really as bad as they look; "ls" just isn't
telling you everything. A "cmpexe" file is sparse: it may have lots of
"empty space" inside it that doesn't use up disk blocks. If you look at
that "cmpexe" file via "/com/ld -a" you'll see the real answer:

    $ /com/ld -a pw

     sys    type      blocks  current
     type   uid         used   length  attr  rights  name
     file   cmpexe        20   270420   P    prwx-   pw

    1 entry listed, 20 blocks used.

So the amount of disk space actually used is only 20480 bytes (20 blocks
of 1024 bytes). The rest of the 270420-byte length, roughly 250K bytes,
is unallocated "space" between the blocks allocated for the first 10K
bytes and the blocks allocated to hold the last 10K bytes. This is done
so that the data will fall on memory-segment-aligned boundaries, which
speeds up the run-time loading process. Look at the "alignment" and
"offset" fields in the "xar -tv" output. Thus these files don't actually
"cost" you the disk space that "ls" seems to tell you they do.

Try this little experiment: move the file into a new directory so that
it is all by itself. Now do an "ls -l" and look at the "total" value at
the top of the "ls" output. Note that it matches the "blocks used"
output from "/com/ld".

For another example of sparse files, look at almost any "mbx" type file.
If you use the Apollo alarm server, look at its message mail-box file
in /tmp:

    $ ls -l
    total 6
    -rwxrwxrw-+ 1 dbfunk     104208 Oct 12 03:33 alarm_server.msg_mbx
    drwxrwxrwx  1 root         1024 Jun  9 01:49 layers
    -rw-rw-rwx+ 1 root          144 Oct  9 13:59 llbdbase.dat
    $ /com/ld -a

    Directory "/sys/node_data/tmp":

     sys    type      blocks  current
     type   uid         used   length  attr  rights  name
     file   mbx            4   104208   P    prwx-   alarm_server.msg_mbx
     dir    nil            1     1024   P    prwx-   layers
     file   unstruct       1      144   P    prwx-   llbdbase.dat

    3 entries, 6 blocks used.

It looks like that file has over 100K bytes allocated to it, but it uses
up only 4 disk blocks. The contents of `node_data/systmp contain other
examples of these things.

However, there is one way that you can lose "big time" on this stuff. If
you use any Unix-type program, such as "cp", to read these types of
files, the empty space may be allocated and filled in with real disk
blocks.
For example:

    $ xar -tv garp
      type    offset   size  alignment  last-modified   tag
      coff         0   3552     524288  90/08/16.01:59  a88k
      coff     32768   3390      32768  90/08/16.01:52  m68k
    $ /com/ld -a

    Directory "/test":

     sys    type      blocks  current
     type   uid         used   length  attr  rights  name
     file   cmpexe        14    36948   P    prwx-   garp

    1 entry, 14 blocks used.
    $ /com/cpf garp gork
    $ /com/ld -a

    Directory "/test":

     sys    type      blocks  current
     type   uid         used   length  attr  rights  name
     file   cmpexe        14    36948   P    prwx-   garp
     file   cmpexe        14    36948   P    prwx-   gork

    2 entries, 28 blocks used.
    $ /bin/cp gork guck
    $ /com/ld -a

    Directory "/test":

     sys    type      blocks  current
     type   uid         used   length  attr  rights  name
     file   cmpexe        14    36948   P    prwx-   garp
     file   cmpexe        38    36948   P    prwx-   gork
     file   unstruct      38    36948   P    prwx-   guck

    3 entries, 90 blocks used.

Note that "/com/cpf" kept the copy sparse (14 blocks), but after the
"/bin/cp" the blocks used by the file "gork" changed from 14 to 38, even
though it was only the source of the copy operation. The copy "guck"
also takes 38 blocks and has lost its "cmpexe" type. (If you use the
"-o" flag on "cp" this problem is avoided.)

Dave Funk
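
As an aside, the general "length versus blocks used" effect can be tried
on an ordinary Unix-like system. The following is only a minimal Python
sketch under stated assumptions, not anything taken from Domain/OS: the
function names are made up for illustration, the file it writes is not a
real cmpexe file (only the offsets and sizes are borrowed from the
"xar -tv pw" listing), and it assumes a filesystem that supports holes
and reports st_blocks in 512-byte units. It does not reproduce the
Apollo behaviour where merely reading "gork" filled in the source file.

```python
import os
import tempfile


def make_sparse(path, second_offset=262144, first_size=5311, second_size=5066):
    """Write two small payloads far apart, leaving a hole between them."""
    with open(path, "wb") as f:
        f.write(b"a" * first_size)       # first element at offset 0
        f.seek(second_offset)            # skip ahead without writing anything
        f.write(b"m" * second_size)      # second element at the aligned offset


def apparent_and_allocated(path):
    """Return (length as "ls -l" shows it, bytes actually allocated)."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512


def copy_dense(src, dst, chunk=4096):
    """Naive copy: every byte read is written, zero runs included."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)


def copy_sparse(src, dst, chunk=4096):
    """Hole-preserving copy: seek over all-zero chunks instead of writing."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            if buf.count(0) == len(buf):
                fout.seek(len(buf), os.SEEK_CUR)   # leave a hole
            else:
                fout.write(buf)
        fout.truncate(fin.tell())                  # fix length if file ends in a hole


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    names = {n: os.path.join(workdir, n) for n in ("pw", "dense", "sparse")}
    make_sparse(names["pw"])
    copy_dense(names["pw"], names["dense"])
    copy_sparse(names["pw"], names["sparse"])
    for name, path in names.items():
        length, allocated = apparent_and_allocated(path)
        print(f"{name:8s} length={length:7d} allocated={allocated:7d}")
```

On a filesystem with hole support one would expect "pw" and "sparse" to
show far fewer allocated bytes than their length, and "dense" to show
roughly its full length, but the exact figures depend on the filesystem,
so none are quoted here.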