Xref: utzoo comp.unix.wizards:23308 comp.unix.questions:24375 comp.unix.cray:161
Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uwm.edu!zaphod.mps.ohio-state.edu!usc!apple!jrg
From: jrg@Apple.COM (John R. Galloway Jr.)
Newsgroups: ca.unix,comp.unix.wizards,comp.unix.questions,comp.unix.cray
Subject: File size distribution survey
Keywords: file size survey
Message-ID: <43691@apple.Apple.COM>
Date: 6 Aug 90 21:02:48 GMT
Distribution: usa
Organization: Galloway Research
Lines: 127

Below you will find a short shell archive containing 4 scripts which,
when run, will produce a histogram of the file size distribution (in 4K
chunks) on your system (or some subtree thereof). I am interested in
getting this info from big users (e.g. 2 or more GBytes of disk) and
especially from really big users (10 or 50 or ? GBytes of disk). The
restriction, of course, is that you must be using all that space for
Unix files, not a database or some other application that reads/writes
raw partitions, since these scripts won't count them (you can attach
their size in a note if you like).

If you are willing to run it for me, please send me the resulting output
along with a brief statement concerning the use of the file storage
(e.g. general software development on a VAX, supercomputer simulation on
a Cray X-MP, graphics, CAD, etc.). It takes 7 minutes on my (tiny)
Mac IIx A/UX system with 120MB in 10,000 files. Hopefully your mileage
will be better, but you might still want to do this in an off-peak
period.

THANKS!!

Please use jrg@galloway.sj.ca.us for the return file,
or ..fernwood!galloway!jrg

-jrg

#! /bin/sh
## This is a shell archive.  Remove anything before this line, then unpack
## it by saving it into a file and typing "sh file".  To overwrite existing
## files, type "sh file -c".
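## You can also feed this as standard input via
## unshar, or by typing "sh <file".
if test -f consolidate.awk -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"consolidate.awk\"
else
sed 's/^X//' >consolidate.awk <<'END_OF_consolidate.awk'
XBEGIN {bs=4096;size = bs;number=0;printf("# of files of 4K blocks\n")}
X{if ( $1*512 <= size ) number=number+$2;
Xelse
X{ printf("%8d %d\n", number, size/4096);
Xwhile ($1 * 512 > size) size=size+bs; number=$2 }}
XEND {printf("%8d %d\n", number, size/4096)}
END_OF_consolidate.awk
if test 269 -ne `wc -c <consolidate.awk`; then
    echo shar: \"consolidate.awk\" unpacked with wrong size!
fi
fi
if test -f count.awk -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"count.awk\"
else
sed 's/^X//' >count.awk <<'END_OF_count.awk'
XBEGIN {match=1;count=0}
X{if ($1 == match) count++; else {print match,count;count=1;match=$1}}
XEND {print match,count}
END_OF_count.awk
if test 118 -ne `wc -c <count.awk`; then
    echo shar: \"count.awk\" unpacked with wrong size!
fi
fi
if test -f fsize.awk -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"fsize.awk\"
else
sed 's/^X//' >fsize.awk <<'END_OF_fsize.awk'
X{if (NF > 2) printf "%.0f\n" , ($5 + 4095) / 4096}
END_OF_fsize.awk
if test 51 -ne `wc -c <fsize.awk`; then
    echo shar: \"fsize.awk\" unpacked with wrong size!
fi
fi
if test -f read.me -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"read.me\"
else
sed 's/^X//' >read.me <<'END_OF_read.me'
XThese scripts produce a historgram of the file sizes on your system.
XThe sizes.sh script takes an arg as the dir to start with (the top of
Xtthe tree). The current dir is used if no arg is given.
X
XIdeally as root, beig in the dir containg the scripts, you would say:
X
X# sizes.sh / >size.out
Xor if your are in the root just
X# sizes.sh >sizes.out
X
Xsizes.sh         a 1 line shell script that pipes an ls -lR / into the other
X                 scripts, result is written to std out so you need to provide
X                 a bucket for output. Should work with csh, ksh, or sh.
X                 If you would send the resulting output file
X                 to jrg@galloway.sj.ca.us (or fernwood!galloway!jrg) I would
X                 greatly appreciate it.
Xfsize.awk        an awk script that strips out the file size parameter from the
X                 ls -l listing
Xcount.awk        an awk script that counts the like entries in a sorted list
Xconsolidate.awk  an awk script that groups the result of the above into 4 KB
X                 chunks.
X
END_OF_read.me
if test 916 -ne `wc -c <read.me`; then
    echo shar: \"read.me\" unpacked with wrong size!
fi
fi
if test -f sizes.sh -a "${1}" != "-c" ; then
  echo shar: Will not clobber existing file \"sizes.sh\"
else
sed 's/^X//' >sizes.sh <<'END_OF_sizes.sh'
Xls -lR $1 | awk -f ./fsize.awk | sort -n | awk -f ./count.awk | awk -f ./consolidate.awk
END_OF_sizes.sh
if test 90 -ne `wc -c <sizes.sh`; then
    echo shar: \"sizes.sh\" unpacked with wrong size!
fi
fi
exit 0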
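
For anyone who wants to check the 4K bucketing idea on a handful of
known sizes before running the archive on a big filesystem, here is a
minimal sketch. It is not one of the archived scripts. The function name
histogram_4k is made up for illustration, and it takes byte sizes on
stdin rather than parsing ls output. It also rounds up with int(), where
fsize.awk prints with "%.0f", which rounds to nearest and so may put
some files one bucket higher.

```shell
#!/bin/sh
# histogram_4k (hypothetical helper, not part of the archive):
# read file sizes in bytes, one per line, on stdin and print
# "<number of 4K blocks> <number of files>" pairs, smallest bucket first.
histogram_4k() {
    awk '
        # Round each size up to whole 4096-byte blocks. int() truncates,
        # so a 1-byte file lands in bucket 1 and an empty file in bucket 0.
        { blocks = int(($1 + 4095) / 4096); count[blocks]++ }
        END { for (b in count) print b, count[b] }
    ' | sort -n
}

# Small demo on six made-up sizes.
printf '0\n1\n4096\n4097\n8192\n10000\n' | histogram_4k
# prints:
# 0 1
# 1 2
# 2 2
# 3 1

# To feed it real data, something along these lines should work, assuming
# your ls -l puts the byte count in field 5 (some systems use field 4):
#   find "${1:-.}" -type f -exec ls -ld {} + | awk '{ print $5 }' | histogram_4k
```

Counting in a single awk array avoids the separate count and consolidate
passes, at the cost of holding one counter per distinct bucket in memory.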