Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!mcnc!duke!bet
From: bet@orion.mc.duke.edu (Bennett Todd)
Newsgroups: comp.unix.wizards
Subject: Filename length statistics
Message-ID: <14749@duke.cs.duke.edu>
Date: 14 Jun 89 17:18:59 GMT
References: <19976@adm.BRL.MIL> <4530@ficc.uu.net>
Sender: news@duke.cs.duke.edu
Reply-To: bet@orion.mc.duke.edu (Bennett Todd)
Organization: Diagnostic Physics, Radiology, DUMC
Lines: 87
In-reply-to: peter@ficc.uu.net (Peter da Silva)

In article <4530@ficc.uu.net>, peter@ficc (Peter da Silva) writes:
>I've added a cumulative total
>
> 392  1   392   0.57%
> ...
>1533 14 65848  94.96% <-- Covers most bases.
> ...
>  23 30 69284  99.92% <-- Covers virtually all bases.
> ...
>   1 51 69340 100.00%
>
>14 corresponds to SysV. 30 corresponds to SysV with DIRSIZ doubled. There were
>56 files, or 0.08%, that were longer than this.

Out of curiosity I ran this over the 2187 files under my home directory;
some of the statistics came out a little differently. Specifically, the
ones shown above come out like so for me:

     1   237 10.84%   237  10.84%
    ...
    14    55  2.51%  1805  82.53%
    ...
    30     4  0.18%  2166  99.04%
    ...
    53     1  0.05%  2187 100.00%

(I just noticed my column ordering is different: mine are length, count,
percent, cumulative count, cumulative percent. I used the awk program
someone posted, which I append at the end.)

Names of 14 characters or fewer cover only ~83% of my filenames (this
includes directory names, and in particular includes "." and ".." for
every directory, so there is some structural weighting acting against my
statistics here). Further, a 30-character limit would still leave nearly
1% of my names, 21 out of 2187, chopped. Some of our users would show
distributions skewed toward much longer filenames, others toward shorter
ones. Having a shell with filename completion certainly removes much of
the incentive for short, cryptic filenames.

Also, I personally think that statistics like this should be collected
over home directories, not over everything below root, since many of the
filenames in the root and /usr filesystems are inherited from the
original UNIX system rather than chosen more recently. Further, the most
useful place I've seen for really long filenames is in organizing
personal archives, where you can make a name descriptive enough that the
file is easier to find later.

For completeness, here's the program I used (a shell script I wrapped
around an awk program someone else posted):

#!/bin/sh
# Print a histogram of filename lengths (last pathname component only)
# for everything under the given directories; default is ".".
progname=`basename "$0"`
awkprg=/tmp/$progname$$
# Remove the temporary awk program on exit or on HUP/INT/QUIT.
trap "rm -f $awkprg;exit 1" 0 1 2 3
cat >$awkprg <<'EOF'
BEGIN {FS = "/"}
{
    l = length($NF)     # length of the last pathname component
    c[l]++
    if(l>max) max=l
}
END {
    # columns: length, count, percent, cumulative count, cumulative percent
    for(i=1; i<=max; i++) {
        s += c[i]
        printf("%2d %5d %5.2f%% %5d %6.2f%%\n", i, c[i], c[i]/NR*100, s, s/NR*100)
    }
}
EOF
if test $# -eq 0
then
    set '.'
fi
find "$@" -print | awk -f $awkprg
rm -f $awkprg
trap "" 0 1 2 3
exit 0

-Bennett
bet@orion.mc.duke.edu

P.S. Tonight I'm going to run the same thing over everyone's home
directories on our system, as well as over everything from the root
down; I'll post the results tomorrow if all goes well.
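
A possible variation on the script above, offered as a sketch rather than
as the poster's method: the awk program can be passed inline, which
avoids the temporary file and the trap. The function name namelen_stats
is made up for illustration, and the sketch assumes that no pathname
contains a newline, since find -print emits one pathname per line.

```shell
#!/bin/sh
# Sketch only: same histogram as the posted script, with the awk
# program given inline instead of written to a file under /tmp.
# namelen_stats is a hypothetical name, not part of the original post.
# Assumes no pathname contains a newline.
namelen_stats() {
    # Default to the current directory when no arguments are given.
    [ $# -eq 0 ] && set -- .
    find "$@" -print | awk -F/ '
    {
        l = length($NF)         # length of the last pathname component
        c[l]++
        if (l > max) max = l
    }
    END {
        # columns: length, count, percent, cumulative count, cumulative percent
        for (i = 1; i <= max; i++) {
            s += c[i]
            printf("%2d %5d %5.2f%% %5d %6.2f%%\n", i, c[i], c[i]/NR*100, s, s/NR*100)
        }
    }'
}
```

Calling namelen_stats "$HOME" would then report on a home directory, and
calling it with no arguments reports on the current directory, as the
posted script does.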