Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!apple!voder!nsc!taux01!amos
From: amos@taux01.UUCP (Amos Shapir)
Newsgroups: comp.unix.wizards
Subject: Re: [ted%NMSU.Edu: ]
Message-ID: <1852@taux01.UUCP>
Date: 13 Jun 89 12:13:31 GMT
References: <19976@adm.BRL.MIL>
Organization: National Semiconductor (IC) Ltd, Israel Home of the 32532
Lines: 111
Hdate: 10 Sivan 5749

In article <19976@adm.BRL.MIL> ted@nmsu.edu writes:
>i realize it is entirely out of character to provide in a
>unix-wizards discussion, but here is the result of just such a survey
>made on an active research machine with about 3GB of disk space in
>use.  these results were obtained by doing
>
>	find / -print |sed -e 's/.*\///' |chars-in-line |sort -n |uniq -c
>
>where chars-in-line is a tiny program to count the characters on each
>input line.  the work was done as super-user to avoid directory read
>problems.

There's no need to use four tools (including a custom-built one) where
one will do.  The following awk program collects the same statistics;
put it in count.awk and run it as 'find / -print | awk -f count.awk'.

    BEGIN {FS = "/"}            # split each path on slashes
    {
        l = length($NF)         # length of the last path component
        c[l]++                  # count of names having this length
        if(l>max) max=l         # longest name seen so far
    }
    END {
        for(i=1; i<=max; i++) {
            s += c[i]           # running (cumulative) count
            printf("%2d %5d %5.2f%% %5d %6.2f%%\n",
                i, c[i], c[i]/NR*100, s, s/NR*100)
        }
    }

I ran it on our system (a Sequent Balance running Dynix), which also has
some 3GB of disk space.  Columns are: name length, number of files of
that length, percentage of total, cumulative count, and cumulative
percentage.
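
As an aside before the numbers: for readers who want to try the same
tally where awk is not handy, here is a rough Python sketch of the same
idea.  It is not the program that produced the table, only an
illustration; the function name name_length_histogram and the sample
paths are made up for the example.

```python
from collections import Counter


def name_length_histogram(paths):
    """Return rows of (length, count, pct, cumulative count, cumulative pct)."""
    paths = [p.rstrip("\n") for p in paths]
    total = len(paths)  # plays the role of awk's NR
    # Length of the last "/"-separated component, like length($NF) with FS = "/".
    counts = Counter(len(p.rsplit("/", 1)[-1]) for p in paths)
    rows, running = [], 0
    for length in range(1, max(counts, default=0) + 1):
        n = counts[length]  # a Counter yields 0 for lengths never seen
        running += n
        rows.append((length, n, 100.0 * n / total,
                     running, 100.0 * running / total))
    return rows


if __name__ == "__main__":
    # Hypothetical sample input standing in for the output of 'find / -print'.
    sample = ["/", "/bin", "/bin/ls", "/usr/lib/libc.a"]
    for row in name_length_histogram(sample):
        print("%2d %5d %5.2f%% %5d %6.2f%%" % row)
```

Note that the line for "/" itself has an empty last component, so it
adds to the total but never gets a row of its own.  The same appears to
hold for the awk version, since NR counts every input line while the
loop starts at length 1.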

 1  1126  0.38%   1126   0.38%
 2  2751  0.93%   3877   1.32%
 3 13531  4.59%  17408   5.91%
 4 26337  8.94%  43745  14.85%
 5 15983  5.43%  59728  20.28%
 6 25357  8.61%  85085  28.89%
 7 28932  9.82% 114017  38.72%
 8 44208 15.01% 158225  53.73%
 9 30821 10.47% 189046  64.19%
10 30394 10.32% 219440  74.51%
11 23427  7.95% 242867  82.47%
12 21944  7.45% 264811  89.92%
13 13172  4.47% 277983  94.39%
14  8939  3.04% 286922  97.43%
15  2342  0.80% 289264  98.22%
16  1639  0.56% 290903  98.78%
17   820  0.28% 291723  99.06%
18   580  0.20% 292303  99.25%
19   371  0.13% 292674  99.38%
20   223  0.08% 292897  99.46%
21   190  0.06% 293087  99.52%
22   120  0.04% 293207  99.56%
23   165  0.06% 293372  99.62%
24   133  0.05% 293505  99.66%
25    49  0.02% 293554  99.68%
26    21  0.01% 293575  99.69%
27    14  0.00% 293589  99.69%
28    18  0.01% 293607  99.70%
29    26  0.01% 293633  99.71%
30     8  0.00% 293641  99.71%
31    14  0.00% 293655  99.71%
32    26  0.01% 293681  99.72%
33     5  0.00% 293686  99.72%
34     6  0.00% 293692  99.73%
35    22  0.01% 293714  99.73%
36    85  0.03% 293799  99.76%
37    85  0.03% 293884  99.79%
38   116  0.04% 294000  99.83%
39   135  0.05% 294135  99.88%
40    65  0.02% 294200  99.90%
41    82  0.03% 294282  99.93%
42    84  0.03% 294366  99.95%
43    48  0.02% 294414  99.97%
44    20  0.01% 294434  99.98%
45    25  0.01% 294459  99.99%
46     7  0.00% 294466  99.99%
47    18  0.01% 294484  99.99%
48     9  0.00% 294493 100.00%
49     2  0.00% 294495 100.00%
50     0  0.00% 294495 100.00%
51     0  0.00% 294495 100.00%
52     0  0.00% 294495 100.00%
53     0  0.00% 294495 100.00%
54     1  0.00% 294496 100.00%
55     0  0.00% 294496 100.00%
56     0  0.00% 294496 100.00%
57     0  0.00% 294496 100.00%
58     1  0.00% 294497 100.00%
59     0  0.00% 294497 100.00%
60     0  0.00% 294497 100.00%
61     0  0.00% 294497 100.00%
62     0  0.00% 294497 100.00%
63     0  0.00% 294497 100.00%
64     1  0.00% 294498 100.00%

>the number of very long names is rather surprising (at least to me),
>but there is a good indication 255 = infinity as far as file names
>are concerned.
>
>hope this helps somebody.

Ditto.

--
	Amos Shapir		amos@nsc.com
National Semiconductor (Israel) P.O.B. 3007, Herzlia 46104, Israel
Tel. +972 52 522261  TWX: 33691, fax: +972-52-558322
34 48 E / 32 10 N			(My other cpu is a NS32532)