Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!bbn.com!drilex!dricejb
From: dricejb@drilex.UUCP (Craig Jackson drilex1)
Newsgroups: comp.unix.wizards
Subject: Re: small files and big directories (was hard links to symlinks)
Message-ID: <11372@drilex.UUCP>
Date: 25 May 90 19:20:26 GMT
References: <874@nlsun1.oracle.nl> <1990May9.171340.5351@ucselx.sdsu.edu> <1709@cirrusl.UUCP> <24523@mimsy.umd.edu>
Organization: DRI/McGraw-Hill, Lexington, MA
Lines: 53

This discussion has been about why Unix tends to have small directories
and relatively deep trees, versus systems like VMS, which tend to have
large directories and shallow trees.

Chris Torek pointed out that his home directory was quite small. This
is really to be expected--humans have trouble with large directories.
He also said that his application directories tend to be small, but
the inefficiency of large directories on Unix was one of his reasons.

I would assert that the inefficiency of large directories on Unix is
just as limiting as the awkwardness of deep directory trees on VMS.

My best anecdote on this comes from BIX. BIX is implemented using the
CoSy software from the University of Guelph. CoSy was first implemented
in the early '80s, under Version 7. It had a wide user community at
Guelph, where it ran on Amdahl's UTS, which was then a Version 7
implementation.

The editors at Byte liked it, so they bought an Arete box (nice
character I/O performance) to run it. This ran SVR2 (no hissing,
please). CoSy was installed, and they began testing BIX internally.
Everything basically worked fine.

Then they opened the system up to the public for beta-test. Lots of
people signed up. Soon, they found their first scale-up problem: CoSy
kept its per-user information as one directory per user. Each of these
directories was named after the user's login name, and lived in the
users/ directory. Well, System V at the time didn't allow more than
1000 links to an inode.
BIX quickly went over 1000 users, and all of those '..' links (one per
user directory, each of them a link to users/ itself) killed it. So,
some midnight (literally) programming was done, and the next day
joeuser's per-user files were in users/j/joeuser, rather than
users/joeuser. That got them going again, but they still had the wall
at roughly 26,000 users (26 letter directories of about 1000 users
each) to worry about.

I lost touch with the details after this, but I'm pretty sure that
they've gone over the 26,000-user limit since. I think that today
they've dropped the Unix file system for this completely--the per-user
data is now in some sort of database.

This effect also shows up in things like SysV's terminfo database,
where you also get somedirectory/a/adm-3 kinds of things.

My point is that some implementations probably should do something
about the large-directory problem. If your application works most
naturally with 2000 subdirectories, or 20,000 subdirectories, in a
single directory, you shouldn't have to recode to get around system
inefficiencies.

Now, maybe the random workstation doesn't need this capability. But
for the future, at least some implementation of Unix will need to do
large directories well.
-- 
Craig Jackson
dricejb@drilex.dri.mgh.com
{bbn,axiom,redsox,atexnet,ka3ovk}!drilex!{dricej,dricejb}
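
To make the users/j/joeuser workaround from the post concrete, here is a
minimal sketch in Python. It is only an illustration, not CoSy's or BIX's
actual code: the function names, the LINK_MAX value of 1000 (taken from
the figure quoted in the post), and the assumption that login names start
with one of 26 lowercase letters are all hypothetical. It also shows where
the "wall at 26,000" comes from, assuming the traditional Unix accounting
in which a directory holds two links of its own plus one '..' link per
subdirectory.

```python
import posixpath
import string

# Assumed per-inode link limit, taken from the figure quoted in the post.
LINK_MAX = 1000


def sharded_user_dir(login, root="users"):
    """Return root/<first letter>/<login>, e.g. users/j/joeuser."""
    if not login:
        raise ValueError("login name must not be empty")
    return posixpath.join(root, login[0], login)


def max_subdirs(link_max=LINK_MAX):
    """Subdirectories one directory can hold under the link limit.

    Assumes traditional accounting: a directory has 2 links of its own
    (its entry in the parent and its own '.'), and each subdirectory
    adds one more through its '..' entry.
    """
    return link_max - 2


def sharded_capacity(buckets=len(string.ascii_lowercase), link_max=LINK_MAX):
    """Total user directories once users/ is split into letter buckets."""
    return buckets * max_subdirs(link_max)


if __name__ == "__main__":
    print(sharded_user_dir("joeuser"))  # users/j/joeuser
    print(max_subdirs())                # 998
    print(sharded_capacity())           # 25948
```

Under these assumptions the flat layout tops out at just under 1000
users, and one level of letter buckets moves the wall to just under
26,000 rather than removing it. Real login names are not evenly spread
across letters, so the first bucket to fill would hit the limit sooner.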