Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site mit-eddie.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!wesommer
From: wesommer@mit-eddie.UUCP (Bill Sommerfeld)
Newsgroups: net.unix-wizards
Subject: Re: rwhod, creat slowness
Message-ID: <173@mit-eddie.UUCP>
Date: Thu, 13-Feb-86 17:33:09 EST
Article-I.D.: mit-eddi.173
Posted: Thu Feb 13 17:33:09 1986
Date-Received: Sat, 15-Feb-86 02:11:02 EST
References: <789@brl-smoke.ARPA>
Reply-To: wesommer@mit-eddie.UUCP (Bill Sommerfeld)
Organization: MIT, Cambridge, MA
Lines: 54

The rwho daemon is well known for creating n^2 scaling problems on
large nets.  It uses up a lot of CPU keeping effectively identical
copies of the files in a directory on all systems on the local net.

In article <789@brl-smoke.ARPA> speck@vlsi.caltech.edu (Don Speck) writes:
>
>    About a month ago I discovered that 80% of all disk I/O
>done on our Suns was the single, simple line (in rwhod.c):
>
>    whod = creat(path, 0666);
>
>where path = "/usr/spool/rwho/rwhod.%s" (%s = hostname).
>
>    How could this innocent-looking line be such a hog?
>
>1)  Each machine executed it 18 times per minute (we have
>    18 rwhod's running on one net)

There is a simple solution to this, which requires a small fix to
rwhod.  Modify it so that it accepts a -n option (for "no write to
disk").  You then modify the loop such that it doesn't do that creat()
and write() when the -n option is set.

>2)  All those directories had to be looked up each time
>3)  On Suns, /usr/spool is a symlink to /private/usr/spool,
>    adding another 3 directories to be looked up
>4)  On Suns, /usr and /usr/spool sit on a Network FileSystem.
>    Sun's NFS has no caching in the clients; each lookup
>    requires a server transaction over the network
>5)  14 Suns used the /usr network filesystem
>

You can then set things up so that /usr/spool/rwho on all machines
points to the /usr/spool/rwho of the server, and modify all but the
server to run rwhod -n in /etc/rc.  The only time that remote I/O is
needed is when someone does an rwho or ruptime to find out what's
going on.

>Why is creat(), probably one of the top 10 system calls, so
>slow on 4.2bsd systems?  Why is ftruncate just as slow - and
>still takes 30ms even if the file is already the correct size?
>Apparently these system calls do *synchronous* I/O, ignoring
>the buffer cache (even on plain VAX 4.2bsd, without any NFS
>clouding the issue).
>

They do synchronous I/O so that the filesystem is not corrupted in
uncontrolled ways when a system crashes.  This simplifies fsck's job.

                    Bill Sommerfeld
                    MIT Project Athena
                    wesommer@athena.mit.edu
                    mit-eddie!mit-athena!wesommer
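
To make the -n suggestion above a little more concrete, here is a rough
sketch in C of what the change might look like.  It is not taken from the
real rwhod.c: the names parse_no_write() and save_status() are made up for
illustration, and the real daemon's packet-handling loop is only implied.
The one piece that comes from the quoted article is the creat(path, 0666)
call itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if "-n" ("no write to disk") appears
 * among the command-line arguments, 0 otherwise.
 */
int parse_no_write(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++)
        if (strcmp(argv[i], "-n") == 0)
            return 1;
    return 0;
}

/*
 * Hypothetical helper: save one received status packet under `path`.
 * With no_write set, the creat() and write() are skipped entirely, so
 * the host generates no disk (or NFS) traffic for incoming packets.
 * Returns bytes written, 0 if the write was skipped, -1 on error.
 */
long save_status(const char *path, const void *buf, size_t len, int no_write)
{
    int whod;
    ssize_t n;

    assert(path != NULL && buf != NULL);

    if (no_write)
        return 0;               /* -n given: leave the spool alone */

    whod = creat(path, 0666);   /* the call quoted from rwhod.c */
    if (whod < 0)
        return -1;
    n = write(whod, buf, len);
    close(whod);
    return (long)n;
}
```

Under this sketch the server would start rwhod with no flag and keep
writing /usr/spool/rwho as before, while every other machine would run
"rwhod -n" from /etc/rc and pass no_write = 1 each time a packet arrives.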