Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!philabs!cmcl2!harvard!caip!lll-crg!lll-lcc!ucdavis!ucbvax!jade!ucbopal!mwm
From: mwm@ucbopal.berkeley.edu (Mike (I'll be mellow when I'm dead) Meyer)
Newsgroups: net.micro.amiga
Subject: Re: AmigaDos...
Message-ID: <670@jade.BERKELEY.EDU>
Date: Wed, 7-May-86 07:18:45 EDT
Article-I.D.: jade.670
Posted: Wed May  7 07:18:45 1986
Date-Received: Sun, 11-May-86 04:49:07 EDT
References: <1076@h-sc1.UUCP> <615@jade.BERKELEY.EDU> <1084@h-sc1.UUCP>
Sender: usenet@jade.BERKELEY.EDU
Reply-To: mwm@ucbopal.UUCP (Mike (I'll be mellow when I'm dead) Meyer)
Distribution: net
Organization: Missionaria Phonibalonica
Lines: 140

In article <1084@h-sc1.UUCP> breuel@h-sc1.UUCP (thomas breuel) writes:
>It can take quite some time to list all the files in a directory,
>into the 60 sec range for some of the disks I have.

You must have some LARGE directories. The worst case I can find is fish
disk #13, with 106 files and 6 directories, which takes less than 32
seconds to list. My c: directory (with 49 files) takes less than 12
seconds to list. Assuming that things are linear, I've gotta conclude
that you've got directories with over 200 files in them. Even my junk
directory on Unix isn't that bad (yet).

>Seeking through the two or three linearly linked allocation blocks
>on disk can be excrutiatingly slow, however, if you have to do
>it for every seek, and if they are scattered. This seems to be the
>case for several of my disks. I can assure you that in several cases
>I had a strong dependence of seek time on position in the file, i.e.
>very fast seeks for the beginning, and long delays for seeks near the
>end of the file.

Hmmm. I definitely saw no such correspondence. Seeks after the first
took ~30 ticks no matter where in the 700K+ file they were, the first
seek taking between 1.5 and 2.5 times that long, again seemingly
position independent. I started with a fresh disk, so that there should
be little or no fragmentation of the disk. If you didn't do that, you
might try copying your data base to a freshly formatted disk, and seeing
how long it takes.

>Some arithmetic (correct me if I'm wrong): a file header block
>or a file list block contains 72 block pointers, which represent
>36k of data in a file. For a 720k file, the allocation table
>can therefore grow to 20 blocks. For a search through a linear
>chain of blocks, this is not insignificant anymore.

You're right - I goofed. For some reason, I was thinking of 1K blocks,
so each allocation block gets you 200 pointers for about 200K. Sorry.

>|The comment
>|about doing globbing is true; I still can't figure out why they used
>|globbing so heavily in the WorkBench.
>
>It's because people like me forget their filenames so often :-).
>A very large percentage of my CLI commands are dir's (on UN*X,
>ls's). Likewise, most people probably prefer to be shown a dialog
>box listing all available files, ready for click selection, to
>a string dialog box.

No, that's why *YOU* use globbing so heavily, not why the workbench does
(opening *.info when you open a window). I don't run dir's and/or ls's
all that often - but I use file completion a lot :-).

>I was not suggesting a file system *identical* to UN*X. But there are
>other alternatives. Again, I personally favour the idea of tree-like
>allocation tables (because it makes truely random accesses affordable).
>Also, it can actually help save space greatly if it allows for small
>'seedling' files (just one allocated block) like ProDos.

Tree-like allocation tables can be a good thing. Allowing both shouldn't
be hard; it's got to be easier than the blocks/frags stuff that 4BSD does.
Dynamically switching from a "linked list" of two to a log-tree when you
hit three would probably be a good thing, if you're going to do both
anyway (anyone at C/A listening?).

>I believe that the loss of efficiency in 'open' by putting names back
>into directories is tolerable.

Putting names back in directories would cut the number of file entries per
directory block by a factor of 8. It'd also require a redesign of the file
system from the ground up, as you've gotta throw away the hashing and replace
it by something else - say a linear search of the blocks in the directory,
one block per 9 directory entries (roughly).

>Furthermore, for large directories,
>the name/inode-pair directory wins over the hash/bucket-list directory
>again. If done carefully, the only thing that is lost in terms of
>recoverability is that you may not be able to recover a file's name.
>Otherwise, such a file system can be as robust as the current Amiga file
>system.

Hmmm - I'm beginning to suspect that we're using the same term for different
things. Let me take time out to briefly describe the Unix tree structure, so
that I know we're using the same terminology.

The heart of the Unix directory tree is the ilist. This is a set of blocks
at fixed locations on the file system, usually at the front (front of each
cylinder group for 4BSD). Each inode in the ilist contains *all* information
pertinent to a file - owner, size, status bits, pointers to blocks of data.
A directory is a list of names and inode numbers, the inode number being
the index into the ilist.

Given this, it's obvious that an open looks like a loop that does:

	"search directory for name, and get inode number associated with name"
	"Go get block with that inode in it, so we'll have a list of blocks
		for the next directory"

From this, it's obvious that directories with name/inode pairs in them
take 2 disk accesses per element in the path. AmigaDOS takes 1 disk
access per element in the path (both cases assuming that directories
are "small", and ignoring the the overhead to get the file once you've
find it). Putting names back in the directory will just make the definition
of "small" directories much smaller.

On the other hand, going to name/inode pairs in the directory gets you back
to 2 disk accesses per path element. Worse yet, you either make the ilist a
linked list (in which case, you'll learn what SLOW really means), or it lives
in a known location on disk, meaning that a lost ilist block is a catastrophe.
You could probably make the name/inode organization as robust as the AmigaDOS
organization by following the same file header/block header structure. BTW,
you can loose file names if you loose the file header in AmigaDOS. I think
that's the worst case, though.

Finally, the contention that large directories win with inode/name
organizations - I disagree. Here's where the AmigaDOS caching finally comes
in. Let's assume that a "small" directory requires one block on Unix,
and has no collisions on AmigaDOS. For convenience, assume that "small"
directories are the same size on both systems, and have N file entries
(N for AmigaDOS is 72 (right?); for Unix, it's 32 for v7oid systems,
a minimax of 42 on 4BSD, and probably one of 32/64/128 on sysV). Then access
to a file in a small directory is the same on both systems - 1 disk access.
For a directory that's got M*N entries, it's M/2 (average) on Unix to search
the directory blocks - *not counting the overhead to search the allocation
table*; and M/2 (average) on AmigaDOS to search the hash chain for that entry
 - NO EXTRA OVERHEAD. So AmigaDOS wins by an (over)head :-).

Of course, for listing the files in the directory, AmigaDOS still has
one block/file, plus the directory. Unix keeps getting N files names/block
(again ignoring overhead).

File system design - like the rest of OS design - is non-trivial, and
correct solutions for one application/environment aren't correct - or
even usuable - for another application/environment. The Unix OS on an
Amiga, implemented roughly like the AmigaDOS file system is implemented
(no caching!), would be a total disaster for almost anything. The
AmigaDOS file system isn't perfect for the Amiga environment, and some of
the Unix ideas would be an improvement (the log-tree allocation table and
soft links ('aliases' under AOS, I think) just to start). But I have to
agree with Matt Dillon that what the AmigaDOS file system needs most is
disk caching.

	<mike