Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!wuarchive!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!ark1!nems!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.unix.wizards
Subject: Re: seeking information about file system details.
Message-ID: <26001@mimsy.umd.edu>
Date: 13 Aug 90 00:41:23 GMT
References: <28595@athertn.Atherton.COM>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 238

In article <28595@athertn.Atherton.COM> mcgregor@hemlock.Atherton.COM
(Scott McGregor) writes:
>I am curious as to how accesses to multiple concurrent types of file
>systems are implemented under SysV and BSD. ... I have heard of vnodes
>(which is what I think I want to know more about) but I haven't found
>anything written on vnodes.

Very little *is* written about them.

The first thing to know is that everyone does it differently.

SysV (through R3 at least) uses the File System Switch.  Each `file'
(represented as an inode? I have never seen the code) has a file system
ID attached, and instead of doing something like

	error = inode_read(ip, parameters);	/* ip is a `struct inode *' */

one does something like

	error = (*fssw[ip->i_fstype].fs_read)(ip, parameters);
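
To make the switch idea concrete, here is a minimal sketch.  It is not
taken from any SysV source (I have not seen it either); the type
numbers, the one-field inode, the two toy file systems and the wrapper
name fss_read are all made up for illustration.  Only the shape of the
indirect call through fssw[] comes from the text above.

```c
#include <assert.h>

#define FS_S5		0	/* hypothetical file system type numbers */
#define FS_FOO		1
#define NFSTYPES	2

struct inode {
	int i_fstype;		/* index into fssw[] */
};

/* One switch entry per file system type; only `read' is shown. */
struct fstypsw {
	int (*fs_read)(struct inode *ip, char *buf, int len);
};

/* Toy "s5" read: fills the buffer with 's'. */
static int
s5_read(struct inode *ip, char *buf, int len)
{
	int i;

	(void)ip;
	for (i = 0; i < len; i++)
		buf[i] = 's';
	return 0;
}

/* Toy "foo" read: fills the buffer with 'f'. */
static int
foo_read(struct inode *ip, char *buf, int len)
{
	int i;

	(void)ip;
	for (i = 0; i < len; i++)
		buf[i] = 'f';
	return 0;
}

static struct fstypsw fssw[NFSTYPES] = {
	{ s5_read },		/* FS_S5 */
	{ foo_read },		/* FS_FOO */
};

/* Dispatch on the type recorded in the inode. */
int
fss_read(struct inode *ip, char *buf, int len)
{
	assert(ip->i_fstype >= 0 && ip->i_fstype < NFSTYPES);
	return (*fssw[ip->i_fstype].fs_read)(ip, buf, len);
}
```

The caller never names a particular file system; adding one means
adding a row to the table.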

Ultrix: uses gnodes.  Gnodes are like vnodes or inodes, only different.
(Never having dealt with them I cannot say how so, other than in name.)

SunOS: uses vnodes.  Vnodes are different in just about every release
of SunOS (they keep getting bigger).  The important part is that vnodes
have a pointer to an operations vector, so instead of

	error = vnode_read(vp, parameters);	/* vp is a `struct vnode *' */

one writes

	error = (*vp->v_ops->vn_read)(vp, parameters);

These are encapsulated as macros to save typing:

	error = VOP_READ(vp, parameters);
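
A stripped-down sketch of the same arrangement follows.  The structure
layouts here are assumptions (real vnodes carry far more, and differ
from release to release), and `memfs' is a hypothetical file system
whose private data is just a C string.  The point is only that the
operations pointer travels with the vnode, and the macro hides the
indirection.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

struct vnode;

/* Operations vector; only `read' is shown. */
struct vnodeops {
	int (*vn_read)(struct vnode *vp, char *buf, int len);
};

struct vnode {
	struct vnodeops *v_ops;	/* which file system's code to run */
	void *v_data;		/* private per-file-system data */
};

#define VOP_READ(vp, buf, len) \
	((*(vp)->v_ops->vn_read)((vp), (buf), (len)))

/* Hypothetical memfs: copy the string into buf, NUL-terminated. */
static int
memfs_read(struct vnode *vp, char *buf, int len)
{
	const char *src = vp->v_data;
	size_t n;

	if (len <= 0)
		return EINVAL;
	n = strlen(src);
	if (n > (size_t)len - 1)
		n = (size_t)len - 1;	/* truncate to fit */
	memcpy(buf, src, n);
	buf[n] = '\0';
	return 0;
}

static struct vnodeops memfs_ops = { memfs_read };
```

Compared with the File System Switch, there is no global table to
index: each vnode points directly at its own operations.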

4.3BSD-Reno:  uses vnodes that are very similar to SunOS vnodes, but
not identical.  Again vnodes contain pointers to operation vectors, and
again one uses macros to invoke them.  In this case, however,
operations are permitted to store state; callers must call VOP_ABORTOP
when `backing out' of something.  In other words, one does something
like

	- Look up the following name with intent to delete.
	- Oops, never mind, something went wrong; I will not delete the
	  file after all.

Or

	- Look up the following name with intent to delete.
	- Delete the name you found.

(SunOS does not have the `abort' step, thus every operation must be self-
contained.)
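
The two sequences above can be sketched roughly as follows.  This is a
toy model, not the Reno code: the toy_ names, the fixed-size directory
and the `slot' field are invented for illustration.  It assumes only
what is described above, namely that a lookup with intent leaves state
behind, and that the caller must either complete the operation or call
the abort routine to discard that state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum nameiop { LOOKUP, CREATE, DELETE, RENAME };

#define NSLOTS	4

struct toy_dir {
	const char *names[NSLOTS];	/* NULL means empty slot */
};

struct toy_nameidata {
	enum nameiop op;	/* intent given to the lookup */
	int slot;		/* state saved by lookup; -1 if none */
};

/* Look up `name' with the given intent, remembering where it was. */
int
toy_lookup(struct toy_dir *dp, const char *name,
    struct toy_nameidata *ndp, enum nameiop op)
{
	int i;

	ndp->op = op;
	ndp->slot = -1;
	for (i = 0; i < NSLOTS; i++) {
		if (dp->names[i] != NULL && strcmp(dp->names[i], name) == 0) {
			ndp->slot = i;
			return 0;
		}
	}
	return ENOENT;
}

/* Complete a DELETE using the state the lookup saved. */
int
toy_remove(struct toy_dir *dp, struct toy_nameidata *ndp)
{
	if (ndp->op != DELETE || ndp->slot < 0)
		return EINVAL;
	dp->names[ndp->slot] = NULL;
	ndp->slot = -1;
	return 0;
}

/* "Oops, never mind": throw away the saved state. */
void
toy_abortop(struct toy_nameidata *ndp)
{
	ndp->slot = -1;
}
```

After toy_abortop the directory is untouched and a later toy_remove
with the same nameidata fails, which is the `backing out' case.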

>I guess what I am interested in is if I have a non-unix file system
>and I want to allow the this file system to be mounted as a unix
>file system, and accessed using open, creat, read, write, close, et al,
>what interface translators would I have to create between my own
>file system and the unix system calls, and how and where would I
>add typically add these translators.

For 4.3BSD-Reno, you would:

 1. modify <sys/mount.h> to add the parameters needed for a new mount,
    along with a mount type:
	#define	MOUNT_FOOFS	5
	#define	MOUNT_MAXTYPE	5	/* nb: not 6 (go argue with kirk...) */
	struct foofs_args {
		... whatever ...
	};
 2. write VFS operations to implement the following.  `mp' is always a
    `struct mount *', and is used to name the particular foofs file
    system (except during mount, where you have to figure out which
    foofs file system is meant from the args, and store it for later
    operations).

    foofs_mount(mp, char *path, caddr_t args, struct nameidata *ndp)
	mount the foofs file system at `path' using the given arguments.

    foofs_start(mp, int flags)
	start operation on the given foofs file system (after mount completes).

    foofs_unmount(mp, int forcibly)
	unmount the given foofs file system, possibly even if something is
	going on on it.

    foofs_root(mp, struct vnode **vpp)
	set *vpp to the vnode that represents the root of the given foofs
	file system.

    foofs_quotactl(mp, int cmd, uid_t uid, caddr_t arg)
	implement quotas, or (more simply) return EOPNOTSUPP.

    foofs_statfs(mp, struct statfs *sbp)
	fill in a `statfs' structure.

    foofs_sync(mp, int waitfor)
	write all data to permanent storage, optionally waiting until
	done before returning.

    foofs_fhtovp(mp, struct fid *fidp, struct vnode **vpp)
	convert a `file identifier' (private data) to a vnode, storing
	the resulting vnode in *vpp.

    foofs_vptofh(struct vnode *vp, struct fid *fidp)
	convert a vnode (vp) to a file identifier that can later be
	used to get vp again.

    foofs_init()
	set up any private data structures at boot time (e.g., inode
	hash chains for a ufs file system).

 3. write vnode operations.  vp is always a `struct vnode *'; vpp is
    always a `struct vnode **' into which the resulting vnode is
    stored; ndp is always a `struct nameidata *'; vap is always a
    `struct vattr *' containing the vnode attributes to apply or
    to be filled in; and cred is always a `struct ucred *' containing
    the (Unix-level) credentials of the person or program doing the
    operation (uid and gids).  (Often the credentials are in the
    nameidata instead.)

    foofs_lookup(vp, ndp)
	look up a path name segment (it contains no slashes).  ndp
	has all the semantics embedded, except that the lookup
	takes place in the directory given by vp.

    foofs_create(ndp, vap)
	create a file (after a lookup with CREATE set); store the new
	vnode in ndp->ni_vp.

    foofs_mknod(ndp, vap, cred)
	create a `node' (for Unix-specific things like devices).

    foofs_open(vp, int flags, cred)
	open a file; flags is from the open() syscall, e.g., no-delay.

    foofs_close(vp, int flags, cred)
	close a file.

    foofs_access(vp, int flags, cred)
	test to see if the given user (cred) has the given access mode
	(flags&VREAD, flags&VWRITE, flags&VEXEC).

    foofs_getattr(vp, vap, cred)
	get file attributes (store in *vap).

    foofs_setattr(vp, vap, cred)
	set file attributes (do not change those marked VNOVAL).

    foofs_read(vp, struct uio *uio, int ioflags, cred)
	read from file; the only valid flag is IO_NDELAY: do not block.

    foofs_write(vp, struct uio *uio, int ioflags, cred)
	write to file; valid flags are IO_UNIT: write as atomic unit
	(if write fails, undo it); IO_APPEND: append to file; IO_SYNC:
	do synchronous writes; IO_NDELAY: do not block.

    foofs_ioctl(vp, int cmd, caddr_t data, int flags, cred)
	do ioctl operation.

    foofs_select(vp, int which, int flag, cred)
	select for read/write/exception (which==FREAD/FWRITE/0 resp).

    foofs_mmap(vp, not-defined-yet)
	reserved for 4.4BSD.  probably should be replaced with name
	and unname page operations a la SunOS.

    foofs_fsync(vp, int fflags, cred, int waitfor)
	push all blocks for the file to permanent storage.  fflags
	is the file table entry flags.  optionally, wait until done.

    foofs_seek(vp, oldoff, newoff, cred)
	I dunno what this is doing in the ops vector.  Seeks are all
	taken care of above the vnode layer.

    foofs_remove(ndp)
	remove a file (after a lookup with DELETE).

    foofs_link(vp, ndp)
	create a link to the file vp (after a lookup with CREATE).

    foofs_rename(struct nameidata *old, struct nameidata *new)
	rename the old name to the new one (after a lookup with RENAME).

    foofs_mkdir(ndp, vap)
	create a directory (after a lookup with CREATE).

    foofs_rmdir(ndp)
	remove a directory (after a lookup with DELETE).

    foofs_symlink(ndp, vap, char *target)
	create a symbolic link (after a lookup with CREATE).

    foofs_readdir(vp, struct uio *uio, cred, int *eofflag)
	read directory contents (store in `struct dirent' format).
	eofflag is not used for anything.

    foofs_readlink(vp, struct uio *uio, cred)
	read symlink contents.

    foofs_abortop(ndp)
	changed mind after operation with intent to create/remove/rename.

    foofs_inactive(vp)
	last close on file, do whatever is appropriate (e.g., write
	inode back).

    foofs_reclaim(vp)
	vnode vp being reused; disassociate from any cache.

    foofs_lock(vp)
	lock underlying object (if possible).

    foofs_unlock(vp)
	unlock underlying object.  (These are not `file locking' locks;
	POSIX locking is not yet implemented.)

    foofs_bmap(vp, daddr_t bn, vpp, daddr_t *mapbn)
	map logical block number to physical block number, for old VM
	code.  this should go away.

    foofs_strategy(struct buf *bp)
	map logical block to physical and do I/O.  (Probably also should
	go away; read/write should handle this.  I do not trust the
	current buffer cache code....)

    foofs_print(vp)
	print contents of vnode, for debugging.

    foofs_islocked(vp)
	return true if underlying object is locked.
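
As a starting point for step 2, here is a sketch of two of the VFS
operations gathered into a table.  The structure definitions are
simplified stand-ins, not the real <sys/mount.h> and <sys/vnode.h>
ones: toy_vfsops, m_data and the plain `unsigned' and `void *'
parameter types are assumptions made so the fragment stands alone.
It shows foofs_root storing the root vnode through vpp and
foofs_quotactl taking the simple way out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct vnode;

/* Simplified stand-in for `struct mount'. */
struct mount {
	void *m_data;	/* private per-mount data; here, the root vnode */
};

/* Simplified stand-in for `struct vnode'. */
struct vnode {
	struct mount *v_mount;	/* file system this vnode belongs to */
};

/* Hypothetical table holding just two of the VFS operations. */
struct toy_vfsops {
	int (*vfs_root)(struct mount *mp, struct vnode **vpp);
	int (*vfs_quotactl)(struct mount *mp, int cmd, unsigned uid,
	    void *arg);
};

/* Set *vpp to the vnode for the root of this foofs file system. */
static int
foofs_root(struct mount *mp, struct vnode **vpp)
{
	assert(mp != NULL && vpp != NULL);
	*vpp = mp->m_data;
	return 0;
}

/* Quotas are not implemented. */
static int
foofs_quotactl(struct mount *mp, int cmd, unsigned uid, void *arg)
{
	(void)mp;
	(void)cmd;
	(void)uid;
	(void)arg;
	return EOPNOTSUPP;
}

static struct toy_vfsops foofs_vfsops = {
	foofs_root,
	foofs_quotactl,
};
```

The remaining entries follow the same pattern: one function per
operation, each taking mp to say which foofs file system is meant.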
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris
	(New campus phone system, active sometime soon: +1 301 405 2750)