Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!wuarchive!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!ark1!nems!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.unix.wizards
Subject: Re: seeking information about file system details.
Message-ID: <26001@mimsy.umd.edu>
Date: 13 Aug 90 00:41:23 GMT
References: <28595@athertn.Atherton.COM>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 238

In article <28595@athertn.Atherton.COM> mcgregor@hemlock.Atherton.COM
(Scott McGregor) writes:
>I am curious as to how accesses to multiple concurrent types of file
>systems are implemented under SysV and BSD. ... I have heard of vnodes
>(which is what I think I want to know more about) but I haven't found
>anything written on vnodes.

Very little *is* written about them.

The first thing to know is that everyone does it differently.

SysV (through R3 at least) uses the File System Switch.  Each `file'
(represented as an inode?  I have never seen the code) has a file system
ID attached, and instead of doing something like

    error = inode_read(struct inode *ip, parameters);

one does something like

    error = (*fssw[ip->i_fstype].fs_read)(ip, parameters);

Ultrix: uses gnodes.  Gnodes are like vnodes or inodes, only different.
(Never having dealt with them I cannot say how so, other than in name.)

SunOS: uses vnodes.  Vnodes are different in just about every release
of SunOS (they keep getting bigger).  The important part is that vnodes
have a pointer to an operations vector, so instead of

    error = vnode_read(struct vnode *vp, parameters);

one writes

    error = (*vp->v_ops->vn_read)(vp, parameters);

These are encapsulated as macros to save typing:

    error = VOP_READ(vp, parameters);

4.3BSD-Reno: uses vnodes that are very similar to SunOS vnodes, but not
identical.  Again vnodes contain pointers to operation vectors, and
again one uses macros to invoke them.
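
To make the operations-vector idea concrete, here is a minimal sketch as
toy user-space C, not kernel code.  Only the shape of the VOP_READ macro
follows the text above; the structure layouts, the v_data field, the two
file system names (foofs, barfs), tagged_copy and read_any are all
assumptions made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct vnode;

/* Operations vector: one function pointer per operation (only read here). */
struct vnodeops {
    int (*vn_read)(struct vnode *vp, char *buf, size_t len);
};

/* Toy vnode: a pointer to its file system's operations vector, plus a
 * hypothetical payload (v_data) standing in for the file contents. */
struct vnode {
    const struct vnodeops *v_ops;
    const char *v_data;
};

/* Macro wrapper in the style of VOP_READ. */
#define VOP_READ(vp, buf, len) ((*(vp)->v_ops->vn_read)((vp), (buf), (len)))

/* Hypothetical helper: copy tag followed by src into buf.
 * Returns 0 on success, -1 if buf (len bytes) is too small. */
static int tagged_copy(const char *tag, const char *src, char *buf, size_t len)
{
    if (strlen(tag) + strlen(src) + 1 > len)
        return -1;
    strcpy(buf, tag);
    strcat(buf, src);
    return 0;
}

/* Two hypothetical file system types; each tags its output so the caller
 * can see which read routine actually ran. */
static int foofs_read(struct vnode *vp, char *buf, size_t len)
{
    return tagged_copy("foofs:", vp->v_data, buf, len);
}

static int barfs_read(struct vnode *vp, char *buf, size_t len)
{
    return tagged_copy("barfs:", vp->v_data, buf, len);
}

static const struct vnodeops foofs_vnodeops = { foofs_read };
static const struct vnodeops barfs_vnodeops = { barfs_read };

/* Caller-side code never names a file system; it only goes through v_ops. */
static int read_any(struct vnode *vp, char *buf, size_t len)
{
    assert(vp != NULL && vp->v_ops != NULL);
    return VOP_READ(vp, buf, len);
}
```

Given `struct vnode v = { &foofs_vnodeops, "x" }`, a call to
read_any(&v, buf, sizeof buf) ends up in foofs_read; pointing v_ops at
&barfs_vnodeops instead changes which routine runs without touching the
caller.  That indirection is the whole trick.
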
In 4.3BSD-Reno, however, operations are permitted to store state;
callers must call VOP_ABORTOP when `backing out' of something.  In other
words, one does something like

    - Look up the following name with intent to delete.
    - Oops, never mind, something went wrong; I will not delete the
      file after all.

Or

    - Look up the following name with intent to delete.
    - Delete the name you found.

(SunOS does not have the `abort' step, thus every operation must be
self-contained.)

>I guess what I am interested in is if I have a non-unix file system
>and I want to allow the this file system to be mounted as a unix
>file system, and accessed using open, creat, read, write, close, et al,
>what interface translators would I have to create between my own
>file system and the unix system calls, and how and where would I
>add typically add these translators.

For 4.3BSD-Reno, you would:

1. modify to add the parameters needed for a new mount, along with
   a mount type:

    #define MOUNT_FOOFS     5
    #define MOUNT_MAXTYPE   5   /* nb: not 6 (go argue with kirk...) */
    struct foofs_args { ... whatever ... };

2. write VFS operations to implement the following.  `mp' is always a
   `struct mount *', and is used to name the particular foofs file
   system (except during mount, where you have to figure out which
   foofs file system is meant from the args, and store it for later
   operations).

    foofs_mount(mp, char *path, caddr_t args, struct nameidata *ndp)
        mount the foofs file system at `path' using the given
        arguments.
    foofs_start(mp, int flags)
        start operation on the given foofs file system (after mount
        completes).
    foofs_unmount(mp, int forcibly)
        unmount the given foofs file system, possibly even if
        something is going on on it.
    foofs_root(mp, struct vnode **vpp)
        set *vpp to the vnode that represents the root of the given
        foofs file system.
    foofs_quotactl(mp, int cmd, uid_t uid, caddr_t arg)
        implement quotas, or (more simply) return EOPNOSUPPORT.
    foofs_statfs(mp, struct statfs *sbp)
        fill in a `statfs' structure.
    foofs_sync(mp, int waitfor)
        write all data to permanent storage, optionally waiting until
        done before returning.
    foofs_fhtovp(mp, struct fid *fidp, struct vnode **vpp)
        convert a `file identifier' (private data) to a vnode, storing
        the resulting vnode in *vpp.
    foofs_vptofh(struct vnode *vp, struct fid *fidp)
        convert a vnode (vp) to a file identifier that can later be
        used to get vp again.
    foofs_init()
        set up any private data structures at boot time (e.g., inode
        hash chains for a ufs file system).

3. write vnode operations.  vp is always a `struct vnode *'; vpp is
   always a `struct vnode **' into which the resulting vnode is stored;
   ndp is always a `struct nameidata *'; vap is always a
   `struct vattr *' containing the vnode attributes to apply or to be
   filled in; and cred is always a `struct ucred *' containing the
   (Unix-level) credentials of the person or program doing the
   operation (uid and gids).  (Often the credentials are in the
   nameidata instead.)

    foofs_lookup(vp, ndp)
        look up a path name segment (it contains no slashes).  ndp has
        all the semantics embedded, except that the lookup takes place
        in the directory given by vp.
    foofs_create(ndp, vap)
        create a file (after a lookup with CREATE set); store the new
        vnode in ndp->ni_vp.
    foofs_mknod(ndp, vap, cred)
        create a `node' (for Unix-specific things like devices).
    foofs_open(vp, int flags, cred)
        open a file; flags is from the open() syscall, e.g., no-delay.
    foofs_close(vp, int flags, cred)
        close a file.
    foofs_access(vp, int flags, cred)
        test to see if the given user (cred) has the given access mode
        (flags&VREAD, flags&VWRITE, flags&VEXEC).
    foofs_getattr(vp, vap, cred)
        get file attributes (store in *vap).
    foofs_setattr(vp, vap, cred)
        set file attributes (do not change those marked VNOVAL).
    foofs_read(vp, struct uio *uio, int ioflags, cred)
        read from file; valid flags are IO_NDELAY: do not block.
    foofs_write(vp, struct uio *uio, int ioflags, cred)
        write to file; valid flags are
            IO_UNIT: write as atomic unit (if write fails, undo it);
            IO_APPEND: append to file;
            IO_SYNC: do synchronous writes;
            IO_NDELAY: do not block.
    foofs_ioctl(vp, int cmd, caddr_t data, int flags, cred)
        do ioctl operation.
    foofs_select(vp, int which, int flag, cred)
        select for read/write/exception (which==FREAD/FWRITE/0 resp).
    foofs_mmap(vp, not-defined-yet)
        reserved for 4.4BSD.  probably should be replaced with name
        and unname page operations a la SunOS.
    foofs_fsync(vp, int fflags, cred, int waitfor)
        push all blocks for the file to permanent storage.  fflags is
        the file table entry flags.  optionally, wait until done.
    foofs_seek(vp, oldoff, newoff, cred)
        I dunno what this is doing in the ops vector.  Seeks are all
        taken care of above the vnode layer.
    foofs_remove(ndp)
        remove a file (after a lookup with DELETE).
    foofs_link(vp, ndp)
        create a link to the file vp (after a lookup with CREATE).
    foofs_rename(struct nameidata *old, struct nameidata *new)
        rename the old name to the new one (after a lookup with
        RENAME).
    foofs_mkdir(ndp, vap)
        create a directory (after a lookup with CREATE).
    foofs_rmdir(ndp)
        remove a directory (after a lookup with DELETE).
    foofs_symlink(ndp, vap, char *target)
        create a symbolic link (after a lookup with CREATE).
    foofs_readdir(vp, struct uio *uio, cred, int *eofflag)
        read directory contents (store in `struct dirent' format).
        eofflag is not used for anything.
    foofs_readlink(vp, struct uio *uio, cred)
        read symlink contents.
    foofs_abortop(ndp)
        changed mind after operation with intent to
        create/remove/rename.
    foofs_inactive(vp)
        last close on file, do whatever is appropriate (e.g., write
        inode back).
    foofs_reclaim(vp)
        vnode vp being reused; disassociate from any cache.
    foofs_lock(vp)
        lock underlying object (if possible).
    foofs_unlock(vp)
        unlock underlying object.  (These are not `file locking'
        locks; POSIX locking is not yet implemented.)
    foofs_bmap(vp, daddr_t bn, vpp, daddr_t *mapbn)
        map logical block number to physical block number, for old VM
        code.  this should go away.
    foofs_strategy(struct buf *bp)
        map logical block to physical and do I/O.  (Probably also
        should go away; read/write should handle this.  I do not trust
        the current buffer cache code....)
    foofs_print(vp)
        print contents of vnode, for debugging.
    foofs_islocked(vp)
        return true if underlying object is locked.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu    Path: uunet!mimsy!chris
    (New campus phone system, active sometime soon: +1 301 405 2750)