Path: utzoo!mnetor!uunet!seismo!sundc!pitstop!sun!gorodish!guy
From: guy@gorodish.Sun.COM (Guy Harris)
Newsgroups: comp.unix.wizards
Subject: Re: FileNames with the high bit set.
Message-ID: <48993@sun.uucp>
Date: 11 Apr 88 00:38:31 GMT
References: <8120010@eecs.nwu.edu>
Sender: news@sun.uucp
Lines: 46

> On our 4.3+NFS (Mt. Xinu) system on a Vax780 and also on a Sun 3/60
> running SunOS 3.5, open(2) and creat(2) return EINVAL if the pathname
> supplied to them has a character with the high order bit set.
>
> Why is this ? Has this behaviour been added by Berkeley Unix or has
> it "always" been there in Unix ?

It was added in 4.2BSD.

> Is it because sh(1) uses the parity bit for it's own purposes and the
> kernel does not want to create files that the shell might not be able
> to handle in this manner ?

In addition to pre-S5R3 "sh", the C shell also uses the parity bit for this.
The 8th bit stuff was probably thrown in for precisely the reason you list.

> In any case, this seems like an arbitrary restriction.

It is.

> I can imagine applications which might want to create files that have
> names with arbitrary bytes in them (if you used a hashing function
> on some key to come up with a filename, you can get an "invalid"
> pathname).

Hell, I have a symbolic link to "/vmunix" on my machine named "/UNIX(r)",
where "(r)" refers to the ISO Latin #1 "registered trademark" character,
which has the hexadecimal code 0xAE.

SunOS 4.0 removed the restriction in question; it uses the S5R3 Bourne shell
as its Bourne shell, and that shell doesn't have problems with file names
containing 8-bit characters, so if you have files like that lying around
"rm -i *" (or "rm -i .*" if the file name begins with ".") can clean them up
from the Bourne shell. The 4.0 C shell still can't handle filenames such as
that; this is a restriction we currently plan to lift in a future release.

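To make the restriction being discussed concrete: it amounts to scanning the
pathname for any byte with the 0x80 bit set and failing the call with EINVAL
if one turns up. The C sketch below is a user-level illustration of that idea
only. It is not the 4.2BSD kernel code, and the function name
path_has_high_bit is made up for the example.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from any kernel source.
 * Returns 1 if any byte of the NUL-terminated path has its
 * high-order (0x80) bit set, and 0 otherwise.
 */
int path_has_high_bit(const char *path)
{
    /* Go through unsigned char: plain char may be signed, in which
       case bytes such as 0xAE would show up as negative values. */
    const unsigned char *p = (const unsigned char *)path;

    for (; *p != '\0'; p++) {
        if (*p & 0x80)
            return 1;
    }
    return 0;
}
```

With this, "/vmunix" passes, while a name ending in the 0xAE byte (written
"/UNIX\xAE" as a C string literal) is flagged.
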
Creating file names containing arbitrary character codes is probably not a
good idea; if you have an OS and file system that allow you to create very
long file names, you should use that capability. The reason we removed the
restriction was not so that you could create files with binary names; it was
as a first step towards supporting larger character sets than ASCII, such as
the ISO 8859 character sets and the various EUC-derived Asian character sets,
in file names. (BTW, you *can't* create files that have names with truly
arbitrary bytes in them; '/' and '\0' are not valid in UNIX file names - '/'
separates *file* names in a *path* name, and '\0' terminates a path name.)
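
Both points can be sketched in C. The first function checks a run of raw bytes
for the two values that can never appear in a file name; the second shows one
way to use long names instead of binary ones, by spelling a hashed key out as
hexadecimal digits. The names name_bytes_ok and hex_name are hypothetical, and
the sketch deliberately ignores empty names, "." and "..", and any
system-specific limit on name length.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if none of the len bytes at name
 * is '/' or '\0', and 0 otherwise.
 */
int name_bytes_ok(const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (name[i] == '/' || name[i] == '\0')
            return 0;
    }
    return 1;
}

/*
 * Hypothetical helper: writes the len bytes at key as 2*len lowercase
 * hexadecimal digits followed by a terminating '\0'. The caller must
 * supply an out buffer of at least 2*len + 1 bytes.
 */
void hex_name(const unsigned char *key, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[key[i] >> 4];
        out[2 * i + 1] = digits[key[i] & 0x0f];
    }
    out[2 * len] = '\0';
}
```

A four-byte key of 0x2f, 0x00, 0xae, 0x41 fails name_bytes_ok as raw bytes
(it contains both '/' and '\0'), but hex_name turns it into "2f00ae41", which
is plain ASCII and usable as a file name on any of the systems discussed here.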