Path: utzoo!mnetor!uunet!husc6!cmcl2!nrl-cmf!ames!oliveb!sun!gorodish!guy
From: guy@gorodish.Sun.COM (Guy Harris)
Newsgroups: comp.unix.wizards
Subject: Re: Kernel Hacks & Weird Filenames
Message-ID: <52222@sun.uucp>
Date: 6 May 88 00:02:38 GMT
References: <13041@brl-adm.ARPA> <2630003@hpsal2.HP.COM>
Sender: news@sun.uucp
Lines: 69

> HP-UX has the routine isprint (most likely all other Un*xes have it too).

One should hope so; HP certainly didn't invent it, the people at BTL
Research did.

> So it is not too hard to determine what a printable character is (HP-UX's
> implementation includes NLS as well).

Given that it includes NLS, there is no single answer to that question.
The answer depends on the character set you select.

This brings up another question: should the answer depend on the type
of terminal you're currently logged in on? I.e., if you're on a VT100,
should the upper half of ISO Latin #1 be excluded, while if you're on a
VT220 it's included?

Another question: what does "isprint" do about "wide" character sets
such as various Kanji character sets?

> As to the whole topic of what belongs in a valid filename, it seems to me
> that if you could truly have ANY character in a filename, then things would
> be ok, but that isn't the case. First of all, as others have pointed out,
> you have to exclude '\0' and '/'. In addition, most (all?) shells use some
> characters as metacharacters.

Most, not all. The major conventional UNIX command-line shells do;
however, you could have a "fill in the form" shell, or a "desktop
metaphor" shell, that doesn't.

> In short, I see no gain and many drawbacks to allowing arbitrary characters
> in filenames.

OK, what does "allowing" mean here? There *might* be some merit to
disallowing the creation of path names containing certain bytes (note,
as per the previous mention of Kanji character sets, that a "character"
is not necessarily a single byte).
Disallowing *all* pathnames containing these bytes would be wrong,
however, as it would prohibit you from referring to some of those files
if your session weren't configured to allow all characters in file
names. (No, you can't say "you're on a terminal that doesn't support
8-bit characters, you wouldn't be able to refer to them anyway";
consider a user logged in on a 7-bit terminal doing an "rm -rf" on a
directory containing files with 8-bit characters in their names - or
just with blanks in their names, if you choose to disallow them.)

And, once again, I bring up the question of character sets such as
various Kanji sets. If not all 16-bit combinations are valid Kanji, how
can you disallow "invalid" characters if each of the two bytes in such a
character is valid in some other character?

Sure, it sounds nice to say "make life easier for the user by
preventing hard-to-reference filenames from being used". It's not clear
that it's really that easy. Obviously, the kernel should not provide any
policy here; I'm not sure you can even provide a reasonable policy-free
mechanism atop which the desired policies can be implemented.

BTW, note that Draft 12 of POSIX says:

    filename

    Names consisting of 1 to {NAME_MAX} bytes may be used to name
    a file. The characters composing the name may be selected from
    the set of all characters excluding the slash character and
    those containing the null byte (octal zero).

From this, I infer that no POSIX-conformant system will prohibit me from
using ^A or '\353' in a file name; there may well be application writers
who, for whatever reason (bad or good), decide to do so. Turning on
filename restrictions might conceivably break these applications; before
you add such restrictions, make sure either that this won't break any
important applications or that you can live with them not working.