Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!decwrl!amdcad!lll-crg!seismo!ut-sally!std-unix
From: std-unix@ut-sally.UUCP
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <6029@ut-sally.UUCP>
Date: Fri, 17-Oct-86 12:35:48 EDT
Article-I.D.: ut-sally.6029
Posted: Fri Oct 17 12:35:48 1986
Date-Received: Fri, 17-Oct-86 21:20:24 EDT
References: <6002@ut-sally.UUCP> <5865@ut-sally.UUCP> <6018@ut-sally.UUCP>
Organization: IEEE P1003 Portable Operating System for Computer Environments Committee
Lines: 73
Approved: jsq@sally.utexas.edu

From: cbosgd!cbosgd.ATT.COM!mark@ucbvax.berkeley.edu (Mark Horton)
Date: Fri, 17 Oct 86 11:20:32 edt
Organization: AT&T Medical Information Systems, Columbus

Don Provan raises some interesting questions about foreign languages.

In general, I think we know how to do a case-insensitive comparison
appropriately, by extending a function (I think it's called strcoll,
but I don't have my X3J11 draft handy) defined in ANSI C.  The
function is like strcpy, but the destination buffer gets a translation
of the string that will collate properly when a lexicographic
comparison like strcmp is used.  If we extend this function to also
translate to one case (as appropriate) and allow each country to
define its own function, it's technically possible to ignore case.
Whether it's fast enough for the UNIX filesystem is unclear, although
this problem is not restricted to UNIX.

I think it would be interesting to hear what other, case-insensitive
operating systems do about these issues.  What do MS DOS, or VM/CMS,
or VMS, or whatever, do with their case-insensitive file names in
Europe, or Japan, or wherever?

If the answer is that file names are restricted to the same character
set as in the USA, and that extra letters are disallowed, then we need
to know how well this is accepted by the users of those other systems.
Maybe it's good enough.
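
The case-folding transform described above can be sketched roughly as
follows, in Python rather than C.  This is only an illustration under
stated assumptions: fold_key and same_name are hypothetical names, and
str.lower stands in for whatever per-country case translation would
actually be defined.  In the C standard as eventually published, the
transforming function is strxfrm (strcoll compares two strings
directly); Python's locale.strxfrm wraps it.

```python
import locale

def fold_key(name):
    """Hypothetical helper: a key that ignores case and collates properly.

    Assumption: str.lower() is an adequate "translate to one case" step.
    locale.strxfrm() then produces a string that sorts correctly under
    ordinary comparison for the current LC_COLLATE locale.
    """
    return locale.strxfrm(name.lower())

def same_name(a, b):
    """Hypothetical case-insensitive name comparison built on fold_key."""
    return fold_key(a) == fold_key(b)

if __name__ == "__main__":
    print(same_name("README", "readme"))
    print(sorted(["b", "A", "c"], key=fold_key))
```

The key would have to be computed for every directory lookup, which
is where the speed question raised above comes in.
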
Do users in other countries often create files whose names contain
extra letters?  If they try, does the shell get in the way if their
letter happens to be "|", for example?

If the answer is that other operating systems have forced other
countries to put up with Americanisms, and that POSIX is an
opportunity to break new ground by handling other languages properly,
then by all means let's do it right.  This might require 8 bit
characters in file names, for example.

Incidentally, I've seen it claimed here that UNIX allows arbitrary
byte streams in file names.  Perhaps this is the intent, but in
reality the UNIX filesystem is far from a transparent path.  There
are lots of restrictions, some of which are:

	The slash character is special.

	The null character is special.

	Depending on the version of UNIX, a sequence of more than 14
	chars not containing a slash is either illegal, significant
	only to 14 chars, or significant to 256 chars.

	Characters with the 8th bit turned on are not allowed.

	Since many commands take names beginning with "-" as flags,
	file names beginning with "-" don't always work.

	Since the shell treats many of the punctuation characters
	specially, file names containing space, #, $, &, *, (, ), [,
	], ;, ', ", \, |, <, >, and ? do not always work properly.
	Even if you quote them, the shell strips off the quotes, so
	that if multiple layers of shell are involved (for example,
	uux) it still fails.

Because some of these problems only affect certain uses of the
filesystem (whether or not you go through the shell, whether or not
you're going through a command that takes arguments) it's not unusual
for casual users to create a file and then have trouble using,
renaming, or even removing it.  I recall that removing a file whose
name contains a character with the 8th bit set is a frequent topic on
net.unix.

If the filesystem were really transparent, the designers of /proc
would not have had to encode process IDs in ASCII digits; they could
have used the binary representation directly.
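
The restrictions listed above can be turned into a rough checker.
This is a sketch only: filename_problems is a hypothetical name, the
14-character default is just one of the limits mentioned, and the set
of shell characters is copied from the list above rather than from
any particular shell's documentation.

```python
# Characters the list above says the shell treats specially.
SHELL_SPECIAL = set(" #$&*()[];'\"\\|<>?")

def filename_problems(name, max_len=14):
    """Hypothetical checker: return a list of reasons a name may misbehave."""
    problems = []
    if "/" in name:
        problems.append("contains a slash")
    if "\0" in name:
        problems.append("contains a null character")
    if len(name) > max_len:
        problems.append("longer than %d characters" % max_len)
    if any(ord(ch) > 0x7F for ch in name):
        problems.append("contains a character with the 8th bit set")
    if name.startswith("-"):
        problems.append("begins with '-' and may be taken as a flag")
    special = sorted(set(name) & SHELL_SPECIAL)
    if special:
        problems.append("shell-special characters: " + repr("".join(special)))
    return problems

if __name__ == "__main__":
    for candidate in ["notes.txt", "-rf", "my file", "a|b"]:
        print(repr(candidate), filename_problems(candidate))
```

An empty list means none of the listed restrictions applies; it does
not mean the name is safe on every version of UNIX.
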
It's for these reasons that I feel that a conservative UNIX user
should restrict themselves to certain "reasonable" filename
conventions: basically using only lower case letters, digits, and a
few safe punctuation characters such as . and - in their filenames.
Just because it's possible to put a space in a file name doesn't make
it a good idea.

	Mark

Volume-Number: Volume 7, Number 67