Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!seismo!ut-sally!std-unix
From: std-unix@ut-sally.UUCP (Moderator, John Quarterman)
Newsgroups: mod.std.unix
Subject: Re: Case sensitive file names
Message-ID: <6094@ut-sally.UUCP>
Date: Sat, 25-Oct-86 21:51:48 EST
Article-I.D.: ut-sally.6094
Posted: Sat Oct 25 21:51:48 1986
Date-Received: Sun, 26-Oct-86 04:13:02 EST
Organization: IEEE P1003 Portable Operating System for Computer Environments Committee
Lines: 171
Approved: jsq@sally.utexas.edu

From: guy@sun.com (Guy Harris)
Date: Mon, 20 Oct 86 10:49:33 PDT

Responses to a couple of messages:

From Mark Horton:

> Any solution to this problem must be in the kernel, or possibly
> in libc underneath such subroutines as open, unlink, and chmod, (if you
> have shared libraries or full source to recompile) or it won't work all
> the time.

Any solution to this problem must be applied to operating systems other than UNIX. As John Bruner pointed out, mandating case-insensitivity will only have the effect of removing UNIX from the list of standard-conforming systems.

Changing the semantics of file names at this late date is unlikely to meet with approval from many UNIX vendors and users. For one thing, what are you going to do about directories that contain files named, say, "makefile" and "Makefile" (yes, they exist)? You may feel that having directories like this is a mistake, but declaring them to be a mistake isn't going to make them go away.

There seem to be two issues here:

1) Should POSIX mandate case-sensitivity?
2) Should UNIX be changed to be case-insensitive if POSIX doesn't mandate case-sensitivity?

These are rather separate issues.

A case can be made that POSIX should not mandate case-sensitivity. Applications must then not depend on case-sensitivity. This will affect programs that create files with names other than those provided by the user.
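If a standard leaves case-sensitivity unspecified, a program that invents its own file names cannot assume that "makefile" and "Makefile" name different files. A minimal sketch, assuming a POSIX system with opendir()/readdir() and strcasecmp(), of checking a directory for an existing name that differs only in case; the helper name find_name_ignoring_case is hypothetical, not part of any standard:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>  /* strcasecmp(); in the POSIX (C) locale only A-Z/a-z fold */

/*
 * Hypothetical helper: search directory `dir` for an entry whose name
 * equals `name` when case is ignored.  On a match, copy the entry's
 * actual spelling into `found` (at most `len` bytes) and return 1.
 * Return 0 if there is no such entry, -1 if `dir` cannot be opened.
 */
int find_name_ignoring_case(const char *dir, const char *name,
                            char *found, size_t len)
{
    DIR *d = opendir(dir);
    struct dirent *e;
    int result = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (strcasecmp(e->d_name, name) == 0) {
            snprintf(found, len, "%s", e->d_name);  /* keep the real spelling */
            result = 1;
            break;
        }
    }
    closedir(d);
    return result;
}
```

A program about to create "makefile" could call this first and learn that "Makefile" already exists, which is exactly the collision a case-insensitive system would turn into a single file.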
It could also affect programs that *read* directories, since they'd have to know that "foobar" and "FoOBaR" refer to the same file.

I see great difficulty in changing UNIX to be case-insensitive, however. It certainly wouldn't pose any great *implementation* difficulties, but I would not like to bet that no user or program would be greatly affected.

From Mark R. Crispin:

> It seems that the two sides in this issue boil down to this:
> . "gee, since we're defining a standard portable operating system
>   that isn't necessarily the present de facto Unix, let's fix
>   this case sensitivity cretinism"
> . "case sensitivity is what makes Unix better than any other
>   operating system, and only a cretin can't understand why this
>   is wonderful"

Not really. A POSIX standard that does not *mandate* case-sensitivity need not *forbid* it. And I have seen *no* arguments that "case sensitivity is what makes UNIX better than any other operating system."

> Let's start by discarding the arguments which are bogus.
> The most glaring of these has got to be the international
> compatibility argument. The only advocates of this argument seem
> to be pro case sensitivity Americans who have seized upon this as
> an argument to shore up their position without really thinking
> over the issue carefully.

Well, it may seem that way, but it isn't. I admit to being a United States citizen, but I am not unreservedly pro-case-sensitivity. I see the merits to both sides of the argument, but I see more problems with case-insensitivity than with case-sensitivity.

> Unix does not allow arbitrary strings in filenames. Any
> number of "funny" characters must be within a quoted string. I
> can't say
>       rm foo.bar;1
> I have to say
>       rm "foo.bar;1"
> Guess what. A number of foreign keyboards use those "funny"
> characters to be non-English glyphs.

As the moderator pointed out, the shell, not the operating system, interprets these funny characters.
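To illustrate, a program that calls the system directly can use such a name with no quoting at all. A minimal sketch, assuming a POSIX environment; the helper name create_and_remove is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file called `name` and then remove it,
 * calling open() and unlink() directly.  No shell is involved, so
 * characters such as ';' or ' ' need no quoting; within a name, only
 * '/' (the component separator) and the terminating '\0' are special.
 * Returns 0 on success, -1 on failure.
 */
int create_and_remove(const char *name)
{
    /* O_EXCL: fail rather than clobber an existing file of that name. */
    int fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd < 0)
        return -1;
    close(fd);
    /* For "foo.bar;1" this matches the shell command  rm "foo.bar;1"  */
    return unlink(name);
}
```

Calling create_and_remove("foo.bar;1") or create_and_remove("name with blanks") succeeds in an ordinary writable directory; the quoting burden exists only when the name passes through the shell.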
Applications need not get file names passed as arguments from the shell. The office automation system we developed at CCI had its own shell, which did no parsing of path names whatsoever; the only characters it forbade were the slash and the null character (because they are not allowed in UNIX file names) and those characters its forms package didn't allow you to type in (because we never got around to changing the package to allow them). I frequently used file names containing blanks within this application, even though it made it inconvenient to manipulate those files using commands typed at the UNIX shell.

> I have yet to hear of any organization in Japan using kanji
> or hiragana or katakana in filenames.

I have a document in front of me from ASCII Corporation in Japan, describing changes made to 4.2BSD to support Kanji and Kana. It says:

    It is possible to create a file whose name contains Kana and/or Kanji characters, since the file system and Kanji version of the shell support it. However, we don't recommend such filenames, because it is impossible to handle such files from ASCII terminals.

The argument they give against such file names would not apply if, for example, no terminals attached to the machine were ASCII terminals and the site didn't expect to export these files to machines with only ASCII terminals attached. The developers of that system may be coming from a more "traditional" UNIX environment, where many ASCII terminals are attached to the machine and files are frequently exchanged with other sites not running the same hardware and software. In an office environment, it may be possible to provide everyone with a Kanji/Kana terminal, and it may not be as important to worry about exchanging files with some random development machine in the United States.

> There are good reasons for
> this! One is that there isn't a single way of representing
> written Japanese.
> In older terminals, the high order bit when
> set indicated katakana (much as DEC VT220's use the high order
> bit for their "international characters"). Modern Japanese
> terminals use the JIS (Japanese Industrial Standard) system of
> ESCAPE followed by two bytes to define a 14 bit character.

The system they describe uses "Shift JIS" code for Kanji, and supports both terminals that use this code and terminals that use the regular JIS code; for JIS-Kanji terminals, it converts between the two codes.

> Some German keyboards use various 7-bit glyphs (I believe
> "@" is umlaut-a) for their umlauts and Eszett. Or, there's the
> VT220 system. I just tried creating a file called Goethestrasse
> (using umlaut-o for "oe" and Eszett for "ss") on my local Unix
> system using my VT220 clone. It made "GVthestra_e", the 7-bit
> form.

The latter sounds like ISO Latin Alphabet No. 1 with the eighth bit stripped: "umlaut-O" has the hex code D6 and capital V has the code 56, and 56 hex + 80 hex is D6 hex. Likewise, Eszett is DF hex, and DF hex - 80 hex is 5F hex, the underscore. (I believe DEC recommended the VT220 code set to ISO for standardization.)

> Dare I mention that in German, only nouns (and the first
> word in a sentence) are capitalized?

The same is true of English; so what?

> The point is that Unix does *not* support international
> character sets in filenames. It supports 7-bit USASCII. So
> let's leave that issue to rest.

As the moderator pointed out, this is not the case. The kernel accepts any character in a file name except slash and the null character; the exception is the 4.[23]BSD kernel, which (too helpfully) refuses to create files whose names contain characters with the eighth bit set. Certain UNIX utilities do not handle 8-bit characters; this is not, however, an intrinsic characteristic of the UNIX system. I would ask European and Asian customers what they wanted the UNIX system to do about character sets other than 7-bit USASCII before I casually dismissed the possibility of supporting them.
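Both the "GVthestra_e" arithmetic and the file-name rule can be checked mechanically. A minimal C sketch, assuming the name is held as ISO Latin-1 bytes and uses a capital O-umlaut (D6 hex), which is what the capital V in the quoted result implies; the helper names strip_eighth_bit and name_component_ok are hypothetical, and the reject_8bit flag only imitates the 4.[23]BSD behavior described above rather than reproducing that kernel's code:

```c
/*
 * Clear the eighth bit of every byte, as a 7-bit path would.
 * ISO Latin-1: O-umlaut 0xD6 -> 0x56 'V', Eszett 0xDF -> 0x5F '_'.
 */
void strip_eighth_bit(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)((unsigned char)*s & 0x7F);
}

/*
 * Return 1 if `name` is acceptable as a single file-name component:
 * any bytes except '/' (the string's terminating '\0' ends the name).
 * If `reject_8bit` is nonzero, also refuse bytes with the eighth bit
 * set, imitating the 4.2/4.3BSD restriction.
 */
int name_component_ok(const char *name, int reject_8bit)
{
    const unsigned char *p = (const unsigned char *)name;

    if (*p == '\0')
        return 0;                      /* an empty string names no file */
    for (; *p != '\0'; p++) {
        if (*p == '/')
            return 0;                  /* slash separates components */
        if (reject_8bit && (*p & 0x80))
            return 0;                  /* BSD-style refusal of 8-bit bytes */
    }
    return 1;
}
```

Applied to the Latin-1 bytes "G\xD6thestra\xDF" "e" (the string is split so that the hex escape does not swallow the final "e"), strip_eighth_bit yields "GVthestra_e"; name_component_ok accepts the unstripped name when reject_8bit is 0 and refuses it when reject_8bit is 1.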
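The conversion between JIS and Shift JIS mentioned above is simple arithmetic on the two bytes. A minimal C sketch of the commonly used mapping for one JIS X 0208 character (each byte 0x21 to 0x7E); the function name jis_to_sjis is hypothetical, it ignores the ESCAPE sequences that switch a JIS stream in and out of two-byte mode, and it is not the ASCII Corporation code:

```c
/*
 * Convert one two-byte JIS character (j1, j2 each in 0x21..0x7E) to
 * Shift JIS, storing the result in out[0], out[1].
 * Returns 0 on success, -1 if either input byte is out of range.
 */
int jis_to_sjis(unsigned j1, unsigned j2, unsigned char out[2])
{
    unsigned s1, s2;

    if (j1 < 0x21 || j1 > 0x7E || j2 < 0x21 || j2 > 0x7E)
        return -1;

    /* Each Shift JIS lead byte covers two consecutive JIS rows. */
    s1 = ((j1 + 1) >> 1) + 0x70;
    if (s1 >= 0xA0)
        s1 += 0x40;        /* jump over 0xA0..0xDF (one-byte katakana area) */

    if (j1 & 1) {          /* odd row: trail bytes 0x40..0x9E, skipping 0x7F */
        s2 = j2 + 0x1F;
        if (s2 >= 0x7F)
            s2++;
    } else {               /* even row: trail bytes 0x9F..0xFC */
        s2 = j2 + 0x7E;
    }
    out[0] = (unsigned char)s1;
    out[1] = (unsigned char)s2;
    return 0;
}
```

For example, JIS 30 21 hex (the first level-1 kanji) becomes 88 9F hex in Shift JIS. Because Shift JIS lead bytes avoid the one-byte katakana range, text in that code can mix one-byte Kana and two-byte Kanji without escape sequences.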
> I haven't yet heard of any serious use of full 8-bit bytes
> for filenames on any other operating system, which, if you are
> serious about supporting international character sets, you must
> do. There's this small problem of getting 8-bit (as opposed to
> 7-bit) ASCII through various pieces of hardware and networks
> which think that the high order bit is parity...

Not all such pieces of hardware have this limitation. The paper from ASCII Corporation simply says "Kana and Kanji terminals must be set up to use 8 bit no parity mode." If other terminals use a 7-bit encoding of an 8-bit data stream, the terminal driver can do code translation transparently to the rest of the system.

The fact that most OSes haven't solved these problems, and don't provide for full 8-bit characters in file names, doesn't mean there is no demand for full 8-bit characters in file names. Users in non-English-speaking countries may simply have learned to work around the problem, either by using English-language file names or by approximating their native spelling in file names.

Volume-Number: Volume 7, Number 76