Path: utzoo!attcan!uunet!auspex!guy
From: guy@auspex.UUCP (Guy Harris)
Newsgroups: comp.unix.questions
Subject: Re: why does -vi- set the hi bit when expanding `%' and `#'?
Message-ID: <833@auspex.UUCP>
Date: 11 Jan 89 23:04:06 GMT
References: <8700002@gistdev> <450@oglvee.UUCP>
Reply-To: guy@auspex.UUCP (Guy Harris)
Organization: Auspex Systems, Santa Clara
Lines: 43

>Maybe I'm just thick, or maybe I was home sick the day they explained
>``shell-internal-quoting format'' to everyone, but would some kind
>soul who knows what Chris is talking about care to fill me in?

Inside most versions of the Bourne, C, and Korn shells (and maybe the
V6 and PWB shells as well), strings containing quoted characters (yes,
"quoted" as in "protected either with double-quotes or single-quotes,
or with a backslash," so yes,

>Is this the same as quoting sh meta-characters with '\'?

it is the same) are represented by turning the 8th bit of a byte
containing a quoted character on.

"vi", in a rather slimy move, "knew" that this was the case, and
instead of using, say, backslashes or single-quotes to quote characters
in file names, it turned the 8th bit of the bytes containing those
characters on, under the assumption that 1) the 8th bit would be passed
through the shell intact and 2) would thus be interpreted as meaning
the characters were quoted.

Unfortunately, more recent versions of the Bourne and Korn shells do
*not* use the 8th bit for this purpose, because they support 8-bit
character sets. As such, while 1) is true, 2) isn't.

>Is this something I need to care about beyond being curious?

It's useful to keep the "8th bit" convention in mind if you may be
working on a system whose shell uses it (older - pre S5R3 - Bourne
shells, older - pre-"ksh-i" Korn shells, and all currently-available
versions of the C shell that I know of), since you won't be able to use
8-bit character sets when typing commands to those shells.
If your OS supports file names with 8-bit characters, for example, and
a file with such characters in its name is created, you may have
trouble removing it if you are using such a shell.

It's also useful to keep in mind that using the 8th bit in such a
fashion - or other fashions - interferes with support for 8-bit
character sets, such as the ISO 8859 character sets that include
accented characters for Western European languages other than English.
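
To make the "8th bit" convention concrete, here is a minimal sketch in C of what such quoting might look like. It is not taken from any particular shell's source; the names QUOTE, TRIM, quote_all, is_quoted, and trim are illustrative assumptions, and a real shell does considerably more than this.

```c
#include <stdio.h>
#include <string.h>

/* Assumed names for illustration only, not copied from any shell. */
#define QUOTE 0200   /* 8th bit, used as the "this character is quoted" flag */
#define TRIM  0177   /* mask that recovers the low 7 bits of a byte */

/* Flag every byte of s as quoted by turning its 8th bit on. */
static void quote_all(unsigned char *s)
{
    for (; *s != '\0'; s++)
        *s |= QUOTE;
}

/* Does this byte carry the "quoted" flag? */
static int is_quoted(unsigned char c)
{
    return (c & QUOTE) != 0;
}

/* Strip the flag from every byte, leaving plain 7-bit characters. */
static void trim(unsigned char *s)
{
    for (; *s != '\0'; s++)
        *s &= TRIM;
}
```

With these helpers, quote_all() on "a*b" followed by trim() gives back "a*b", which is all the convention needs for 7-bit ASCII. The trouble shows up with a byte such as 0xE9 (e-acute in ISO 8859-1): is_quoted() reports it as quoted even though nobody quoted it, and trim() turns it into 0x69, a plain "i", so the original character is lost.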