Path: utzoo!attcan!uunet!mcvax!ukc!reading!riddle!domo
From: domo@riddle.UUCP (Dominic Dunlop)
Newsgroups: comp.unix.questions
Subject: Re: why does -vi- set the hi bit when expanding `%' and `#'?
Message-ID: <969@riddle.UUCP>
Date: 16 Jan 89 10:09:39 GMT
References: <8700002@gistdev> <450@oglvee.UUCP>
Reply-To: domo@riddle.UUCP (Dominic Dunlop)
Organization: Sphinx Ltd., Maidenhead, England
Lines: 50

[Already it's hard to keep track of who's quoting whom in this thread.
Sorry if I've got it wrong...]

In article <450@oglvee.UUCP> norm@oglvee.UUCP (Norman Joseph) writes:
[Stuff about vi setting the high bit of each character in the filenames
it produces when expanding `%' and `#' on shell command lines omitted.]

>In article <15219@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
>> vi believes that by setting bit 7, it is quoting the file name,
>> so that if you are editing the file `foo*bar.c', the command
>>
>>     !echo %
>>
>> produces [in effect]
>>
>>     !echo \f\o\o\*\b\a\r\.\c
>>
>> in shell-internal-quoting format (bit 7 set).
>
>Maybe I'm just thick, or maybe I was home sick the day they explained
>``shell-internal-quoting format'' to everyone, but would some kind
>soul who knows what Chris is talking about care to fill me in? (E-mail
>would be fine. I'm sure people are falling asleep even as we speak :^).
>Is this the same as quoting sh meta-characters with '\'?
>                               ^^^^

Yes, except that, strictly, the backslash can be used to quote any
character: it's just that the quoting is a no-op on any character other
than a metacharacter. (Yes, this topic has scope for soporific semantic
pedantry.)

>Is this
>something I need to care about beyond being curious?

No. Apart from anything else, it's obsolescent, and its use by
applications software has been deprecated for A Long Time (this
deprecation having been broadcast in the same way as information about
the `feature' itself -- that is, by word of mouth).
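
For anyone else who was home sick that day, the marking Chris describes
can be sketched roughly as follows. This is an illustration only, in
Python for brevity: it is not vi's or the shell's actual code, and the
names QUOTE, quote and to_backslash_form are made up for the example.
It assumes the file name is plain 7-bit ASCII.

```python
QUOTE = 0x80  # bit 7, used here as the "this character is quoted" flag


def quote(name: str) -> bytes:
    """Mark every character of a 7-bit ASCII name as quoted by setting bit 7."""
    # encode("ascii") raises if the name is not 7-bit, which is the assumption.
    return bytes(b | QUOTE for b in name.encode("ascii"))


def to_backslash_form(marked: bytes) -> str:
    """Render bit-7-marked bytes as the equivalent backslash-quoted text."""
    out = []
    for b in marked:
        ch = chr(b & 0x7F)  # the character with the quote flag removed
        out.append("\\" + ch if b & QUOTE else ch)
    return "".join(out)


marked = quote("foo*bar.c")
print(to_backslash_form(marked))  # \f\o\o\*\b\a\r\.\c
```

The point of the sketch is only that a byte with bit 7 set is read as
"this character, taken literally", which is what a backslash in front
of it would say.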

As I understand it, we finally get to say goodbye to bit seven internal
quoting with the System V, release 4 version of the shell. It's
possible that it's been eliminated in V.3.1 and later as well.
Comments, anybody?

Why has it gone? Because it's a real pain in the butt for users of
character sets which require all eight bits of a byte in order to
represent all alphabetic characters. This turns out to mean most
Europeans. (Asian character sets are something else again.) Having
the shell interpret that eighth bit as a quote, then clear it, mangles
text which includes characters (usually accented letters) which ANSI
didn't think of all those years ago.

The 1003.2 working group of the IEEE is drafting a standard for the
shell command language. I don't have it to hand, but, as I recall, it
effectively outlaws eighth bit quoting in the shell.
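
To make the mangling concrete, here is a minimal sketch of what
clearing the eighth bit does to an accented letter. It is in Python
purely for illustration and is not the shell's code; the function name
strip_quote_bit is invented, and ISO 8859-1 is assumed as the 8-bit
character set, which is my choice of example rather than anything
stated above.

```python
def strip_quote_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a shell treating it as a quote flag would."""
    return bytes(b & 0x7F for b in data)


# "cafe" with an e-acute, encoded in ISO 8859-1 (assumed): the last byte is 0xE9.
name = b"caf\xe9"

mangled = strip_quote_bit(name)
print(mangled)  # b'cafi'
```

The accented letter does not merely lose its accent: 0xE9 with the top
bit cleared is 0x69, an ordinary `i', so the name silently becomes a
different name. Pure 7-bit text passes through untouched, which is
presumably why the problem went unnoticed for so long in ASCII-only
environments.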