Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!bionet!agate!ucbvax!hplabs!hpfcdc!donn
From: donn@hpfcdc.HP.COM (Donn Terry)
Newsgroups: comp.unix.wizards
Subject: Re: POSIX Regular Expression Funnyness
Message-ID: <5980041@hpfcdc.HP.COM>
Date: 31 Jan 89 00:50:05 GMT
References: <4118f7b1.ae48@apollo.COM>
Organization: HP Ft. Collins, Co.
Lines: 28

Ken Arnold's point about [[:alpha:]] is well taken. I suspect that if
the proposal had been as he suggests, someone else would be saying that
[:alpha:] must mean :,a,l,p,h,a, with : specified twice, for backwards
compatibility. Maybe not, but in the standards business it's easy to
get paranoid because for practically any possibly controversial point,
there are at least 2**n (where n is the number of participants)
viewpoints before everything gets settled. (Well, maybe 2*n :-) ).

Regarding Doug Gwyn's comments about [:ch:]:

As far as character classes go, these are specified by the natural
language involved. My Spanish is weak, but the *two characters* ch are
treated as a single symbol with its own place in the collating
sequence. c and h can also appear independently, but when adjacent
they are collated as another symbol. This is arguably a kluge, but it
antedates the computer business by a few hundred years, and a few
million users, so I doubt we can change it just for the sake of
aesthetics.

Remember, we (native-)speakers of English are awfully spoiled by having
a reasonably regular alphabet. It's reasonable to ask what things
would have been like had computers had their initial development in,
say, China or Japan, where the alphabet problem is much worse. I think
the simple model of English may have sped things up initially, but it's
now turning into an impediment for dealing with the rest of the world.
(Oh well, we make up for a simple alphabet with hideously irrational
spelling, even discounting the British/American differences :-) ).

Donn Terry
HP, Ft. Collins.
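
To make the "ch" collation point concrete, here is a rough sketch in
Python of sorting with "ch" treated as one symbol. It is not taken from
the POSIX proposal or from any real locale definition: the collating
sequence (English letters with "ch" slotted between "c" and "d") and
the names COLLATING_ELEMENTS, RANK and collation_key are assumptions
made up for illustration only.

```python
# Hypothetical collating sequence: the 26 English letters plus "ch" as a
# separate symbol between "c" and "d". This is an illustrative assumption,
# not a real locale table.
COLLATING_ELEMENTS = list("abc") + ["ch"] + list("defghijklmnopqrstuvwxyz")
RANK = {element: position for position, element in enumerate(COLLATING_ELEMENTS)}


def collation_key(word):
    """Return a list of ranks, consuming an adjacent "ch" as one symbol."""
    lowered = word.lower()
    key = []
    i = 0
    while i < len(lowered):
        if lowered.startswith("ch", i):
            # c and h adjacent: collate as the single symbol "ch".
            key.append(RANK["ch"])
            i += 2
        else:
            # Any character not in the table sorts after all known symbols.
            key.append(RANK.get(lowered[i], len(RANK)))
            i += 1
    return key


words = ["chico", "cosa", "dama", "cuna"]
plain_order = sorted(words)                        # character-by-character
collated_order = sorted(words, key=collation_key)  # "ch" as one symbol
print(plain_order)
print(collated_order)
```

With a plain character-by-character sort, "chico" comes before "cosa"
because h sorts before o. With the key above, every word starting with
a lone c sorts ahead of "chico", which in turn sorts ahead of "dama".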