Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!ucsd!sdd.hp.com!elroy.jpl.nasa.gov!ames!haven!decuac!shlump.nac.dec.com!mountn.dec.com!minow
From: minow@mountn.dec.com (Martin Minow)
Newsgroups: comp.std.misc
Subject: Re: Int'l Character set (Was: Re: filen
Message-ID: <1723@mountn.dec.com>
Date: 28 Jun 90 15:07:19 GMT
References: <10720@<1990Jun1> <17000002@WL9.Prime.COM> <1679@mountn.dec.com> <91@lysator.liu.se> <1676@hulda.erbe.se> <106@lysator.liu.se>
Reply-To: minow@bolt.enet.dec.com (Martin Minow)
Organization: Digital Equipment Corporation
Lines: 41

In article <106@lysator.liu.se> aronsson@lysator.liu.se (Lars Aronsson) writes:
>Is it the manufacturers who are stupid as they use standards outside
>their intended scoop or is it standard authors who are stupid as they
>write standards with too narrow scoop? ... I think that ISO should have
>written a general standard for storing and random order retrieval of text
>(that is, no escape sequences) rather than the six versions of ISO 8859.
>Perhaps, I am stupid.

No, I doubt that Lars is stupid; nor are the manufacturers or standard
writers. The vast majority of text will be written in a single language
family and, hence, in a single variant of ISO 8859 (which has more than
six versions, but that is irrelevant). A text editor, to use Lars'
example, generally would not see escape sequences embedded in the text
(they would be absorbed by the file/terminal read process), but would
see a single 256-code data space (of which 94+95 are graphics).

Editors that do need to deal with multiple ISO 8859 instances (say, an
editor that must handle both Swedish (ISO 8859-1) and Hebrew
(ISO 8859-8)) must establish their own internal mechanism for this
problem. (In the one I wrote, each character was represented by a
16-bit quantity that encoded both character and character set
designator.)
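
The internal mechanism mentioned above, a 16-bit quantity holding both
a character and its character set designator, might be sketched as
follows. This is only an illustration under assumed conventions: the
layout (high byte = ISO 8859 part number, low byte = 8-bit code) and
the names pack, unpack, encode_run and decode_cells are hypothetical.
The post does not say how the original editor arranged its bits.

```python
# Hypothetical cell layout (an assumption, not the original editor's):
#   bits 15..8  ISO 8859 part number, used as the character set designator
#   bits  7..0  the 8-bit code within that part

def pack(part, byte):
    """Combine a designator and an 8-bit code into one 16-bit cell."""
    if not (0 <= part <= 0xFF and 0 <= byte <= 0xFF):
        raise ValueError("part and byte must each fit in 8 bits")
    return (part << 8) | byte

def unpack(cell):
    """Split a 16-bit cell back into (designator, 8-bit code)."""
    return cell >> 8, cell & 0xFF

def encode_run(text, part):
    """Encode text that belongs entirely to one ISO 8859 part."""
    raw = text.encode("iso8859-%d" % part)
    return [pack(part, b) for b in raw]

def decode_cells(cells):
    """Decode a buffer of cells, each through its own character set."""
    out = []
    for cell in cells:
        part, byte = unpack(cell)
        out.append(bytes([byte]).decode("iso8859-%d" % part))
    return "".join(out)

# A mixed buffer: Swedish in part 1, then a Hebrew word in part 8.
buffer = encode_run("smörgås ", 1) + encode_run("\u05e9\u05dc\u05d5\u05dd", 8)
text = decode_cells(buffer)
```

Because every cell has the same width, such an editor can index,
insert, or delete any character directly, without scanning for escape
sequences to learn which set is in force.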

Manufacturers chose ISO 8859 (specifically 8859-1) in order to gain a
consistent character set representation among all applications and
data files within a computer system. This seems to me to be a perfectly
reasonable decision: having multiple representations within a single
system is, I can state from experience, a mess.

The standard writers cannot, on the other hand, look outside the scope
of their "data transmission" standard. For example, I might choose to
represent data within a system in a Huffman or Lempel-Ziv encoding: as
long as the external users see the ISO character set, I can claim
conformance to the standard without embarrassment.

There are, by the way, standards for storage and retrieval of text that
were developed by, among others, libraries and cancer registries.
Spending a few hours in a good reference library will give you more
standards than you might wish.

Martin Minow
minow@bolt.enet.dec.com
The above does not represent the position of Digital Equipment Corporation