Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!cs.utexas.edu!convex!texsun!csccat!ncmicro!ltf
From: ltf@ncmicro.lonestar.org (Lance Franklin)
Newsgroups: comp.sys.amiga.advocacy
Subject: Re: Announcement--new "Unicode" standard
Message-ID: <301@ncmicro.lonestar.org>
Date: 25 Feb 91 22:25:44 GMT
References: <1991Feb24.175313.10206@Neon.Stanford.EDU> <1991Feb24.220023.27245@cunixf.cc.columbia.edu> <1991Feb25.091818.19027@Neon.Stanford.EDU>
Organization: NC Microproducts, Inc. Richardson, Tx
Lines: 50

In article <1991Feb25.091818.19027@Neon.Stanford.EDU> torrie@cs.stanford.edu (Evan Torrie) writes:
}es1@cunixb.cc.columbia.edu (Ethan Solomita) writes:
}>What may be logical is
}>reserving certain 8-bit ASCII codes to mean that the next byte is
}>a letter in language FOO. That would probably cause the least
}>inconvenience.
}
} I don't know whether this would be workable... There are something
}like 10000? distinct ideographs in the Asian languages, so you need
}around 13-14 bits to store them all. This wouldn't leave you enough
}space to store the rest of normal ASCII.
} I imagine there would be some sort of compromise to allow old
}programs to run unchanged under a 16-bit code.

This should not be too difficult...by merely reserving 64 of the 256
8-bit codes as extended codes, each signifying that the next byte is a
character in one of 64 extended sets, you allow for 64*256 = 16384
extended characters. In a Japanese system that I worked with once, the
standard character set contained both standard English symbols and
single-width Japanese characters, with the extended set all being
double-width. This had the added advantage that the length of an
output string in bytes was also the number of character positions that
the output took up on the screen.
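
To make that concrete, here is a rough sketch in C of the kind of
scheme I mean. It is only an illustration, not the code from that
Japanese system: I am assuming the 64 reserved codes are 0xC0 through
0xFF, and the function names (is_lead_byte, extended_index,
count_chars, count_columns) are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Assumption for illustration only: the 64 reserved lead bytes are
   0xC0..0xFF. Which 64 codes a real system reserves may differ. */
#define LEAD_FIRST 0xC0

/* Nonzero if b announces that the next byte is an extended character. */
int is_lead_byte(unsigned char b)
{
    return b >= LEAD_FIRST;
}

/* Map a (lead, trail) pair to a number in 0..16383 (64 sets * 256 codes).
   The caller must pass a lead byte for which is_lead_byte() is nonzero. */
unsigned extended_index(unsigned char lead, unsigned char trail)
{
    return (unsigned)(lead - LEAD_FIRST) * 256u + trail;
}

/* Count characters: a lead byte plus its trail byte is one character.
   A lead byte sitting right before the terminator is counted alone. */
size_t count_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p != '\0') {
        p += (is_lead_byte(*p) && p[1] != '\0') ? 2 : 1;
        n++;
    }
    return n;
}

/* Count screen columns: single-width characters take one column,
   double-width (extended) characters take two. */
size_t count_columns(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t cols = 0;

    while (*p != '\0') {
        if (is_lead_byte(*p) && p[1] != '\0') {
            cols += 2;
            p += 2;
        } else {
            cols += 1;
            p += 1;
        }
    }
    return cols;
}
```

With these definitions, count_columns() always agrees with strlen(),
since every byte accounts for exactly one column, while count_chars()
comes out smaller whenever the string contains extended characters.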

Of course, for ease of use, you'd probably want to disallow the use of
a zero-value byte for the second byte, so you'd actually be talking
about 64*255 extended characters, or 16320 characters.

By the way, on this system, programs did run unchanged on either 8- or
16-bit systems. The display hardware actually did the hard work. On a
non-Japanese system, the strings were output as mostly junk characters,
with occasional English numbers or words interspersed. In text modes,
the display hardware did the correct mapping to the standard or
extended character set, and I imagine that the display BIOS did
something similar when drawing text in a graphics mode.

Now, the big problem in this deal is not displaying the codes...that's
the easy part. The hard part is figuring out stuff like string
functions in C, sorting routines...and how to handle languages like
Hebrew, where the text is displayed right-to-left (if memory serves).
In addition, if I'm not mistaken, a character's look may change based
on the characters that precede and/or follow it.

All in all, a nasty little problem.

Lance

--
Lance T. Franklin            +----------------------------------------------+
(ltf@ncmicro.lonestar.org)   | "You want I should bop you with this here    |
NC Microproducts, Inc.       |  Lollipop?!?"                 The Fat Fury   |
Richardson, Texas            +----------------------------------------------+