Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!cs.utexas.edu!convex!texsun!csccat!ncmicro!ltf
From: ltf@ncmicro.lonestar.org (Lance Franklin)
Newsgroups: comp.sys.amiga.advocacy
Subject: Re: Announcement--new "Unicode" standard
Message-ID: <301@ncmicro.lonestar.org>
Date: 25 Feb 91 22:25:44 GMT
References: <1991Feb24.175313.10206@Neon.Stanford.EDU> <1991Feb24.220023.27245@cunixf.cc.columbia.edu> <1991Feb25.091818.19027@Neon.Stanford.EDU>
Organization: NC Microproducts, Inc. Richardson, Tx
Lines: 50

In article <1991Feb25.091818.19027@Neon.Stanford.EDU> torrie@cs.stanford.edu (Evan Torrie) writes:
}es1@cunixb.cc.columbia.edu (Ethan Solomita) writes:
}>What may be logical is
}>reserving certain 8-bit ASCII codes to mean that the next byte is
}>a letter in language FOO. That would probably cause the least
}>inconvenience.
}
}  I don't know whether this would be workable... There are something
}like 10000? distinct ideographs in the Asian languages, so you need
}around 13-14 bits to store them all.  This wouldn't leave you enough
}space to store the rest of normal ASCII. 
}  I imagine there would be some sort of compromise to allow old
}programs to run unchanged under a 16-bit code.

This should not be too difficult.  By reserving 64 of the 256 8-bit
codes as extension codes, each signifying that the next byte is a
character in one of 64 extended sets, you allow for 64*256 = 16384
extended characters.  In a Japanese system that I worked with once, the
standard character set contained both standard English symbols and
single-width Japanese characters, with the extended set all being
double-width.  This had the added advantage that the length of the
output string in bytes was also the number of character positions that
the output took up on the screen.  Of course, for ease of use, you'd
probably want to disallow a zero-value byte as the second byte (so
that C string routines never see a false terminator), so you'd
actually be talking about 64*255 = 16320 extended characters.
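
To make the arithmetic and the length property concrete, here is a rough
sketch in C.  It is only an illustration, not the actual system's code:
I'm assuming the 64 reserved codes are 0xC0 through 0xFF (I don't
remember which codes were really used), and the names EXT_LEAD_FIRST,
ext_char_count and ext_display_columns are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the 64 reserved lead codes are 0xC0..0xFF. */
enum { EXT_LEAD_FIRST = 0xC0, EXT_LEAD_COUNT = 64 };

/* 64 lead codes times 255 non-zero second bytes = 16320. */
enum { EXT_CHARS = EXT_LEAD_COUNT * 255 };

static int is_lead(unsigned char b)
{
    return b >= EXT_LEAD_FIRST;
}

/* Number of characters (not bytes) in a NUL-terminated string. */
size_t ext_char_count(const char *s)
{
    const unsigned char *p;
    size_t n = 0;

    assert(s != NULL);
    p = (const unsigned char *)s;
    while (*p != 0) {
        /* A lead byte takes the next byte with it, unless the
           string was cut off right after the lead byte. */
        p += (is_lead(*p) && p[1] != 0) ? 2 : 1;
        n++;
    }
    return n;
}

/* Screen columns used: one per single-width (one-byte) character,
   two per double-width (two-byte) character, so it is simply the
   byte length. */
size_t ext_display_columns(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

For the five-byte array { 'A', 0xC1, 'A', 'B', 0 } this counts three
characters but four screen columns, which is what strlen() reports.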

By the way, on this system, programs did run unchanged on either 8- or
16-bit systems.  The display hardware actually did the hard work.  On
a non-Japanese system, the strings were output as mostly junk
characters, with occasional English numbers or words interspersed.
In text modes, the display hardware did the correct mapping to standard
or extended character set, and I imagine that the display BIOS did
something similar when drawing text in a graphics mode.

Now, the big problem in this deal is not displaying the codes...that's
the easy part.  The hard part is figuring out stuff like string
functions in C, sorting routines...and how to handle languages like
Hebrew, where the text is displayed right-to-left (if memory serves).
In addition, if I'm not mistaken, a character's look may change based
on the characters that precede and/or follow it.
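
As a small illustration of why the C string functions are the hard
part: a plain strchr() looks at bytes, so it can "find" an ordinary
letter inside the second byte of an extended character.  The sketch
below shows a character-aware search that avoids this.  Same caveats as
before: the lead-byte range 0xC0..0xFF is my assumption, and ext_strchr
is a hypothetical name, not a standard library routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the 64 reserved lead codes are 0xC0..0xFF. */
enum { EXT_LEAD_MIN = 0xC0 };

/* Find the first single-byte character c in s, skipping over
   two-byte extended characters.  Returns NULL if not found. */
const char *ext_strchr(const char *s, char c)
{
    const unsigned char *p;

    assert(s != NULL);
    p = (const unsigned char *)s;
    while (*p != 0) {
        if (*p >= EXT_LEAD_MIN && p[1] != 0) {
            p += 2;     /* skip the whole extended character */
        } else {
            if (*p == (unsigned char)c)
                return (const char *)p;
            p++;
        }
    }
    return NULL;
}
```

With the bytes { 0xC1, 'A', 'B', 'A', 0 }, strchr(s, 'A') points at
offset 1 (the middle of an extended character), while ext_strchr(s, 'A')
points at offset 3, the real letter.  Sorting, case conversion and the
rest of <string.h> would each need similar treatment.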

All in all, a nasty little problem.

Lance
-- 
Lance T. Franklin            +----------------------------------------------+
(ltf@ncmicro.lonestar.org)   | "You want I should bop you with this here    |
NC Microproducts, Inc.       |    Lollipop?!?"                 The Fat Fury |
Richardson, Texas            +----------------------------------------------+