Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!cbosgd!ihnp4!inuxc!iuvax!bsu-cs!neubauer
From: neubauer@bsu-cs.UUCP
Newsgroups: comp.std.internat,sci.lang
Subject: Re: Computers and human languages (was Re: What is a byte)
Message-ID: <1043@bsu-cs.UUCP>
Date: Sat, 22-Aug-87 20:04:50 EDT
Article-I.D.: bsu-cs.1043
Posted: Sat Aug 22 20:04:50 1987
Date-Received: Sun, 23-Aug-87 19:36:08 EDT
References: <111@quick.UUCP> <2842@ulysses.homer.nj.att.com>
Organization: CS Dept, Ball St U, Muncie, Indiana
Lines: 45
Xref: utgpu comp.std.internat:148 sci.lang:1125

In article <111@quick.UUCP>, srg@quick.UUCP (Spencer Garrett) writes:
> I was told once (by a respected linguist, as I recall) that English and
> Russian are the ONLY two languages written with unaccented alphabets.  I
> know you have to add the qualifier "modern" to make that true, and maybe
> "major" as well, although I don't know of any exceptions right off.  I don't
   ^^^^^ that might do it
> know whether he didn't count Katakana and Hiragana as alphabets or whether
	They are not alphabets, they are syllabaries, i.e., each symbol
	represents a whole syllable.
> one cannot (or normally would not) write Japanese entirely in one or both
> of these scripts.  He seemed to think that an unaccented alphabet was a
> substantial advantage in an information age, and I would tend to agree.
	So would I.

In article <2842@ulysses.homer.nj.att.com>, jss@hector..UUCP (Jerry Schwarz) writes:
> I quote from a draft of the Rationale of the proposed 
> ANSI C standard, section 4.4:
> 	The English language uses 26 letters derived from the
> 	Latin alphabet. The set of letters suffices for English, 
> 	Swahili, and Hawaiian; all other living languages use
> 	either the Latin alphabet plus other characters, or other 
> 	non-Latin alphabets or syllabaries.
> They cite no reference for this piece of trivia.

Just as well, since it is not true.  Another counterexample (from off the
top of my head):  Hmong.  If necessary, we could undoubtedly come up with
more, but there is really no point.  We don't really need to worry about it
for C programs, since the characters needed for that are already known.
What we do need to worry about is how to set up computer facilities, e.g.,
keyboards, and how to represent the modified letters in languages that DO
have diacritics.  

It has already been established that simply using the high bit of an 8-bit
byte as a modified/unmodified flag will not do, both because a single letter
can take several different diacritics in a given language, and because of
multi-lingual text.  It is certainly far less elegant to simply assign a
byte from the upper half of the byte range (i.e., with the high bit set) to
each known modified letter.  If we stick to the Latin alphabet, though,
there are probably enough unassigned bytes to do it.  That will leave very
odd sets of bit patterns to represent the letters of a given language, but
the alternative would appear to be to scrap ASCII altogether if we intend to
make some kind of rational scheme of it.
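
To make the two schemes concrete, here is a rough sketch in Python.  It is
an illustration only: the letter names, their order, and the helper names
flag_encode and build_table are all invented for the example and do not
correspond to any existing or proposed code table.

```python
# Scheme A: treat bit 7 as a single "modified" flag on a 7-bit ASCII letter.
def flag_encode(letter, modified):
    """Return one byte: the ASCII code, with the high bit set if modified."""
    code = ord(letter)
    if code > 0x7F:
        raise ValueError("base letter must be 7-bit ASCII")
    return (code | 0x80) if modified else code

# French alone uses at least three modified forms of 'e', but a one-bit
# flag leaves room for only one modified form per base letter.
FRENCH_E_VARIANTS = ["e-acute", "e-grave", "e-circumflex"]
SLOTS_PER_LETTER_WITH_FLAG = 1

# Scheme B: give every known modified letter its own byte in 0x80..0xFF.
# This list is hypothetical and far from complete.
MODIFIED_LETTERS = [
    "a-grave", "a-circumflex", "c-cedilla",
    "e-acute", "e-grave", "e-circumflex",
    "n-tilde", "o-diaeresis", "u-diaeresis",
]

def build_table(names, first=0x80, last=0xFF):
    """Assign consecutive upper-half bytes to the given letter names."""
    if len(names) > last - first + 1:
        raise ValueError("not enough unassigned bytes")
    return {name: first + i for i, name in enumerate(names)}

TABLE = build_table(MODIFIED_LETTERS)
```

Under scheme A every modified 'e' comes out as the same byte, so the three
French forms cannot be told apart.  Under scheme B each form gets a distinct
byte, but the byte for a modified letter bears no bitwise relation to the
byte for its base letter, which is the "very odd sets of bit patterns"
problem mentioned above.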

-- 
Paul Neubauer 	UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!neubauer