Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!pyramid!tolerant!kennedy
From: kennedy@tolerant.UUCP (Bill Kennedy)
Newsgroups: comp.lang.c
Subject: Re: Long Chars
Message-ID: <1418@tolerant.UUCP>
Date: 14 Mar 88 16:58:38 GMT
References: <12341@brl-adm.ARPA>
Reply-To: kennedy@tolerant.UUCP (Bill Kennedy)
Distribution: na
Organization: Tolerant Systems Inc., San Jose
Lines: 78
Keywords: Foreign character sets

In article <12341@brl-adm.ARPA> TLIMONCE%DREW.BITNET@CUNYVM.CUNY.EDU writes:
>[ pun reference omitted ]
>
>The "short char vs char" problem can't be solved very easily. Why not a
>"long char". That wouldn't break much code now, would it? Now I'm not
>demanding that it goes into v1.0 of the standard but maybe we can look at
>this for the next "congress".

There are already specifications for it; AT&T has one and I think I read
something from HP about it as well. It can be solved rather easily and it
need not break much code if the code is well written. The same old dragon
that breathed up the pointer/int thing just rears its ugly head again for
characters.

>For now, if you want to make some progress, try to get one of the biggies
>(like MS) to add it as an extension. You can tell them that they'll hit
>on the "multi-nation/multi-language vendor market" with it.

I disagree. I am using long characters for a specific purpose, and adding
the baggage to domestic computing wouldn't serve any useful purpose. I
don't think that you will get a software vendor to weave it in if it costs
performance at compile or run time (which long characters do, on both
counts). The hardware manufacturers will implement it themselves if they
want to penetrate farther into the overseas markets. Remember it's not
just a world of 7- or 15-bit characters; variations on the Roman alphabet,
e.g. the European ones, are handled with the eighth bit (which has its own
problems too, not pertinent here).
I don't think that you will get any momentum at all from software houses,
but I have firsthand knowledge :-) that the computer manufacturers get
pretty interested.

>Of course, in my programming I don't have a use for it, but if you do, try
>
>typedef short LONG_CHAR;
>or
>typedef char LONG_CHAR[2];
>(Hmmm... I like the former)

No offense intended but I wholeheartedly agree with "don't have a use..."
and I would suggest it reads "haven't had any experience with...". I'm
also not scolding you; I work with the things every day and there are some
very real traps. If you just make it a typedef you'll get your storage
sizes right (for the most part) but you can't manipulate either of your
examples very well. I use lchar because it's easier to type than
LONG_CHAR. You need a further refinement so that you can look at each byte
and the bits within each byte; I use a structure with a union inside it.

>and then you can implement a lstrcmp() and a lstrcpy() and an assortment
>of routines like that. Then when you're done, those can be re-used in all
>your programs.

You also need routines to convert into and out of strings containing long
characters, and some way to insulate yourself from switch cases and
while(c) loops that make assumptions about character size and content. To
qualify the long character structure/union approach: vi, the shell, and
I'm sure other programs use the MSbit of a character for their own
purposes. Many Asian terminals set the MSbit of a byte as a flag that
another byte is coming with the rest of the character. In some European
countries it's quite normal for the MSbit to be set for a special
character native to their alphabet but absent from ASCII. So here you see
but three uses of the MSbit that are darned near mutually exclusive and
require further inspection of the byte stream.

>When it get's suggested to ANSI C II (or whatever it'll be called) you'll
>be there to warn us about implementation difficulties and ideas.
>And when it gets passed you can do a search-and-replace from "LONG_CHAR"
>to "long char"

I'm not convinced that it belongs in the language specification because it
is so implementation specific. In fact I'm not sure that it even needs to
exist for hardware destined for a technical audience. Those professionals
have learned to read ASCII like some of us did APL :-) When you start to
bring in commercial applications where you want to drive down the level of
skill required to operate a program, that's where you need the additional
capability/overhead.

You made a good start and now I have overkilled it for you...

These are my opinions and observations, Tolerant is nice enough to let me
use their equipment; so don't blame me on them.

Bill Kennedy  {rutgers,cbosgd,killer}!ssbn!bill  or  bill@ssbn.WLK.COM
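
A minimal sketch of the structure-and-union approach described above, not
the actual definition used at Tolerant. It assumes 8-bit bytes and the
convention mentioned for many Asian terminals (a set MSbit on one byte
means a second byte follows). The member names and the helpers
lc_msbit(), lc_is_nul() and str_to_lstr() are hypothetical; lstrcmp()
borrows the name suggested in the quoted article.

```c
#include <assert.h>
#include <stddef.h>

#define LC_MSBIT 0x80   /* most significant bit of an 8-bit byte */

/* Hypothetical long character: a structure with a union inside it, so the
 * value can be seen as one unit or byte by byte. Which byte of 'whole'
 * holds the lead byte depends on the machine's byte order, so the helpers
 * below touch only byte[]. */
typedef struct {
    union {
        unsigned short whole;    /* the character as a single value     */
        unsigned char  byte[2];  /* byte[0] = lead byte, byte[1] = rest */
    } u;
} lchar;

/* Nonzero if the MSbit of b is set. */
static int lc_msbit(unsigned char b) { return (b & LC_MSBIT) != 0; }

/* Nonzero if c is the terminating long character (both bytes zero). */
static int lc_is_nul(const lchar *c)
{
    return c->u.byte[0] == 0 && c->u.byte[1] == 0;
}

/* Convert a NUL-terminated byte string into long characters.
 * At most max elements of dst are written, terminator included.
 * Returns the number of long characters stored, not counting the
 * terminator. */
static size_t str_to_lstr(const char *src, lchar *dst, size_t max)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (max == 0)
        return 0;
    while (*s != 0 && n + 1 < max) {
        if (lc_msbit(s[0]) && s[1] != 0) {
            /* MSbit set: this byte and the next form one character. */
            dst[n].u.byte[0] = s[0];
            dst[n].u.byte[1] = s[1];
            s += 2;
        } else {
            /* Plain single-byte character (or a dangling lead byte). */
            dst[n].u.byte[0] = 0;
            dst[n].u.byte[1] = s[0];
            s += 1;
        }
        n++;
    }
    dst[n].u.byte[0] = 0;
    dst[n].u.byte[1] = 0;
    return n;
}

/* Compare two long-character strings, lead byte first, then second byte.
 * Returns <0, 0 or >0 in the manner of strcmp(). */
static int lstrcmp(const lchar *a, const lchar *b)
{
    for (;; a++, b++) {
        int d = (int)a->u.byte[0] - (int)b->u.byte[0];
        if (d == 0)
            d = (int)a->u.byte[1] - (int)b->u.byte[1];
        if (d != 0 || lc_is_nul(a))
            return d;
    }
}
```

Loops over an lchar array would test lc_is_nul() rather than while(c),
which keeps the size assumption in one place. A byte stream that uses the
MSbit for single 8-bit European characters would be misparsed by this
sketch, which is exactly the kind of conflict between MSbit uses described
above.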