Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!henry
From: henry@utzoo.UUCP (Henry Spencer)
Newsgroups: net.lang.c
Subject: Re: Randomly-Signed Character Variables
Message-ID: <4308@utzoo.UUCP>
Date: Wed, 12-Sep-84 16:01:38 EDT
Article-I.D.: utzoo.4308
Posted: Wed Sep 12 16:01:38 1984
Date-Received: Wed, 12-Sep-84 16:01:38 EDT
References: <30@sdcsvax.UUCP>
Organization: U of Toronto Zoology
Lines: 53
Keywords: sign extension, character variables

> ... what will
> happen if char variables are randomly sign-extended? In other words, does
> a portable program assume that char variables are consistent in their
> sign-extension?

Interesting question. One can argue (I have been heard to do so) that
if a program is to be portable, it can use char variables for only two
things: (1) characters, which are guaranteed non-negative by C, and
(2) small non-negative integers. If a program is portable in this
fairly-strong sense, there's no problem because the top bit is never
on and the sign-extension behavior is irrelevant.

One place where I would foresee problems is in things like hashing and
checksums. I have been known to write code which stated, in a comment,
"doesn't matter whether chars are signed or not, but it better be
consistent!". I never analyzed the programs deeply to determine whether
there really would be problems, but there was obviously enough rope
there to hang oneself with.

I guess my overall reaction is that there's a good chance that
inconsistent sign extension wouldn't foul up too many things, but I
would hate to have to bet money on it.

The current draft of the ANSI standard says:

	... If [things other than `ordinary' characters] are stored in
	a char object, the behavior is implementation-defined: the
	values may be treated as either signed or non-negative integers.
	[Section 2.2.5, draft of 21 Aug 1984]

	Implementation-defined behavior -- behavior that depends on the
	characteristics of the implementation and that must be
	documented for each implementation.
	[Section 1.1, draft of 21 Aug 1984]

The wording could probably be improved, but the current version seems
to say that you had better be able to document just how your chars
behave, rather than just saying that sign extension occurs or doesn't
occur at random. (Note that compiler optimizations etc. may alter the
exact form used to access a character variable, so the source code
isn't a reliable guide unless the compiler is very careful.)

> Note that if consistency is desired, the "most optimal" choice will vary
> with the application. If lots of references are made to char variables
> via pointers, the choice will be sign-extended chars; if lots of references
> are made to ordinary variables (or anything requiring an offset from a
> pointer), the choice will be unsigned chars. Which access type predominates?

Chars are accessed an awful lot via pointers, since that's how all
string manipulation is done in C. I would think that simple char
variables and offset references would be rather less common than
just "*cp".
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry
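
The hashing and checksum concern raised in the post can be made concrete with a small sketch. It is not code from the original article: the names hash_signed, hash_unsigned and demo are hypothetical, the multiplier 31 is arbitrary, and the sketch assumes 8-bit bytes and a two's-complement signed char. It reads the same bytes once as signed and once as unsigned values, which is roughly what would happen if sign extension differed from one access to another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hash that reads each byte as a signed value,
 * i.e. as if char were sign-extended on every access. */
static unsigned long hash_signed(const void *buf, size_t n)
{
    const signed char *p = buf;
    unsigned long h = 0;
    for (size_t i = 0; i < n; i++)
        h = h * 31 + (unsigned long)p[i];  /* a negative byte wraps to a huge value */
    return h;
}

/* The same hash, reading each byte as an unsigned value,
 * i.e. as if char were never sign-extended. */
static unsigned long hash_unsigned(const void *buf, size_t n)
{
    const unsigned char *p = buf;
    unsigned long h = 0;
    for (size_t i = 0; i < n; i++)
        h = h * 31 + (unsigned long)p[i];
    return h;
}

/* Bytes with the top bit clear hash identically either way;
 * a single byte with the top bit set is enough to make the results differ. */
void demo(void)
{
    const unsigned char plain[] = { 'h', 'e', 'l', 'l', 'o' };
    const unsigned char high[]  = { 'c', 'a', 'f', 0xE9 };

    assert(hash_signed(plain, sizeof plain) == hash_unsigned(plain, sizeof plain));
    assert(hash_signed(high, sizeof high) != hash_unsigned(high, sizeof high));
}
```

Calling demo() from a main function exercises both cases. Under the stated assumptions, either reading alone yields a usable hash; the trouble arises only when a table is built with one reading and probed with the other, which matches the "it better be consistent" comment quoted in the post.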