Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site mordor.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!panda!talcott!harvard!seismo!ut-sally!mordor!jdb
From: jdb@mordor.UUCP (John Bruner)
Newsgroups: net.lang.c
Subject: Re: integer types, sys calls, and stdio
Message-ID: <22@mordor.UUCP>
Date: Wed, 30-Jan-85 15:48:54 EST
Article-I.D.: mordor.22
Posted: Wed Jan 30 15:48:54 1985
Date-Received: Sat, 2-Feb-85 00:35:01 EST
References: <1997@mordor.UUCP> <631@turtlevax.UUCP> <550@mako.UUCP>
Distribution: net
Organization: S-1 Project, LLNL
Lines: 113

Things have quieted down quite a bit since I asked my initial question, and I should be smart enough to leave things alone, but I guess I'm not. We gave up and implemented sizeof(int) == sizeof(long) with integers 36 bits wide, basically because we didn't want to have to convert the overwhelming mass of existing programs.

Snoopy raises a point which I'd like to expand upon -- the idea of defining derived types "int8", "int9", "int16" which can be redefined when a program is moved from one machine to another. I had been doing some thinking about writing programs for maximum portability and how the language might be changed to encourage more portable programs. Here are some of my thoughts on this issue.

By way of introduction, I am not a C novice. I learned C back in 1977 on a PDP-11/70 V6 UNIX system. I have used it to program on PDP-11's, VAXes, various Motorola 68K systems, and now our local machine (the S-1 Mark IIA). The programs have included user- and kernel-mode UNIX code, among other things.
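To make the derived-type suggestion concrete, here is a minimal sketch of what such per-machine definitions might look like. It assumes an ANSI-style compiler that supplies <limits.h> and the "signed" keyword; the names "int32", BITS_OF, and sum_samples are hypothetical and only illustrate the idea. Each type promises a minimum range, not an exact storage size, so on a machine with wider words the same names simply map to wider cells.

```c
#include <limits.h>

/*
 * Hypothetical per-machine definitions (an assumption, not taken from
 * any existing header).  Each name promises "at least N bits"; the
 * actual storage cell may be wider on machines with larger words.
 */
typedef signed char int8;   /* at least -127..127 */
typedef short       int16;  /* at least -32767..32767 */

/* Use int when it is wide enough for 32-bit values, otherwise long. */
#if INT_MAX >= 2147483647
typedef int  int32;
#else
typedef long int32;
#endif

/* Width in bits of a type on the machine being compiled for. */
#define BITS_OF(type) ((int)(sizeof(type) * CHAR_BIT))

/* Example use: add 16-bit samples into a 32-bit accumulator. */
int32 sum_samples(const int16 *samples, int count)
{
    int32 total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += samples[i];
    return total;
}
```

Code written against these names states the range it needs and leaves the choice of "short", "int", or "long" to the one file that is edited when the program moves to another machine.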

Most C users are blessed with a machine architecture that resembles a PDP-11 in several important ways:

(1) it is a two's complement machine,
(2) it is byte-addressable (where larger data types are some power-of-two number of bytes long),
(3) it has an 8-bit byte,
(4) it operates most conveniently on primitive data types which are 16 and 32 bits long,
(5) it is not a tagged architecture,
(6) memory is not segmented, but is allocated in one contiguous block (or perhaps two or three if you count text/data-bss/stack).

Another characteristic which rears its head from time to time (although less often than the others, thanks to the popularity of the MC68XXX) is

(7) bytes are ordered in a "little endian" fashion.

Writing truly portable code in C does not come naturally. As we have discovered here in our efforts to port C and UNIX to the S-1, a lot of programs break when the machine that they run on does not satisfy one of the assumptions I noted above. For a mild example, consider the byte-ordering problem and how it shows up in programs such as "talk" (to name one example at random).

Here at the S-1 Project we have two operating systems projects underway. The other operating system, Amber, is written in a language called Pastel (a "colorful" Pascal). Pastel has been significantly extended relative to standard Pascal, so that it supports separate compilation (by "modules", each of which may contain public and private parts), pointer manipulation, flexible argument passing to procedures and functions (i.e. varying number and type of arguments), good access to low-level machine instructions (MUCH better than the kludgey "asm" in C), and it produces excellent code.

From time to time I have occasion to program in Pastel. While I prefer C, and I often find the Pascal-based syntax a little clumsy, I definitely miss a few of Pastel's features when I program in C. (I'll come back to a specific example below.)

C is used to achieve two different ends.
It is used to code machine-dependent routines (e.g. device drivers), and it is used to write machine-independent programs. Unfortunately, I fear that too much of its machine-dependent flavor carries over into programs that are supposed to be machine independent. The assumptions that I listed above are continually invoked, so that the resulting program won't go (at least, not easily) to another machine. Anyone who has tried to port programs written for the VAX (with implicit "int" == "long" assumptions) to machines like the PDP-11 knows what I mean.

Having laid forth all of this philosophy, let me give one specific case and expand upon Snoopy's suggestion. I believe that C should provide some means of defining integer data types in terms of the range of values that the type represents, rather than the machine-dependent size of the storage cell that the type will occupy. The compiler can pick the correct storage size. Then "short" and "long" would be reserved for machine-dependent cases, and machines with larger word sizes can be easily accommodated. Why should the programmer have to worry about whether his value can fit in a "short" or whether a "long" will be necessary?

I'm not familiar with Concurrent Euclid (perhaps I should look it up), but subrange types are an important central concept in Pascal, Modula, and Ada.

Please note that I am not proposing any new features for the ANSI standardization effort. I'm expressing thoughts about future directions for C. (I don't recall seeing subranges in the C++ paper in the BSTJ [oops, BLTJ].) I'm not proposing to turn C into Pascal. Contrary to some of the sentiments expressed in this group, however, I do feel that C can benefit from an examination of languages like Pascal.

Finally, let me hedge my way back toward the conservative camp and pose a question that should be asked in parallel with "what features does C need?"
How can we raise the standards of C programmers (possibly without *any* language changes) so that the programs they write will be more portable? If we don't have explicit subranges, how do we encourage programmers to define and use things like "int8"? Other portability considerations should include standardized derived types, libraries, an understanding of pointers and integers (and why (int)0 is not the same thing as (int *)0), and other implications of the variety of machine architectures that C runs on.

[BTW, a VAX Pastel compiler is available through the ARPA/MILNET by the anonymous account "ftp", file "pastel.bintape". This file is in "tar" format. If you don't have ARPANET access, you can contact

	Christine Ghinazzi, S-1 Project
	Lawrence Livermore National Laboratory
	PO Box 5503
	Livermore, CA 94550

for information on obtaining a tape copy. There is no charge.]
-- 
John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
MILNET: jdb@mordor.ARPA [jdb@s1-c]	(415) 422-0758
UUCP: ...!ucbvax!dual!mordor!jdb  ...!decvax!decwrl!mordor!jdb
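
On the remark above that (int)0 is not the same thing as (int *)0: the difference shows up most clearly in a variable-argument call, where the compiler cannot convert an integer zero to a pointer on the caller's behalf. The following is a minimal sketch, assuming an ANSI-style <stdarg.h>; the function name count_strings is hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical helper: counts the strings in an argument list that is
 * terminated by a null character pointer.  The callee fetches each
 * argument as a "char *", so the terminator must be passed as one.
 */
int count_strings(char *first, ...)
{
    va_list ap;
    char *s;
    int n = 0;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

A portable call looks like count_strings("a", "b", (char *)0). Writing a bare 0 as the terminator passes an "int", which only happens to work on machines where an int and a pointer have the same size and the same representation of zero.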