Path: utzoo!attcan!uunet!snorkelwacker!usc!cs.utexas.edu!yale!cmcl2!lanl!jlg
From: jlg@lanl.gov (Jim Giles)
Newsgroups: comp.society.futures
Subject: C's sins of commission (was: (pssst...fortran?))
Message-ID: <62927@lanl.gov>
Date: 14 Sep 90 02:08:25 GMT
References: <1990Sep13.185833.17455@cunixf.cc.columbia.edu>
Organization: Los Alamos Natl Lab, Los Alamos, N.M.
Lines: 260

From article <1990Sep13.185833.17455@cunixf.cc.columbia.edu>, by wp6@cunixa.cc.columbia.edu (Walter Pohl):
>
> What do you mean about C's sins of commission?
> Do you mean the lack of type checking?
[...]

Actually, what you're asking is a tough question. There are so many problems with C that just listing the more obvious ones would take many pages. It is difficult to turn to _any_ page of the C draft standard without stumbling upon something with which I completely disagree. (By the way, it is difficult to turn to any page of the final C standard because I haven't seen any copies of it. Has it even been published yet? It was finalized in January/February.)

Yes, type checking is a problem with C. To my mind, it is one of C's least egregious faults. For one thing, most violations _are_ illegal in C - just that most implementations don't bother checking. I make a careful distinction between a language and any particular implementation. The faults of C that I most object to are those which cannot be corrected because the language itself requires them.

As I said, _most_ type violations are already illegal. Not all though. Unions are not discriminated. Pointer 'casts' are allowed (essentially between _any_ two pointer types - officially, casts can only be between 'void' pointers and others but cast first to void then to anything else is legal).

This leads us to pointers. Just about everything about C pointers is bad. From the fact that pointers are hopelessly confused with arrays (which are completely separate conceptually) to the syntax of pointer use, C's pointers are a mess.
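To make the cast loophole described above concrete, here is a minimal C sketch. The helper names `as_bytes` and `nonzero_bytes` are invented for illustration and come from no library. An `unsigned int` is passed through `void *` and comes out as a pointer to raw bytes, at which point the type system no longer knows what the object was.

```c
#include <stddef.h>

/* Hypothetical helper: view any object as raw bytes by going through void *. */
static const unsigned char *as_bytes(const void *object)
{
    /* A void * may be converted to any object pointer type; the original
       type of the object is forgotten at this point. */
    return (const unsigned char *)object;
}

/* Count the nonzero bytes in the stored representation of an unsigned int. */
size_t nonzero_bytes(unsigned int value)
{
    const unsigned char *bytes = as_bytes(&value);
    size_t count = 0;

    for (size_t i = 0; i < sizeof value; i++) {
        if (bytes[i] != 0) {
            count++;
        }
    }
    return count;
}
```

Reading through `unsigned char *` is the one such view the language defines; pointing a `float *` at the same object by the same route compiles just as quietly but is not guaranteed to work.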
In addition, many language design people now feel that pointers of _any_ kind are a bad idea. C.A.R. Hoare condemned them as long ago as the early 70's (about the time C was 'designed'). He pointed out that pointers are the data structuring element that corresponds to GOTOs in flow control - if the one is bad, so is the other.

-----------------------------------------------------------------------------

Since this is comp.society.futures, I will discuss pointer replacements. Essentially, pointers only do three things for you: 1) recursive data structures (graphs, trees, etc.); 2) dynamic memory; and 3) run-time 'equivalence'. C pointer arithmetic only does what one dimensional array indexing already does (scaled address calculations): arrays are better for this - so it's _not_ counted as one of the features of pointers.

Recursive data structures are best implemented directly (to use a C/Fortran like declaration syntax with the type names on the left):

      Type Tree is record
         integer :: value
         tree :: left, right
      end type Tree

Note that the elements inside a tree-valued data type are not _pointers_ but are actually trees themselves. No more confusing pointers with what they point to - the pointers aren't explicitly visible. No more forgetting the dereference operator (or, conversely, putting it in incorrectly) - there isn't a dereferencing operator. To be sure, the compiler _may_ internally use pointers to do the implementation of these recursive structures (but then, it probably uses GOTOs to internally implement loops), but since they aren't explicitly visible to the user, his life is much easier.

Dynamic memory should also be implemented directly. Again, here is an example:

      Dynamic Integer :: a(:,:)    !-- declares two dimensional a
      ...  use of a here is illegal - not allocated yet ...
      ALLOCATE a(50,100)           !-- allocates 5000 words memory for a
      ...  use of a here is legal ...

Of course, there would have to be an inquiry function to detect whether the object was allocated or not. Further, the decision would have to be made in the language design whether deallocation would be automatic (garbage collection, reference count, etc.) or whether the user would have to explicitly deallocate things. Either way, this is simpler, safer, and easier to code, use, and debug than pointer usage. Further, the compiler can optimize uses of the dynamic object with the knowledge that it's not aliased to anything - a fact the compiler cannot deduce from malloc() calls (which as far as the compiler knows is just a function which might be returning just any old address it feels like).

Run-time equivalencing is a feature which some people (with a good deal of justification) claim shouldn't be allowed at all. I disagree. But there are still some distinctions to be made. First, equivalencing might be used just to reuse statically allocated space (although using dynamic memory is probably better). Equivalence might also be used to provide a form of array reshaping or slicing - here pointers are inadequate: try the ALIAS/IDENTIFY feature in the first draft Fortran 8X proposal. Equivalence might also be used for defeating type checking - but here I prefer to recommend the below:

      Type Float_internal is record
         bit.1 :: sign
         bit.8 :: exponent
         bit.23:: significand
      End type Float_internal

      Float :: x                   !-- x is a simple float variable
      Map x as Float_internal      !-- overlays record onto x

      x = 5.0                      !-- x used as usual
      x.sign = 1                   !-- negate x - use the mapping
      x.exponent=x.exponent+1      !-- multiply x by 2 - use the map
      ... etc ...

This makes the defeating of the type checking explicit and also makes the intended use clearer. One of the problems with C pointers is that you can't locally tell if a pointer is supposed to be an array, a recursive structure, an allocated object, or some exotic run-time equivalence.
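For comparison, the Float_internal manipulation above can be approximated in C. This is only a sketch, and it assumes `float` is IEEE 754 single precision, which matches the 1/8/23 layout but is not guaranteed by the language. The helper names are invented for illustration. It copies the bits out with memcpy instead of overlaying a pointer, and nothing in the code says "this is a float's exponent" except the comments.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32 (1 sign bit, 8 exponent bits,
   23 significand bits). The C standard does not promise this. */
#define SIGN_MASK      UINT32_C(0x80000000)
#define EXPONENT_UNIT  UINT32_C(0x00800000)  /* adds 1 to the exponent field */

/* Hypothetical helper: copy the bits of a float into an integer. */
static uint32_t float_to_bits(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical helper: copy an integer bit pattern back into a float. */
static float bits_to_float(uint32_t bits)
{
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Mirrors the example above: set the sign bit, then bump the exponent.
   Like the original, it does not guard against exponent overflow. */
float negate_and_double(float x)
{
    uint32_t bits = float_to_bits(x);
    bits |= SIGN_MASK;       /* x.sign = 1 */
    bits += EXPONENT_UNIT;   /* x.exponent = x.exponent + 1 */
    return bits_to_float(bits);
}
```

Under the stated assumption, `negate_and_double(5.0f)` yields -10.0f, the same result the mapped record produces.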
Providing all these possible features with high-level syntax and separate functionality improves the clarity of the code. It usually even makes the code more succinct (shorter).

So, to make a long story short (too late), I haven't yet found any application which _needs_ explicit pointers either for speed or functionality. The above replacements either conceal or eliminate pointers and are as (or more) efficient and easier to use.

-----------------------------------------------------------------------------

Now, back to C. Related to type checking is mixed mode. I don't object to mixed mode; in fact, I support it. But C's rules for applying it are not reasonable. The _claim_ is that the rules are designed to allow speed. Actually, there is no rational reason for minus five divided by a thousand to _ever_ be positive or to _ever_ be larger than one in magnitude. The C rules sometimes require that (-5/1000U == some large machine dependent constant). The C type hierarchy needs considerable adjustment.

This brings us to mixed type operations (not just mixed mode). Since C has no 'logical' type, you are allowed to mix arithmetic with the results of conditionals with wild abandon. I have never seen any advantage to this - I HAVE seen a lot of people make a lot of costly and time consuming mistakes as a result. Further, the lack of a 'logical' data type means that they must provide more than one set of boolean operators (and, or, not, xor) in order to distinguish bitwise from logical.

So, the next point is this bit about C's operators. There are too many operators and too many precedence levels. Some (like the logical vs. bitwise problem) would not be necessary if C had better intrinsic data types. Others perform functions which would probably be better done as function calls (intrinsics which could be inlined of course). Still others (like pointer dereferencing) should probably not exist at all.
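Two of the complaints above, the -5/1000U result and the mixing of arithmetic with conditionals, can be reproduced in a few lines of C. The function names below are invented for illustration.

```c
#include <limits.h>

/* The example from the text: the int -5 is converted to unsigned int
   before the division, so the quotient is (UINT_MAX - 4) / 1000. */
unsigned int minus_five_over_thousand(void)
{
    return -5 / 1000U;
}

/* Comparison results are plain ints (0 or 1), so they add like numbers. */
int count_positive(int a, int b, int c)
{
    return (a > 0) + (b > 0) + (c > 0);
}

/* With no logical type, two ANDs are needed, and they disagree on the
   same operands: 1 & 2 is 0, while 1 && 2 is 1. */
int bitwise_and(int a, int b) { return a & b; }
int logical_and(int a, int b) { return a && b; }
```

On a machine with a 32-bit unsigned int, `minus_five_over_thousand()` returns 4294967; the exact value depends on the width of the type, which is the machine dependence the text objects to.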
In spite of all these operators, character string concatenation, string comparison, and substring operations are _not_ operators. Even Fortran is better.

Data type declaration "operators" (or whatever you want to call the syntax elements) are particularly ugly, obscure, peculiar, difficult, and arcane. I'm told that this is because they wanted a declaration of a data type to look like a use of that type. This leads us to:

The use of complicated data types is particularly ugly, obscure, peculiar, difficult, and arcane. At least they met their goal: the syntax of using the variables is every bit as bad as that for declaring them.

Assignment operators are necessary in a procedural language. But these combinations of assignment with other operators are just useless syntactic sugar. Personally, I don't care if the language has them or not, but they do clutter up the syntax quite a bit. The main problem with assignment is not the operators, per se, but the fact that they are allowed _within_ an expression. There have been several well conducted experiments on the effect of such operators on user productivity - the conclusion has been that assignment should be a statement level operator and _not_ an expression level one - at least, if you want to maximize user productivity.

While we're on the subject of productivity experiments, here are a few other C features that have failed such tests:

Control structures which used 'compound statements' (i.e. sequences bounded by BEGIN/END, or {/} as C spells them). Better is the IF/ELSEIF/ELSE/ENDIF, WHILE/ENDWHILE, etc. style. Even better is allowing control constructs to be given unique labels and matching them up (i.e. Ada and Fortran 90 have this feature).

End-of-line ignored within comments. Comments should be terminated by the end-of-line mark. C++ has the option of doing this.
Unfortunately, it still retains the old wraparound version as well (the danger of developing a backward compatible language is the load of junk that you can't get rid of).

End-of-line ignored within statements. The experimenters decided that people just seem to regard the end-of-line as the same as the end-of-statement; they really do. Even C programmers intuitively know this. I examined 10,000+ lines of commercial C code and found only 12 lines which used the C ability to wrap statements across lines automatically. Even so, forgotten semicolons almost _all_ occur at the end-of-line, and it is still a very common syntax error. I think the end-of-line mark should be a synonym for semicolon and should be escaped in the rare (12 out of 10,000) case that a continuation is needed.

Pointers - well, we've talked about them.

GOTOs. This is an interesting subject because there are actually conflicting results here. Spaghetti code clearly (and in the experiments, this was shown) causes massive productivity problems. However, in the test involving BEGIN/END control flow brackets, GOTOs were found to be one of the things which were better (by about a factor of 2) than 'compound statements'. Other experiments involving "disciplined" GOTO usage (with "disciplined" pretty much meaning what you'd expect) were compared with "Structured" GOTO-less programs and _no_ statistically significant difference in productivity was observed at all. Actually, in this one case, I think C has got it exactly right - leave unrestricted GOTO in the language _and_ provide all the "Structured" control flow constructs. One of the very few things that I think C did right.

There are several other experimental results - this is just a sampling. The only experiment that I've ever seen in which the losing feature wasn't in C was the one that showed that semicolon should be a terminator, not a separator. C got this one right. C was on the wrong side of every other experiment I've ever seen.
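As one possible reading of "disciplined" GOTO usage in C, the sketch below uses a single forward jump out of a nested loop to one labelled exit. This reading is an assumption, since the experiments mentioned above are not described in enough detail to say what they tested, and the function name `find_first` is invented for illustration.

```c
#include <stddef.h>

/* Find the first cell equal to target in a rows x cols grid stored in
   row-major order. Returns its flat index, or -1 if the value is absent.
   The goto only jumps forward, out of both loops, to a single exit. */
long find_first(const int *grid, size_t rows, size_t cols, int target)
{
    long found = -1;

    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            if (grid[r * cols + c] == target) {
                found = (long)(r * cols + c);
                goto done;
            }
        }
    }
done:
    return found;
}
```

The GOTO-less alternative needs either an extra flag tested in both loop conditions or an early return, so this is the kind of case where keeping GOTO alongside the structured constructs costs little.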
Some non-experimental features which are widely regarded as bad ideas:

Case sensitive syntax. In a case insensitive language, code can be easily shared, teamwork is easier, and upper-case can be used for emphasis or other documentation purposes. In a case sensitive syntax, communication between sites (or even down the hall) is impeded by differing case conventions. People waste time ironing this out and not doing more useful work.

Nonintuitive syntax. This is very common in C. If a concept has a widely developed and simple notation which is compatible with the keyboard and/or print devices available, the language _should_ make every effort to accommodate this common notation. I will give one specific example: what in the world possessed them to use a leading zero to distinguish octal from decimal???

Inconsistent syntax. Also common in C. An operator, keyword, or construct should have the same meaning (as nearly as possible) in every context in which it is allowed. A specific example is the keyword 'static', which means that the memory for the variable being declared is permanently associated with the variable for the entirety of run-time - except at the beginning of a file (outside any procedure), where 'static' suddenly means the same thing that other languages call 'private'. (All variables declared outside of procedures have permanently allocated memory anyway - so 'static' should be regarded as redundant there.)

Well, as I predicted, even touching on the small number of obvious problems takes several pages. I trust that you can see there are still others lurking in the language specification (like 'switch', which doesn't automatically put a 'break' between the cases - whoops - I can't stop once I'm on a roll).

J. Giles