Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!think.com!snorkelwacker.mit.edu!bloom-picayune.mit.edu!news
From: scs@adam.mit.edu (Steve Summit)
Newsgroups: comp.lang.c
Subject: Re: Can Novices Jump Directly in C? (Books)
Message-ID: <1991Feb9.042957.20160@athena.mit.edu>
Date: 9 Feb 91 04:29:57 GMT
References: <11929@helios.TAMU.EDU>
Sender: news@athena.mit.edu (News system)
Reply-To: scs@adam.mit.edu
Organization: Thermal Technologies, Inc.
Lines: 161

In article enag@ifi.uio.no (Erik Naggum) writes:
>C is not well suited for first time programmers due to its intimacy
>with the hardware.

In article <11929@helios.TAMU.EDU> scm3775@tamsun.tamu.edu (Sean Malloy) writes:
>I'm afraid that I have to agree with the above gentleman; C is not
>generally good for first-time students unless they have a basic
>knowledge of the hardware underneath.

I agree that C is not a good first language; unfortunately I don't know of a better one. (C's popularity, applicability, and availability overcome its drawbacks rather too effectively.) However, it is not (I hope) necessary to think in lowest-level, hardware terms to learn and use C effectively. Were this in fact necessary, C would be quite a failure as an HLL.

It is true that most experts can and do think in hardware terms when programming in C, whether they have to or not, and this way of thinking is unfortunately often reflected in their teaching and writing. My biggest complaint with most introductory C textbooks I've seen is that they unabashedly explain everything in hardware terms, referring to "machine addresses" and "word sizes." Frequently, they provide exercises which suggest that students write deliberately nonportable, machine-dependent programs, either to show why they don't work, or to discover parameters (word size, endianness, etc.) of the student's machine. This has got to be bewildering to the beginner.
Furthermore, I believe that beginning (if not all) programmers are very strongly influenced by the code they see while learning, and that much of the deliberately awful code which is so often presented ("See how bad this is? Now don't you ever write anything like this!") actually ends up being emulated. If you never see good code, what else can you do but emulate the bad code you've seen, resigning yourself to the apparent fact that programming is an ugly job? (Several people will now point out that, like it or not, one has to be able to read bad code, and that books like The C Puzzle Book are therefore Good Things. Perhaps we can avoid having that argument again.)

Dave Lebling tells the story of the Zork player who believed that, in every game, you had to press the blue button, flooding the FCD#3 control room, and then hastily fix the leaky pipe with the gunk from the tube that looks like toothpaste before continuing with the rest of the game. How many programmers believe that they have no choice but to write code which depends on the machine they're using, and that rewriting it for other machines is just a fact of life? As Kernighan and Ritchie say, and I am fond of quoting, "if you don't know _how_ [things] are done on various machines, that innocence may help to protect you."

Getting back to whether or not you need to think about the hardware in order to understand C, here's a frequent question which is often answered in low-level, hardware terms:

    Q: I had the declaration char a[5] in one source file, and in
    another I declared extern char *a.  Why didn't it work?

Curious people are often dissatisfied with a blanket answer like

    A: The declaration extern char *a simply does not match the
    actual definition.  The type "pointer-to-type-T" is not the
    same as "array-of-type-T."  Use extern char a[].

They want to know WHY. We have to explain that, given

    char a[5];

and

    char *p;

the expressions a[3] and p[3] generate significantly different code.
We can (I think) make a perfectly clear explanation by discussing the abstract machine on which C is based, without appealing to actual hardware terms. The discussion (and, yes, I'm planning on adding it to the FAQ list) goes like this:

    "When we say char a[5], we are requesting that a place for five
    characters be set aside, to be known by the name `a'.  That is,
    there is a location named `a' at which five characters can sit.
    When the compiler sees the expression a[3], it emits code to
    start at the location `a', move three past it, and fetch the
    character there.

    On the other hand, when we say char *p, we are requesting a place
    which holds a pointer.  The pointer is to be known by the name
    `p', and can point to any char (or contiguous array of chars)
    anywhere.  When the compiler sees the expression p[3], it emits
    code to start at the location `p', fetch the pointer there, add
    three to it, and finally fetch the character pointed to."

As usual, a picture is worth a thousand words (they're just hard to draw well in ASCII):

       +---+---+---+---+---+
    a: | h | e | l | l | o |
       +---+---+---+---+---+

       +-----+     +---+---+---+---+---+
    p: |  *======> | w | o | r | l | d |
       +-----+     +---+---+---+---+---+

We can see right away that both a[3] and p[3] are 'l', but that you get there differently.

I don't claim to have invented this label, box, and pointer notation; it's used often. (As I recall, there's a nice picture much like this, drawn with the troff preprocessor pic, in chapter 5 of K&R2.)

Now, a lot of you are probably saying "wait a minute, he said he was going to explain it without resorting to hardware terms, and he turned right around and explained it in hardware terms." Though I was careful to use words like "location" and "place" instead of "address" and "memory," I have to admit that the discussion is still pretty low level. Notice, however, that I didn't muddy the water by saying "suppose location `a' is address 0x1234," and I avoided saying exactly how big the box that holds the pointer is.
I think anyone who has ever used a pocket calculator has some notion of a "register," namely a little box that can hold values; and no matter what computer language you're learning, you're bound to think about "values" being stored in "variables" that have "names."

The point is that yes, you have to think about locations, values, arrays, pointers, and the like; but no, you don't have to talk about "the hardware": that ints are 16 bits, that pointers are really addresses consisting of a segment and an offset, that when you add an int to an int * the compiler actually scales it by sizeof(int), or any of those other "explanations" which somehow only manage to make things more complicated and harder to explain.

I don't want to sound like a knee-jerk C defender; C *is* hard to learn, and the criticism that it is not a good beginner's language is entirely valid. But next time you try to teach somebody about C (or, if you're still learning, next time you do any reading or work on a program), just think about those little boxes and labels, and don't worry about "the hardware." If you are assigned exercises to

    Write a program to discover the sizes of the various types on
    your machine

or

    Explain the behavior of

        int i = 5;
        printf("%d %d %d\n", i++, i++, i++ + i++);

refuse to do them.

The thinking about little boxes and labels that you do in C can be trickier than the equivalent little boxes and labels in BASIC, because there are more things you can do in C. But it doesn't have to be as complicated as it is often made out to be. And stay away from the hardware terminology!

                                            Steve Summit
                                            scs@adam.mit.edu