Path: utzoo!attcan!uunet!aplcen!uakari.primate.wisc.edu!sdd.hp.com!usc!ucsd!ucbvax!AI.MIT.EDU!bson
From: bson@AI.MIT.EDU (Jan Brittenson)
Newsgroups: comp.society.futures
Subject: Re: C's sins of commission (was: (pssst...fortran?))
Message-ID: <9009220848.AA00539@wheat-chex>
Date: 22 Sep 90 08:48:56 GMT
Sender: usenet@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 152
X-Unparsable-Date: Sat, 22 Sep T  04:48:12 EDT


   This message is quite long. I apologize if you think I'm filling up
your mailbox with junk flamage.

Jim Giles:

 >> 	1. Pointer range check (to see if a buffer crosses page
 >> 	   boundaries, for instance).

 > Well, without pointers, why do you need a pointer range check?  Computing
 > the range of something that doesn't exist seems a little silly.

   Pointers _are_ addresses and nothing else, regardless of whether
they include segment information or other information relevant only
to non-state-of-the-art architectures. The "address" idiom covers all
information needed to locate the addressee. Pointers may be
interpreted differently depending on the datum, though. On a pdp-10,
a character pointer needs not only a word address but also a
character index within the word.
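
To illustrate the point about character pointers on word-addressed machines, here is a minimal C sketch of a pointer made of a word address plus a character index. The five-characters-per-word packing and the names `struct char_ptr`, `char_ptr_add`, and `char_ptr_diff` are assumptions for illustration; this is not the actual pdp-10 byte-pointer format.

```c
#include <stdint.h>

/* Hypothetical model: five characters per word is assumed here,
   as with 7-bit ASCII packed into 36-bit words. Not the real
   pdp-10 byte-pointer layout. */
#define CHARS_PER_WORD 5u

struct char_ptr {
    uint32_t word;   /* word address */
    unsigned index;  /* character position within the word, 0..CHARS_PER_WORD-1 */
};

/* Advance a character pointer by n characters, carrying into the word address. */
struct char_ptr char_ptr_add(struct char_ptr p, uint32_t n)
{
    uint64_t linear = (uint64_t)p.word * CHARS_PER_WORD + p.index + n;
    p.word = (uint32_t)(linear / CHARS_PER_WORD);
    p.index = (unsigned)(linear % CHARS_PER_WORD);
    return p;
}

/* Distance in characters from a to b (b assumed not to precede a). */
uint64_t char_ptr_diff(struct char_ptr a, struct char_ptr b)
{
    return ((uint64_t)b.word * CHARS_PER_WORD + b.index)
         - ((uint64_t)a.word * CHARS_PER_WORD + a.index);
}
```

The same "address" needs two fields here, which is why a character pointer and a word pointer to the same place need not look alike.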

 > I think you had in mind casting the pointer to an int and looking at
 > the raw address - the ANSI standard leaves this process undefined.

   You're right, that was my intent with the buffer example. But
unless _some_ means of retrieving the address of the buffer - a
pointer to it - is provided, the page boundary check cannot be done
_at all_, defined or undefined, portable or not. To me, a simple
C-style cast is preferable to some obscure union declared miles
away, since a pointer-to-int cast at least tells me what is going on.
Besides, in almost any implementation, casting a pointer to an int of
sufficient size and later back again will yield the original pointer.
I would certainly refuse to use a compiler for which this assumption
didn't hold. If the machine hardware makes it an unreasonable
assumption - say, on a Lisp Machine - then, well, forget about
portable C code.
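
A minimal sketch of the page boundary check, assuming a hypothetical fixed page size of 4096 bytes and an implementation that provides C99's `uintptr_t`. The pointer-to-integer conversion is implementation-defined, which is exactly the cast under discussion; `crosses_page` is an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size; real code would ask the system for it. */
#define PAGE_SIZE ((uintptr_t)4096)

/* Nonzero if the len-byte buffer starting at buf touches more than one
   page. Relies on the pointer-to-integer conversion giving the flat
   machine address, as it does on common implementations. */
int crosses_page(const void *buf, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = (uintptr_t)buf;
    uintptr_t last = first + (len - 1);
    return (first / PAGE_SIZE) != (last / PAGE_SIZE);
}
```

The round trip mentioned above is the guarantee `uintptr_t` gives for `void *`: converting to `uintptr_t` and back compares equal to the original pointer.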

 > Now, if you're talking about non-standard extensions to C which would
 > allow you to do this stuff - then any other language can contain the
 > same non-standard extensions.

   Extensions, or not being uptight about pointer typing; call it
whatever you like.

 >> [...]
 >> 	2. Calculate physical addresses for DMA controllers.

 > Why should I care?  The system/environment should be able to give me the
 > address if I need it.  But, how do I use a raw address anyway? _Standard_
 > C pointers don't give me any such access.  Access to such things as
 > hardware controllers should be privileged to the system - and _it_
 > can contain machine dependent code - like assembly.

...or like C, which most certainly is more defined than assembler!
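
For the DMA case, the arithmetic itself is ordinary integer work once a translation is available. The sketch below assumes 4 KB pages and a hypothetical single-level page table held in an array; `translate_to_phys` and `page_table` are illustrative names, not any real kernel's interface, which would walk its own page tables instead.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* hypothetical 4 KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)

/* Hypothetical page table: index = virtual page number,
   value = physical frame number. */
static const uint32_t page_table[] = { 7, 3, 42, 11 };

/* Translate a virtual address into the physical address a DMA
   controller would be programmed with. Returns 0 if unmapped. */
int translate_to_phys(uintptr_t virt, uint64_t *phys)
{
    uintptr_t vpn = virt >> PAGE_SHIFT;
    if (vpn >= sizeof page_table / sizeof page_table[0])
        return 0;
    /* physical frame base plus the offset within the page */
    *phys = ((uint64_t)page_table[vpn] << PAGE_SHIFT) | (virt & PAGE_MASK);
    return 1;
}
```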

   I'm not sure what kind of programming you're talking about. There
are languages defined along the lines you describe here, but few
outside academia use them - Euclid, for instance. In my experience,
programmers fall into two major groups: application programmers and
system programmers. The former use various 4GLs and other
application-oriented tools, such as XYZ-SQL, COBOL, or Prolog, to
write applications; the latter do the system-dependent work, such as
database, server, and support tool implementations - mostly things
that are system-dependent to start with. Neither group would have
much use for your proposed language: the application people would ask
you what syntax applies to selecting records in a database, while the
system people would ask you how to set up a 2D bitblt operation on a
graphics device, or how to create a channel program in a mainframe
environment.

   For sure, some of the work done by system folks falls somewhere
in between. But I seriously doubt programming efficiency or
maintenance would be improved to any degree worth mentioning by
forcing everyone to learn Yet Another Language and an entirely new set
of idioms when the previous ones are considered quite sufficient.

   Can you give me one example of a project that you, or someone you
know first-hand, have been involved in that falls between the two
major categories I've outlined above, that by itself is large enough
to warrant not simply making do with what you've got and are used to,
and for which an employer might list experience with your language as
desirable?

 >> [...]
 >> 	3. Sort a linked list on addresses of some data pointed to
 >> 	   from within the node. Or to keep it sorted as new
 >> 	   (addresses of) data is added.

 > I guess you'll have to tell me how this differs from sorting on the
 > index of the data within an array or sequence.  Since the sequence is
 > dynamic, ...

   So how do I know where the element at a given index resides? I
guess that would be left undefined - although in this example it would
be well defined in C, since the buffers would all be of the same type
(i.e., arbitrarily dimensioned character vectors).

 > ... you can add all the elements you wish - and still sort on index.

   How do I know that the addresses of the elements at earlier indexes
do not change as new elements are added? That would have to be
undefined as well.
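
For item 3, a minimal C sketch of keeping a linked list sorted on the addresses of the data its nodes point to. Relational comparison of pointers to different objects is undefined in standard C, so this version compares `uintptr_t` values instead; that ordering is implementation-defined but follows the flat address on common machines. `struct node` and `insert_by_address` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* A list node that refers to some buffer elsewhere in memory. */
struct node {
    struct node *next;
    const char *data;      /* buffer this node points to */
};

/* Insert n so the list stays ordered by the address of each node's data.
   Walks a pointer-to-link so inserting at the head needs no special case. */
void insert_by_address(struct node **head, struct node *n)
{
    uintptr_t key = (uintptr_t)n->data;
    struct node **link = head;
    while (*link != NULL && (uintptr_t)(*link)->data < key)
        link = &(*link)->next;
    n->next = *link;
    *link = n;
}
```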

 >> [...]
 >> 	4. Implement malloc()/free().

 > When I found out that the ANSI C standard prohibited comparing/subtracting
 > pointers to different objects, I pointed out on comp.lang.c that malloc()
 > and free() could not be written in _standard_ C.  They agreed with me.

   No doubt you're correct. Implementation is fairly trivial in
"nonstandard" C, and I fail to see how it could be made easier, or more
"defined," without any pointers (i.e., explicit object addresses) at all.

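For illustration, a minimal first-fit allocator over a static arena, loosely in the spirit of the storage allocator in Kernighan and Ritchie. Because everything here lives in one array, its pointer comparisons happen to be defined; a real malloc() that obtains separate chunks from the operating system is where standard C stops making that promise. It assumes C11 (for `max_align_t`), and `my_malloc`, `my_free`, and the 256-unit arena are hypothetical.

```c
#include <stddef.h>

/* Block header; the union with max_align_t (C11) gives every block,
   and therefore every returned payload, maximal alignment. */
typedef union header {
    struct {
        union header *next;   /* next free block; list kept sorted by address */
        size_t units;         /* block size in header-sized units, header included */
    } s;
    max_align_t align;
} header_t;

#define ARENA_UNITS 256                 /* hypothetical heap size */
static header_t arena[ARENA_UNITS];     /* stands in for memory from the OS */
static header_t *free_list = NULL;
static int initialized = 0;

void *my_malloc(size_t nbytes)
{
    if (nbytes == 0)
        return NULL;
    if (!initialized) {                 /* whole arena starts as one free block */
        arena[0].s.next = NULL;
        arena[0].s.units = ARENA_UNITS;
        free_list = arena;
        initialized = 1;
    }
    size_t nunits = (nbytes + sizeof(header_t) - 1) / sizeof(header_t) + 1;
    header_t **link = &free_list;
    for (header_t *p = free_list; p != NULL; link = &p->s.next, p = p->s.next) {
        if (p->s.units < nunits)
            continue;                   /* first fit: skip blocks too small */
        if (p->s.units == nunits) {
            *link = p->s.next;          /* exact fit: unlink the block */
        } else {
            p->s.units -= nunits;       /* split: hand out the tail end */
            p += p->s.units;
            p->s.units = nunits;
        }
        return p + 1;                   /* payload starts after the header */
    }
    return NULL;                        /* no block large enough */
}

void my_free(void *ptr)
{
    if (ptr == NULL)
        return;
    header_t *bp = (header_t *)ptr - 1;
    header_t *prev = NULL, *cur = free_list;
    while (cur != NULL && cur < bp) {   /* find insertion point by address */
        prev = cur;
        cur = cur->s.next;
    }
    if (cur != NULL && bp + bp->s.units == cur) {   /* merge with next block */
        bp->s.units += cur->s.units;
        bp->s.next = cur->s.next;
    } else {
        bp->s.next = cur;
    }
    if (prev != NULL && prev + prev->s.units == bp) { /* merge with previous */
        prev->s.units += bp->s.units;
        prev->s.next = bp->s.next;
    } else if (prev != NULL) {
        prev->s.next = bp;
    } else {
        free_list = bp;
    }
}
```

The free list is kept sorted by address so that `my_free` can merge a block with its neighbours; that is the comparison the standard stops defining once the blocks come from different objects.
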
 >> [...]
 >>    I'm curious as to why so many programmers engage themselves in hot
 >> debates over how to best implement strings. String processing is
 >> proportionally insignificant - the first thing done after a read is
 >> usually a tokenization, either through hand-written code or the output of
 >> a lexical front-end generator.  [...]

 > Tokens are also strings ....  Symbol tables also contain strings
 > (among other stuff).

   First, tokens are best handled as small integers or enumerated
types, while symbol tables are commonly hashed. Other than converting
strings to integer tokens and symbols to hash values, very little
string processing is done. Second, take a look at an assembler or
compiler, and you'll be amazed at the total lack of string operations
(apart from the lexical front-ends, of course).
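
A small sketch of the two conversions mentioned: mapping a lexeme to a small enumerated token, and hashing a symbol name for a table. The token set is hypothetical, and FNV-1a is just one common string hash, used here as an example rather than anything a particular compiler necessarily uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical token set for a tiny language. */
enum token { TOK_IDENT, TOK_IF, TOK_WHILE, TOK_RETURN };

static const struct { const char *text; enum token tok; } keywords[] = {
    { "if", TOK_IF }, { "while", TOK_WHILE }, { "return", TOK_RETURN },
};

/* The one place the string is examined: map a lexeme to a small integer. */
enum token classify(const char *lexeme)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].text) == 0)
            return keywords[i].tok;
    return TOK_IDENT;
}

/* 32-bit FNV-1a hash of a symbol name; a symbol table would use this
   value to pick a bucket. */
uint32_t symbol_hash(const char *name)
{
    uint32_t h = 2166136261u;                 /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        h ^= *p;
        h *= 16777619u;                       /* FNV prime */
    }
    return h;
}
```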

 > Text processors usually don't have much data that isn't part of one
 > string or another.

   Granted, but for most text processors a simple string, or any other
sequence, isn't enough to store the text and all the relevant
information. A couple of years ago I wrote a typesetting system; it
should qualify as a "text processor" as well as any. The first thing
done with the incoming text was to chop it up into segments, each
containing the font, pitch, kerning, and other information unique to
that segment. The actual characters of a segment weren't used again
until it was time to print them. _All_ work was performed on the
remaining segment information, the lists of segments, and lists of
lists of segments. Of all the hairy things done, _none_ involved
character data. (And rarely any duplication either, for that matter.)
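
A rough sketch of that kind of segment record, with hypothetical field names and units (the original system's layout isn't described). The width calculation works purely on the formatting information and a character count; the characters themselves are never read.

```c
#include <stddef.h>

/* Hypothetical segment: a run of text sharing one set of formatting
   attributes. The characters are referenced, not copied. */
struct segment {
    const char *text;       /* start of the run in the input buffer */
    size_t len;             /* number of characters in the run */
    int font;               /* font id */
    int pitch;              /* advance per character, hypothetical units */
    int kerning;            /* extra adjustment for the whole run */
    struct segment *next;   /* next segment on the same line */
};

/* Width of a line of segments, computed from formatting info alone. */
long line_width(const struct segment *s)
{
    long w = 0;
    for (; s != NULL; s = s->next)
        w += (long)s->len * s->pitch + s->kerning;
    return w;
}
```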

   Let's distinguish between "defined" and "portable." Even if a
program adheres to a formal definition, there is no guarantee that
it's going to run on every other system that adheres to the same
definition. In the end, common sense and portability constraints will
have to guide all development.

							-- Jan Brittenson
							   bson@ai.mit.edu