Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!umd5!purdue!i.cc.purdue.edu!k.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.lang.misc
Subject: Re: Languages and learning (was: Philosophy of C)
Message-ID: <670@l.cc.purdue.edu>
Date: 31 Jan 88 19:07:11 GMT
References: <11348@brl-adm.ARPA> <3473@ihlpf.ATT.COM> <3487@ihlpf.ATT.COM> <3555@ihlpf.ATT.COM>
Organization: Purdue University Statistics Department
Lines: 169
Summary: Uses of type in what I want the assembler to be

In article <3555@ihlpf.ATT.COM>, nevin1@ihlpf.ATT.COM (00704a-Liber) writes:
> In article <666@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >Nevin Liber said:
......
> >Macros of the form 
> >	X = Y + Z
> >and
> >	Q,R = M / N
> >should be added.  Start with what the hardware can do and proceed accordingly.
> 
> In order to add things like data structures and weak typing, you have to either
> build a more complicated architecture for a CPU and consequently add many more
> instructions to the machine language (what do the RISC folks think about that?)
> or restrict the assembly language so that it cannot be programmed to do
> everything possible on that machine.  The latter should not be called assembly
> language.

I am in no way advocating restricting the assembly language on any machine.  
There is nothing in my postings to indicate that I favor RISC; I have stated
that computer hardware should be versatile, and I consider the present CISC
machines RISCy.

Most assembly languages are prenex, that is, the instruction comes first.  This
is what makes assembly difficult to read and write.  The assembler language,
COMPASS, on the CDC6x00 and the related CYBERs, is slightly prenex in that
the class of the instruction (integer, floating, "double", Boolean, increment,
etc.) is prenex, and the rest is mainly infix, although "," separators are
sometimes used.  I have used this language extensively.  The assembly
language on the CRAYs, CAL, uses very little of the prenex form, but 
requires that the class of the operator be included.  It is also rather
artificial in many places.  

> Typing is almost meaningless in assembly.  What is the difference to the
> machine between a 4-byte float and a 4-byte int?  Nothing.  It is what we, the
> users of the machine think of those 4-byte values.

Here is the place that the advocates of the present clumsy assemblers fail to
see the possibilities.  A given location, either register or memory, is a
collection of bits.  How the collection will be used is, indeed, up to the
programmer.  Now in HLLs we _declare_ the "type" assigned to that location.
Some languages, unfortunately, require that the type be inviolable; this is
generally called strong typing.  Some languages provide inadequate means of
allowing the programmer to change the type of something in memory; it is
extremely difficult for the user to change the type of something in registers.

What I am proposing is that any argument recognizable by machine operations,
whether operand or result, be typable by the user.  Then if the types of the
arguments are compatible with a version of the operator, that version is
used.  Thus on the VAX there are numerous move and convert instructions;
most of them would become

	X = Y

In some there can be a modification of the instruction, such as complementing
or negating, or restrictions can be put on sign extension, etc.  This would be
done by modifying the "=".  Only if we wish to perform a version of the move
which is not appropriate for the types of X and Y would the type symbols be
affixed to the "=" or its augmentation.  This would then become an explicit
unoverloading of the operation, while permitting normal overloading.  Notice
that typing would, in general, be required of all arguments.  Looking at the
VAX instructions with this in mind, I found 16 types, some of which may be
represented by disconnected pieces of storage (register or memory).
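
To make the selection rule concrete, here is a rough model of it in Python.
It is only a sketch, not an assembler.  The function name assemble_move, the
one-letter type tags, and the override argument are all made up for
illustration.  Only four of the sixteen types are modelled, and addressing
modes are ignored.  The mnemonics it emits (MOVB/MOVW/MOVL/MOVF and the
CVTxy family) are real VAX instructions.

```python
# Hypothetical type tags: B = byte, W = word, L = longword, F = F_floating.
KNOWN_TYPES = ("B", "W", "L", "F")


def assemble_move(dst, src, types, override=None):
    """Translate the infix statement `dst = src` into one VAX instruction.

    types    -- maps each operand name to its declared type tag
    override -- optional (source_type, destination_type) pair, modelling
                type symbols affixed to "=" to bypass the declared types
    """
    src_type, dst_type = override if override else (types[src], types[dst])
    for tag in (src_type, dst_type):
        if tag not in KNOWN_TYPES:
            raise ValueError(f"unknown type tag: {tag}")
    if src_type == dst_type:
        mnemonic = "MOV" + src_type             # plain move
    else:
        mnemonic = "CVT" + src_type + dst_type  # convert source to destination
    return f"{mnemonic} {src}, {dst}"           # VAX order: source, destination


declared = {"X": "F", "Y": "L", "Z": "L"}
print(assemble_move("Z", "Y", declared))                       # MOVL Y, Z
print(assemble_move("X", "Y", declared))                       # CVTLF Y, X
print(assemble_move("X", "Y", declared, override=("L", "L")))  # MOVL Y, X
```

The third call is the explicit unoverloading: the declared types would call
for a conversion, but the programmer asks for a plain longword move anyway.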

Clearly the programmer must be able to change the type of an entity at will.
This is not the cast operation in C; it is rather a _use_ operation.  The
meaning is either that the old type is no longer in use, which can with
difficulty be implemented in the present HLLs, or else it can be a statement
to the system, "I know you do not understand what I am doing, but remember
that you are supposed to do what I tell you."
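
The distinction between a cast and a _use_ can be sketched in Python as
follows.  The two function names are hypothetical, and the sketch assumes
32-bit IEEE 754 single precision rather than any VAX floating format.  A
cast keeps the value and rewrites the bits; a use keeps the bits and changes
only how they are read.

```python
import struct


def convert_int_to_float(value):
    """Cast: keep the numeric value, return the new 32-bit float pattern."""
    return struct.unpack("<I", struct.pack("<f", float(value)))[0]


def use_as_float(bits):
    """Use: keep the 32-bit pattern, read it as a float from now on."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Cast: the integer 1 becomes the float 1.0, so the stored bits change.
print(hex(convert_int_to_float(1)))   # 0x3f800000

# Use: the same bits are left alone; only the interpretation changes.
print(use_as_float(0x3F800000))       # 1.0
```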
 
> Also, adding things like infix macros detract from the notion of what is really
> going on in the hardware (which is the reason this whole subject was brought up
> in the first place).  There are languages around that satisfy your criterion (I
> think B, the predecessor to C, is one of them, but I'm not sure); if you think
> one of those languages would be good as a first language, then state your case.

> But please don't try to muck up assembly language; it is fine as it is for what
> people usually program in it for.

I am not "mucking it up," unless you are committed to the form: operation,
space, comma-separated argument list.  I have already pointed out that there are
assemblers not in this form.  If you append the type symbols to the operation
symbols, this is essentially the CAL syntax.  I see no reason why someone
programming with the full set of machine instructions need be perturbed by
infix notation, already used by some assemblers, and overloaded operators.
I would object to not allowing the programmer to include the type symbols,
as this would run into the weaknesses of strongly typed languages.

I admit it would be difficult for a _disassembler_ to have any of the
flexibility I would like--it might even be desirable for a disassembler
to _always_ specify the types.

> >> One of the major habits is un-structured programming.  
> >
> >I can give you examples where GOTOs are the simple thing to do, and the
> >"structured" alternatives are much more complex.  Structured programming
> >can block thinking of the efficient way to do things.
> 
> Please post some examples of problems (not solutions that are already
> implemented using GOTOs) where structured alternatives are 'much more' complex.

Of course, we can always replace a GOTO by "if TRUE then ..."  Notice the word
_efficient_ in my previous posting.  Suppose (and this is an actual case) we
have an algorithm which has an integer state, and there are stacks, which we
can guarantee do not become empty within the procedure if the integer state is
small enough.  Now we can do things by the usual case statements, but it will
drastically slow down the algorithm.  Until the integer state is reduced to 1,
processing occurs; should we put surplus material back on the appropriate
stack(s), store the new value of the integer, and go through the case procedure
at each stage, or should we use GOTOs, not even store the value of the integer
unless it is large (a rare occurrence), and keep our surplus material at hand
by going to an intermediate stage of the code for the new integer?  If you tell
me to get around this by using a better algorithm, I now inform you that the
algorithm uses on the average 4.4 _bits_ of information before proceeding to
the store stage, and this includes getting the original integer state.  This
is not optimal, but it cannot be improved much by any practical coding scheme.

I know of no hardware which does not internally use GOTOs.  There are hardware 
implementations of conditional transfers on some machines which will make a
simple test in the transfer instruction, but in nanocode this is: make the
test; then transfer if the test succeeds (fails).  In other words, the machine
must have an internal GOTO.  If we want to understand the machine, we must,
therefore, have GOTO.
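
As an illustration only, here is a toy interpreter in Python.  The
instruction set (ADD, DEC, BNZ, HALT) and the function run are invented for
this sketch and model no real machine.  The point is the BNZ case: however
the conditional transfer is packaged, it is carried out as a test followed
by an assignment to the program counter, which is a GOTO.

```python
def run(program, registers):
    """Execute a toy program and return the registers once HALT is reached."""
    pc = 0                                   # program counter
    while True:
        op, *args = program[pc]
        pc += 1                              # default: fall through
        if op == "ADD":                      # ADD dst, src: dst += src
            registers[args[0]] += registers[args[1]]
        elif op == "DEC":                    # DEC reg: reg -= 1
            registers[args[0]] -= 1
        elif op == "BNZ":                    # BNZ reg, target
            test = registers[args[0]] != 0   # step 1: make the test
            if test:
                pc = args[1]                 # step 2: the internal GOTO
        elif op == "HALT":
            return registers
        else:
            raise ValueError(f"unknown operation: {op}")


# Multiply x by n using repeated addition and one conditional transfer.
program = [
    ("ADD", "acc", "x"),   # 0: acc += x
    ("DEC", "n"),          # 1: n -= 1
    ("BNZ", "n", 0),       # 2: if n != 0, go to instruction 0
    ("HALT",),             # 3: stop
]
result = run(program, {"acc": 0, "x": 4, "n": 3})
print(result["acc"])       # 12
```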

> IF many of these problems exist, I would like to discuss how we can 'fix' the
> structured languages to handle them more easily.
> 
> >> Also, machine language allows self-modifying code, which is a no-no by today's
> >> standards.  
> >
> >used.  If self-modifying code is the thing to do, do it!
> 
> This is one of the differences between hacking and programming :-)!
> 
> Many operating systems will not allow you to modify code of a running system
> (for example: SVR3 Unix).  Self-modifying, as well as non-reentrant code cannot
> be shared; ie, if sh were not reentrant then every time you fork a new process
> you would have to make a new copy of the entire executable for sh; our UNIX
> machines would run out of memory (real and virtual) VERY quickly.

There are obviously situations in which code cannot be self-modifying.  There 
are conditions in which certain loops or stack structures cannot be used if
certain parts of the machine state are being shared.  That does not mean that
self-modifying code should never be used.

> >> Machine language is NOT the place to start learning about computers or to
> >> program them.  HLLs have too many advantages over machine languages.  
> >
> >From the discussions I have seen, starting with machine languages _may_ be
> >more difficult.  However, I believe that for each person who has problems
> >starting with even the bad machine languages, dozens who start with the
> >HLLs will never be able to understand how to program anything that their
> >HLL does not support.
> 
> This would be true IF these people only learn one language.  I did not propose
> that they never learn assembler, just that it should not be the FIRST one they
> are taught.  Programmers should learn assembly language sometime; but only after
> they have developed some good programming techniques.  It is much harder to
> 'unteach' somebody than it is to teach somebody.

This is exactly why teaching HLL first is bad.  The student gets the impression
that the machine behaves like the HLL.  Only a badly designed machine has an
if...then...else structure.  Machines have operations of varying complexity;
the student who gets the impression that the structure of the machine looks
like the HLL certainly has a lot to unlearn!

-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet