Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!ames!umd5!purdue!i.cc.purdue.edu!k.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.lang.misc
Subject: Re: Languages and learning (was: Philosophy of C)
Message-ID: <670@l.cc.purdue.edu>
Date: 31 Jan 88 19:07:11 GMT
References: <11348@brl-adm.ARPA> <3473@ihlpf.ATT.COM> <3487@ihlpf.ATT.COM> <3555@ihlpf.ATT.COM>
Organization: Purdue University Statistics Department
Lines: 169
Summary: Uses of type in what I want the assembler to be

In article <3555@ihlpf.ATT.COM>, nevin1@ihlpf.ATT.COM (00704a-Liber) writes:
> In article <666@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >Nevin Liber said:
......
> >Macros of the form
> >    X = Y + Z
> >and
> >    Q,R = M / N
> >should be added. Start with what the hardware can do and proceed accordingly.
>
> In order to add things like data structures and weak typing, you have to either
> build a more complicated architecture for a CPU and consequently add many more
> instructions to the machine language (what do the RISC folks think about that?)
> or restrict the assembly language so that it cannot be programmed to do
> everything possible on that machine. The latter should not be called assembly
> language.

I am in no way advocating restricting the assembly language on any machine.
There is nothing in my postings to indicate that I favor RISC; I have stated
that computer hardware should be versatile, and I consider the present CISC
machines as RISCy.

Most assembly languages are prenex, that is, the instruction comes first.
This is what makes assembly difficult to read and write. The assembler
language, COMPASS, on the CDC6x00 and the related CYBERs, is slightly prenex
in that the class of the instruction (integer, floating, "double", Boolean,
increment, etc.) is prenex, and the rest is mainly infix, although ","
separators are sometimes used. I have used this language extensively.
The assembly language on the CRAYs, CAL, uses very little of the prenex form,
but requires that the class of the operator be included. It is also rather
artificial in many places.

> Typing is almost meaningless in assembly. What is the difference to the
> machine between a 4-byte float and a 4-byte int? Nothing. It is what we, the
> users of the machine think of those 4-byte values.

Here is the place that the advocates of the present clumsy assemblers fail to
see the possibilities. A given location, either register or memory, is a
collection of bits. How the collection will be used is, indeed, up to the
programmer. Now in HLLs we _declare_ the "type" assigned to that location.
Some languages, unfortunately, require that the type be inviolable; this is
generally called strong typing. Some languages provide inadequate means of
allowing the programmer to change the type of something in memory; it is
extremely difficult for the user to change the type of something in
registers.

What I am proposing is that an argument, either operand or result,
recognizable by machine operations be typable by the user. Then if the types
of the arguments are compatible with a version of the operator, that version
is used. Thus on the VAX there are numerous move and convert instructions;
most of them would become

    X = Y

In some there can be a modification of the instruction, such as complementing
or negating, or restrictions can be put on sign extension, etc. This would be
done by modifying the "=". Only if we wish to perform a version of the move
which is not appropriate for the types of X and Y would the type symbols be
affixed to the "=" or its augmentation. This would then become an explicit
unoverloading of the operation, while permitting normal overloading.

Notice that typing would, in general, be required of all arguments. Looking
at the VAX instructions with this in mind, I found 16 types, some of which may
be represented by disconnected pieces of storage (register or memory).
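To make the proposal concrete, here is a rough sketch in C of how an assembler
might resolve the overloaded statement "X = Y" from the declared types of its
operands. Everything named here (OperandType, SUFFIX, select_move) is
hypothetical and invented for illustration; only five operand types are shown,
not the 16 mentioned above, and the mnemonics merely follow the VAX MOVx and
CVTxy naming pattern, so they should be checked against the architecture
manual before being relied on.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical operand types: a small subset chosen for illustration. */
typedef enum { T_BYTE, T_WORD, T_LONG, T_FLOAT, T_DOUBLE, T_COUNT } OperandType;

/* One-letter type suffixes in the style of VAX mnemonics (MOVL, CVTLF). */
static const char SUFFIX[T_COUNT] = { 'B', 'W', 'L', 'F', 'D' };

/*
 * Resolve "X = Y" from the declared types of X (dst) and Y (src).
 * Same type: a plain move.  Different types: a convert, with the
 * source suffix first and the destination suffix second.
 * Writes the mnemonic into buf; returns 0 on success, -1 on bad arguments.
 */
int select_move(OperandType dst, OperandType src, char *buf, size_t len)
{
    if (buf == NULL || len < 6)            /* "CVTxy" plus terminator */
        return -1;
    if ((int)dst < 0 || dst >= T_COUNT || (int)src < 0 || src >= T_COUNT)
        return -1;

    if (dst == src)
        snprintf(buf, len, "MOV%c", SUFFIX[src]);
    else
        snprintf(buf, len, "CVT%c%c", SUFFIX[src], SUFFIX[dst]);
    return 0;
}
```

With X declared as a float and Y as a longword, select_move(T_FLOAT, T_LONG,
buf, sizeof buf) writes CVTLF into buf; with both declared as longwords it
writes MOVL. Affixing type symbols to the "=" would correspond to bypassing
this lookup and naming the version directly.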
Clearly the programmer must be able to change the type of an entity at will.
This is not the cast operation in C; it is rather a _use_ operation. The
meaning is either that the old type is no longer in use, which can with
difficulty be implemented in the present HLLs, or else it can be a statement
to the system, "I know you do not understand what I am doing, but remember
that you are supposed to do what I tell you."

> Also, adding things like infix macros detract from the notion of what is really
> going on in the hardware (which is the reason this whole subject was brought up
> in the first place). There are languages around that satisfy your criterion (I
> think B, the predecessor to C, is one of them, but I'm not sure); if you think
> one of those languages would be good as a first language, then state your case.
> But please don't try to muck up assembly language; it is fine as it is for what
> people usually program in it for.

I am not "mucking it up," unless you are committed to operation-space-comma
separated argument list. I have already pointed out that there are assemblers
not in this form. If you append the type symbols to the operation symbols,
this is essentially the CAL syntax. I see no reason why someone programming
with the full set of machine instructions need be perturbed by infix notation,
already used by some assemblers, and overloaded operators. I would object to
not allowing the programmer to include the type symbols, as this would run
into the weaknesses of strongly typed languages.

I admit it would be difficult for a _disassembler_ to have any of the
flexibility I would like--it might even be desirable for a disassembler to
_always_ specify the types.

> >> One of the major habits is un-structured programming.
> >
> >I can give you examples where GOTOs are the simple thing to do, and the
> >"structured" alternatives are much more complex. Structured programming
> >can block thinking of the efficient way to do things.
>
> Please post some examples of problems (not solutions that are already
> implemented using GOTOs) where structured alternatives are 'much more' complex.

Of course, we can always replace a GOTO by "if TRUE then ..."

Notice the word _efficient_ in my previous posting. Suppose (and this is an
actual case) we have an algorithm which has an integer state, and there are
stacks, which we can guarantee do not become empty within the procedure if the
integer state is small enough. Now we can do things by the usual case
statements, but it will drastically slow down the algorithm. Until the integer
state is reduced to 1, processing occurs; should we put surplus material back
on the appropriate stack(s), store the new value of the integer, and go
through the case procedure at each stage, or should we use GOTOs, not even
store the value of the integer unless it is large (a rare occurrence), and
keep our surplus material at hand by going to an intermediate stage of the
code for the new integer?

If you tell me to get around this by using a better algorithm, I now inform
you that the algorithm uses on the average 4.4 _bits_ of information before
proceeding to the store stage, and this includes getting the original integer
state. This is not optimal, but it cannot be improved much by any practical
coding scheme.

I know of no hardware which does not internally use GOTOs. There are hardware
implementations of conditional transfers on some machines which will make a
simple test in the transfer instruction, but in nanocode this is: make the
test; then transfer if the test succeeds (fails). In other words, the machine
must have an internal GOTO. If we want to understand the machine, we must,
therefore, have GOTO.

> IF many of these problems exist, I would like to discuss how we can 'fix' the
> structured languages to handle them more easily.
>
> >> Also, machine language allows self-modifying code, which is a no-no by today's
> >> standards.
> >
> >used. If self-modifying code is the thing to do, do it!
>
> This is one of the differences between hacking and programming :-)!
>
> Many operating systems will not allow you to modify code of a running system
> (for example: SVR3 Unix). Self-modifying, as well as non-reentrant code cannot
> be shared; ie, if sh were not reentrant then every time you fork a new process
> you would have to make a new copy of the entire executable for sh; our UNIX
> machines would run out of memory (real and virtual) VERY quickly.

There are obviously situations in which code cannot be self-modifying. There
are conditions in which certain loops or stack structures cannot be used if
certain parts of the machine state are being shared. That does not mean that
self-modifying code should never be used.

> >> Machine language is NOT the place to start learning about computers or to
> >> program them. HLLs have too many advantages over machine languages.
> >
> >From the discussions I have seen, starting with machine languages _may_ be
> >more difficult. However, I believe that for each person who has problems
> >starting with even the bad machines languages, dozens who start with the
> >HLLs will never be able to understand how to program anything that their
> >HLL does not support.
>
> This would be true IF these people only learn one language. I did not propose
> that they never learn assembler, just that it should not be the FIRST one they
> are taught. Programmers should learn assembly language sometime; but only after
> they have developed some good programming techniques. It is much harder to
> 'unteach' somebody than it is to teach somebody.

This is exactly why teaching HLL first is bad. The student gets the
impression that the machine behaves like the HLL. Only a badly designed
machine has an if...then...else structure. Machines have operations of varying
complexity; the student who gets the impression that the structure of the
machine looks like the HLL certainly has a lot to unlearn!
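As a small illustration of the gap between the HLL picture and the machine
picture, the following C sketch writes the same selection twice: once with
if...else, and once as a test followed by transfers, which is closer to what a
typical machine provides. The function names max_structured and max_branching
are invented for this example, and the second form is only an approximation
of what a compiler might emit; a real compiler may choose something else
entirely, such as a conditional move.

```c
/* The HLL view: a two-armed if...else. */
int max_structured(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

/* The same logic as a test plus transfers of control. */
int max_branching(int a, int b)
{
    int r;

    if (a <= b) goto take_b;   /* make the test; transfer if it succeeds */
    r = a;                     /* fall through: a is the larger */
    goto done;                 /* unconditional transfer past the other arm */
take_b:
    r = b;
done:
    return r;
}
```

Both functions return the same value for every pair of arguments; only the
second shows the transfers that the if...else form hides.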
-- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet