Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!iuvax!pur-ee!mentor.cc.purdue.edu!l.cc.purdue.edu!cik
From: cik@l.cc.purdue.edu (Herman Rubin)
Newsgroups: comp.lang.misc
Subject: Re: Anyone want to design a language?
Message-ID: <1944@l.cc.purdue.edu>
Date: 19 Feb 90 21:31:20 GMT
References: <22569:05:10:24@stealth.acf.nyu.edu> <8475@wpi.wpi.edu> <4489:05:14:19@stealth.acf.nyu.edu>
Distribution: usa
Organization: Purdue University Statistics Department
Lines: 314

In article <4489:05:14:19@stealth.acf.nyu.edu>, brnstnd@stealth.acf.nyu.edu writes:
> Syntax is less important than semantics, though of course a clean,
> simple syntax is necessary for a language programmers actually like.
> (ALPAL: A Language Programmers Actually Like. Naaah, too pretentious.)

What is a simple syntax? Simple for whom, the human or the machine?
For example, most assembler languages, macro designs, etc., have simple
syntax for the machine but not for the human.

> For the moment, general principles are more important than specifics.
>
> There should be some number of macro (preprocessing) levels to handle
> trivial syntactic issues. I don't know what system would be best, or
> if there even is a best system.

I find the lack of a versatile typed macro processor extremely
inconvenient, and I would find one preferable to any existing language,
even if no other tools were available. For example, x = y - z should be
the (= -) macro (or some such designation), and it should allow for the
types of the arguments.

> In article <8475@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
> [ lots of suggestions ]
>
> 1, 2, 3. No semicolons. End-of-line comments. Block structure indicated
> by indentation.
>
> These all relate to the syntax of simple statements and control
> structures. The most important general issue is whether structures
> should be explicitly terminated.
> The only advantage of C-ish failure to
> terminate is that single-statement structures are slightly shorter; and
> there are lots of syntactic disadvantages. Is there anyone out there who
> really wouldn't like loop ... end/endloop/pool, etc.?

I believe that we should have semicolons, but that an end-of-line should
terminate a statement unless a specific exception is made. This is one
of the most common sources of errors in C programs, and is in any case a
nuisance. I definitely do not like to have to use such clumsiness as
typing unnecessary strings for the convenience of the compiler. I do not
like endloop/pool.

I also do not believe that indentation is necessarily the right method
for block structure. For one thing, by the 10th block in, it is
certainly a nuisance. A suggestion would be to allow arbitrary block
labels, and have an end pseudoinstruction with multiple labels. This is
especially important when aborting to an explicit earlier place.

....................

> 4. Overloadable and definable operators
>
> This is another syntax issue. The language MUST provide an unambiguous
> syntax for everything. Fortran-90 is the only overloading language I
> know that does this well. Overloading just means ambiguous abbreviation,
> and definable operators are just a more convenient syntax for certain
> functions.

NO NO NO! An operator is not a function, especially if it is different
for arguments of different types, such as the sum, product, power
operators, etc. Also, I see no more reason for a function call, or even
function notation, for power than for sum. It is no more reasonable to
require x = pow(y,z) than x = sum(y,z).

> 5. All characters allowed in symbols.
>
> Would you really want to read a program with ?)*[! as an identifier?

Only as a macro name (see above), with the macro being more in the form
of x ? y )* z [n! or something similar.
> I wouldn't mind a macro facility that could handle this, or the ability
> to partition the character set the way you want. However, the basic
> language must have some namespace control to do any parsing at all.
> Also, this language MUST be interoperable with other languages to be
> useful.

This means that global names must not be changed by the compiler. It is
a real nuisance that the function sin in C becomes _sin to the loader,
and that erf in Fortran becomes _erf_. When writing a program, I should
not have to know from which language the subroutine library got the
subroutines used, nor should I have to replicate subroutine libraries
because of this. It is definitely the case that one may want to use
subroutines from different sources, and this requires that names be
unchanged by the compiler. This even applies if blocks are used across
subroutines.

> The issue of defining your own character set relates strongly to the
> syntactic argument about overloading. Never force a reader to learn a
> new language.

This may be necessary. My basic operations are frequently so clumsy to
duplicate in the existing languages that it is necessary to do
otherwise. This includes the introduction of operator symbols and
strings. For example, suppose I want to unpack floating point numbers
into their exponents and mantissas. I do not want to have to try to do
this with the debilities of languages like C.

> 6. C-like initialization power.
>
> Well, okay. Take it for granted that declarations and definitions will
> be at least as powerful as in C.
>
> 7. int **foo[] becoming [] * * int foo

Or even better @ @ int foo. This is an unnecessary overloading of *,
done because early UNIX had @ as the line kill character.

> Yeah. C would be cleaner if all the ``type constructants'' had a single
> syntax. This needs to be considered in much more detail to see what
> people would like to use.
> Perhaps there's a simple, readable, consistent
> way to provide everything in both prefix and postfix form; then nobody
> can complain.
>
> 8. Eliminate arrays in favor of pointers and macros.
>
> Say what? You need some way to express the concept of a contiguous
> region of memory. That's what arrays are for. How do pointers cleanly
> express multidimensional arrays? The language should know something
> about arrays, even if just for efficiency.

I agree. This is one of the great lacks in C.

> 9. Constants: $hex, decimal, %binary, 'c', 'abc'
>
> This is again a matter of taste; we'll see what people like. Many
> different forms of constants can be provided without hurting simplicity
> or readability. I don't agree with the combined syntax for strings and
> characters: what do you do with single-character strings? The language
> shouldn't have to know about strings; Pascal and Ada deal with strings
> poorly. (C's problem is that there isn't a good enough syntax to easily
> interface the language with different string-storage techniques.) I also
> disagree with the idea of leaving out octal: finding a better syntax is
> a good idea but there's no reason to take the feature away.

Here nothing should be left out. There is a great need for floating
point numbers not in decimal, at least octal or hex for the mantissa and
exponent, but a base 2 exponent.

> 10. Standard operators.
>
> This is, again, something that must be considered in much greater
> detail to get right. (Yes, I agree that @ is a much more logical symbol
> than * for indirection.) For the moment let's stick to general issues:
> You're right that there should be Algol 68C-like assignments that relate
> to a = b and a op= b the same way that a++ relates to ++a.

The use of ++ and -- is another example which leads to problems. I have
no problem with op=, but using bad symbols because you did not think of
anything better is at least highly debatable.
There is also the systematic use of symbols in C which conflicts with
long-standing notation. ASCII is not enough in any case.

> As for =/== vs. :=/= vs. your :=/== vs. statements-ain't-expressions =/=
> vs. =/.EQ. vs. ... : I dunno. When I'm coding on paper I alternate
> between paper-only left-arrow/= and C's =/==. On the screen I've begun
> using preprocessors that can handle my terminal's extended characters.
> As many writers have observed, the problem is balancing paper tradition
> with ASCII's rather inexpressive character set.

........................

> 13. Declarations anywhere.
>
> Yeah.
>
> 14. Control flow statements, control structures: [ various ]
>
> I have some rather heretical thoughts on this subject. I'll make them
> clear in another message. (Remember that this isn't Ada. Given an
> infinite loop ... endloop, if, and break, you don't need to provide
> a terminating loop as a basic construct. Define it instead as a standard
> macro. Ada's infinite variety of control structures is awful.)

Mine are even more heretical. I insist on goto, and frequently terminate
a loop by jumping out of it. Spaghetti algorithms call for spaghetti
code, and I have lots of them. Structured programming can cause huge
inefficiencies, as well as being harder to understand.

> 15. Structure and code generation rules: Variables are in memory in the
> order of declaration.
>
> Yeah. I very much want more control over stack allocation and control
> flow than in C. This is not dealt with by any current language and needs
> a lot of thought. One idea I've been considering is replacing function
> types with statement types. This makes setjmp/longjmp, multiple function
> entry points, and various other techniques much cleaner. The problem is,
> once again, how and when to allocate stack variables.

DO NOT INSIST ON PASSING ARGUMENTS WITH STACKS. Register arguments are
frequently better, and there are numerous other ways, such as argument
arrays.
There are situations where stacks are the way to do it, but memory
references in general should be avoided where possible.

.......................

> 16. Basic types: int bits, uint bits
>
> I disagree. The basic types should be those types that the machine can
> handle quickly. The language must be efficient! It's perfectly fine to
> have a standard notation for ``a type long enough to handle N bits'' or
> ``how many bits are in type X?'' but the language should not make
> restrictions on the size of basic types.

A type need not even consist of adjacent elements. If a string requires
a beginning address and a length, the pair is the designator of the
string. It may or may not be desirable to have the indices in adjacent
memory locations. An array descriptor would have the location of the
0,0, ..., 0 element, the dimensions, and if necessary the storage
locations; these need not be adjacent, and some of this information can
be shared.

> (Then again, every case in which portability takes second place to
> efficiency must be carefully considered and well documented. Two issues
> along these lines are bit sizes and the semantics of mod. As I feel very
> strongly that the second should be portable, I shouldn't assume that
> nobody feels the same way about the first. Then again, wouldn't a
> standard notation for your ``int 8'' be enough?)
>
> What about characters? What about floating-point types, which many
> machines support better than ints? What about Ada-like fixed-point
> types?

It is almost impossible to get full portability on anything other than
integer arithmetic, and even here there are problems.

.......................

> 19. User-defined precedence.
>
> This is yet another syntactic preprocessing problem. Remember that the
> language should be readable!

I suspect that many of the problems are with precedence. I am not sure
that we would not be better off without trying to make it rigid. Some of
the precedences in C are gotten wrong by just about everybody.
We could possibly use numbered parentheses to get around it.

> 20. Parameter passing: [ various ideas ]
>
> This has to be dealt with very carefully. I like C's solution: it's clean
> while allowing every trick Ada can do. A general principle here (which
> you appear to disagree with) is that the form of a function call can
> make clear the fact that a variable is not modified.

But it is clumsy and slow. Now if we had a decent notation, so a
function could return a list of results (NOT a struct), we could get
around this. But trying to keep functions from having side effects is a
losing proposition.

> 21. Classes equal structures. Inheritance is just including one structure
> in another. Function arguments are really structures.

Classes equal typedefs. Structures are needed for more complicated
situations. DO NOT insist on function arguments being structures; more
time can be wasted by forming the structure than by computing the
function. A list of values is more general than a structure by far, and
the list need not be in consecutive locations in any way.

......................

> > The general goal
> > is to make it both one step above assembly language and completely extendable.
>
> And modular. And clean. And robust. And likable, even fun to use!

All of this can be done, but not portably. Good code can usually at most
be semi-portable.

------------------------------------------------------------------------------

Another point that I wish to address is what I call the usurpation of
notation. There are many somewhat standard uses of symbols in
mathematics which are used in languages such as C for totally different
meanings. The two most flagrant ones here are | and ^. I know of several
uses of | in mathematics, none of which is "or". The notation by Backus
was a long vertical, and it was used in the sense of || in C.
The most common use of vertical lines in mathematics is for absolute
value, and I believe it should have this meaning in programming as an
overloaded operator. The use in mathematics extends for well over 100
years. There are even other uses of ^ in CS before C, none of which was
exclusive or. This taking of symbols and defining their use to be
something else because the inventors of the language are not
sufficiently knowledgeable is a bad idea. Fortran designers avoided this
and made it clear that they used * and ** because these were not already
used for something else, and what they wanted to use was not available.

I suggest that we make every effort to avoid using symbols which have
other meanings. Whenever mathematical notation disagrees with that of
applications, it is usually the mathematics that got there first.

Also, make sure that the language is not restricted so that only an
idiot can stand it. I believe it is possible to produce a language in
which good programming can be done.

Something to keep in mind is that people will find ways to do things
that you have not thought of. So it is necessary to allow computer
objects to be used as bit strings, to allow bitwise operations on
floating point numbers, to allow the use of a number as something other
than the language intended. Furthermore, the programmer may know things
the compiler can use like frequency, etc. The programmer may have a good
reason for keeping something in a register, or even insisting it be
stored, which it is hard for the compiler to figure out. I have a
natural example of a recursive program in which several registers should
be kept across recursions.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)