Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!cs.utexas.edu!uunet!stealth.acf.nyu.edu!brnstnd
From: brnstnd@stealth.acf.nyu.edu
Newsgroups: comp.lang.misc
Subject: Re: Anyone want to design a language?
Message-ID: <4489:05:14:19@stealth.acf.nyu.edu>
Date: 19 Feb 90 05:14:20 GMT
References: <22569:05:10:24@stealth.acf.nyu.edu> <8475@wpi.wpi.edu>
Reply-To: brnstnd@stealth.acf.nyu.edu (Dan Bernstein)
Distribution: usa
Organization: IR
Lines: 253

Syntax is less important than semantics, though of course a clean,
simple syntax is necessary for a language programmers actually like.
(ALPAL: A Language Programmers Actually Like. Naaah, too pretentious.)
For the moment, general principles are more important than specifics.

There should be some number of macro (preprocessing) levels to handle
trivial syntactic issues. I don't know what system would be best, or if
there even is a best system.

In article <8475@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
[ lots of suggestions ]

1, 2, 3. No semicolons. End-of-line comments. Block structure indicated
by indentation.

These all relate to the syntax of simple statements and control
structures. The most important general issue is whether structures
should be explicitly terminated. The only advantage of C-ish failure to
terminate is that single-statement structures are slightly shorter; and
there are lots of syntactic disadvantages. Is there anyone out there who
really wouldn't like loop ... end/endloop/pool, etc.?

You propose letting indentation determine structure, and using newlines
as statement terminators. It's easy to convert between this and a more
traditional syntax; in fact, it would be nice to have a macro facility
good enough to do the job. Anyway, I favor a syntax that doesn't depend
on lines or indentation: otherwise it's too easy to make syntax errors.
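
One of the syntactic disadvantages of leaving structures unterminated
can be seen in C's dangling else. The sketch below is an illustration
only; the function names dangling, terminated, and demo are made up for
it. Without an explicit terminator, an else binds to the nearest
unmatched if, whatever the indentation suggests; braces stand in for
the explicit end that a loop ... endloop style would require.

```c
#include <assert.h>

/* The indentation suggests the else pairs with "if (a)", but C binds
   an else to the nearest unmatched if, so it pairs with "if (b)". */
int dangling(int a, int b)
{
    int r = 0;
    if (a)
        if (b)
            r = 1;
    else                /* actually belongs to "if (b)" */
        r = 2;
    return r;
}

/* Braces act as explicit terminators, so the else pairs with "if (a)". */
int terminated(int a, int b)
{
    int r = 0;
    if (a) {
        if (b)
            r = 1;
    } else {
        r = 2;
    }
    return r;
}

/* Usage: the two functions disagree whenever b is false. */
void demo(void)
{
    assert(dangling(0, 0) == 0);    /* outer if has no else at all */
    assert(terminated(0, 0) == 2);  /* else runs because a is false */
    assert(dangling(1, 0) == 2);    /* else runs because b is false */
    assert(terminated(1, 0) == 0);  /* neither branch assigns r */
}
```

Many compilers warn about the first function, which is rather the
point: the reader and the parser disagree about the structure.
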
A line-based syntax also feels very dirty: there are exceptions for
multiple statements on a line, exceptions for single-statement
structures, etc.

4. Overloadable and definable operators

This is another syntax issue. The language MUST provide an unambiguous
syntax for everything. Fortran-90 is the only overloading language I
know that does this well. Overloading just means ambiguous abbreviation,
and definable operators are just a more convenient syntax for certain
functions. I think overloading should be just kept in mind until
function calls and any object-oriented facilities are worked out.

5. All characters allowed in symbols.

Would you really want to read a program with ?)*[! as an identifier?
I wouldn't mind a macro facility that could handle this, or the ability
to partition the character set the way you want. However, the basic
language must have some namespace control to do any parsing at all.
Also, this language MUST be interoperable with other languages to be
useful.

The issue of defining your own character set relates strongly to the
syntactic argument about overloading. Never force a reader to learn a
new language.

6. C-like initialization power.

Well, okay. Take it for granted that declarations and definitions will
be at least as powerful as in C.

7. int **foo[] becoming [] * * int foo

Yeah. C would be cleaner if all the ``type constructants'' had a single
syntax. This needs to be considered in much more detail to see what
people would like to use. Perhaps there's a simple, readable, consistent
way to provide everything in both prefix and postfix form; then nobody
can complain.

8. Eliminate arrays in favor of pointers and macros.

Say what? You need some way to express the concept of a contiguous
region of memory. That's what arrays are for. How do pointers cleanly
express multidimensional arrays? The language should know something
about arrays, even if just for efficiency.

9. Constants: $hex, decimal, %binary, 'c', 'abc'

This is again a matter of taste; we'll see what people like. Many
different forms of constants can be provided without hurting simplicity
or readability. I don't agree with the combined syntax for strings and
characters: what do you do with single-character strings? The language
shouldn't have to know about strings; Pascal and Ada deal with strings
poorly. (C's problem is that there isn't a good enough syntax to easily
interface the language with different string-storage techniques.) I also
disagree with the idea of leaving out octal: finding a better syntax is
a good idea but there's no reason to take the feature away.

10. Standard operators.

This is, again, something that must be considered in much greater detail
to get right. (Yes, I agree that @ is a much more logical symbol than *
for indirection.) For the moment let's stick to general issues:

You're right that there should be Algol 68C-like assignments that relate
to a = b and a op= b the same way that a++ relates to ++a.

As for =/== vs. :=/= vs. your :=/== vs. statements-ain't-expressions =/=
vs. =/.EQ. vs. ... : I dunno. When I'm coding on paper I alternate
between paper-only left-arrow/= and C's =/==. On the screen I've begun
using preprocessors that can handle my terminal's extended characters.
As many writers have observed, the problem is balancing paper tradition
with ASCII's rather inexpressive character set.

11, 12. BCPL-like statements returning values.

Yes, of course. C's restriction that you can't do something like
a = {if (b == c) 2; else 3; } is purely annoying. At the very least, the
language should solve this the way that GNU's C compiler does.

13. Declarations anywhere.

Yeah.

14. Control flow statements, control structures: [ various ]

I have some rather heretical thoughts on this subject. I'll make them
clear in another message. (Remember that this isn't Ada. Given an
infinite loop ...
endloop, if, and break, you don't need to provide a terminating loop as
a basic construct. Define it instead as a standard macro. Ada's infinite
variety of control structures is awful.)

15. Structure and code generation rules: Variables are in memory in the
order of declaration.

Yeah. I very much want more control over stack allocation and control
flow than in C. This is not dealt with by any current language and needs
a lot of thought. One idea I've been considering is replacing function
types with statement types. This makes setjmp/longjmp, multiple function
entry points, and various other techniques much cleaner. The problem is,
once again, how and when to allocate stack variables.

I think two goals along these lines are (simpler:) that the language
support varargs (and varargs passing!) cleanly, and (harder, assuming
both a good exception mechanism and OS-generated timer exceptions:) that
the language support enough stack control and longjmp control that a
programmer can build a portable threads library.

Note that a truly working setjmp/longjmp would deal with register
variables correctly; this is probably impossible without OS and hardware
support for a ``register storage vector'' indicating storage locations
for all register variables. It's certainly something to think about...

You mention that structures can disappear in favor of typedef and
blocks. To me it doesn't look like you're simplifying anything; and it's
a bad idea to confuse statement blocks with structure blocks.

Unions can and should be reorderable. union { int a; float b; } and
union { float b; int a; } must, of course, be compatible---except that
in C they'd be initialized differently. (I hope you don't mind unions?)

16. Basic types: int bits, uint bits

I disagree. The basic types should be those types that the machine can
handle quickly. The language must be efficient!
It's perfectly fine to have a standard notation for ``a type long enough
to handle N bits'' or ``how many bits are in type X?'' but the language
should not make restrictions on the size of basic types. (Then again,
every case in which portability takes second place to efficiency must be
carefully considered and well documented. Two issues along these lines
are bit sizes and the semantics of mod. As I feel very strongly that the
second should be portable, I shouldn't assume that nobody feels the same
way about the first. Then again, wouldn't a standard notation for your
``int 8'' be enough?)

What about characters? What about floating-point types, which many
machines support better than ints? What about Ada-like fixed-point
types?

ANSI C messed up in its restrictions on void. void should mean a 0-bit
integer, aligned so that any pointer type can be safely converted back
and forth to void *. So dereferencing a void * always produces 0;
sizeof(void) is 0; and so on.

I agree with C's philosophy of only allowing bit packing inside
structures. Other packing methods would really mangle the concept of
pointers.

17. const, inline, register, macro, op LEFT RIGHT RETURN

Interesting idea, the last one. The basic function call syntax should be
what most people like; if there's a clean way to integrate (say) C's
functions, Forth's statements, and Lisp's whatevers, let's do it.

It would be wonderful to have a way to express more complex data flow
than algebraic expressions and single-type function calls.
Unfortunately, I don't know any good syntax or semantics for data flow.
(This is NOT going to become a so-called ``functional'' language, thank
you.) Data flow is just a convenient way to express temporary (register)
variables; whenever I use an expression twice I wonder if there's some
natural ``teeing'' extension to C's ``piping'' notation that would
simplify my code.

18. Automatic conversion.

Yeah.
Is it inconsistent how C really mangles the representation when you
convert from int to float while it (typically) doesn't change it at all
when you convert from int * to void *? I'm not sure. It may not be wise
to integrate casts with user-defined conversion functions, as the former
are implementation-dependent while the latter should not be.

19. User-defined precedence.

This is yet another syntactic preprocessing problem. Remember that the
language should be readable!

20. Parameter passing: [ various ideas ]

This has to be dealt with very carefully. I like C's solution: it's
clean while allowing every trick Ada can do. A general principle here
(which you appear to disagree with) is that the form of a function call
can make clear the fact that a variable is not modified.

21. Classes equal structures. Inheritance is just including one
structure in another. Function arguments are really structures.

This is the kind of idea that I'm looking for. Object-oriented
programming can be very clean given a sufficiently powerful syntax and
semantics for function pointers and structures. Your ``inherit'' keyword
is beautiful.

Function arguments being structures: This could be useful if it's
combined with a simple way to deal with the program stack.

Default structure values upon creation: This brings up the issue of
whether there should be a way to call an initialization function the
first time a function is called (as in Modula-2, Fortran-90, and a few
other obsolete [1/2 :-)] languages). I don't think there's any point:
all related ``features'' can be much more cleanly implemented by
combining function pointers with the more usual initializations, or by
keeping an appropriate local variable. (Those are the two methods used
in Modula-2 compilers: the point is that if they're easily implemented
with simpler features, they should be. Modularity.)

> - Member functions are indicated in function declarations.
>   There should be another type qualifyer which indicates a
>   function gets a pointer to the structure and all members of
>   that structure look like local variables to the function.

I'm not sure what you're getting at here.

22. Named arguments.

Yeah. This is one of Ada's few good features. The syntax is a bit of a
problem, but I'm sure it can be worked out.

> There's much, much more to do

No duh.

> The general goal
> is to make it both one step above assembly language and completely extendable.

And modular. And clean. And robust. And likable, even fun to use!

---Dan