Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!husc6!m2c!wpi!jhallen
From: jhallen@wpi.wpi.edu (Joseph H Allen)
Newsgroups: comp.lang.misc
Subject: Re: Anyone want to design a language?
Message-ID: <8583@wpi.wpi.edu>
Date: 19 Feb 90 11:54:38 GMT
References: <22569:05:10:24@stealth.acf.nyu.edu> <8475@wpi.wpi.edu> <4489:05:14:19@stealth.acf.nyu.edu>
Reply-To: jhallen@wpi.wpi.edu (Joseph H Allen)
Distribution: usa
Organization: Worcester Polytechnic Institute, Worcester, MA
Lines: 300

In article <4489:05:14:19@stealth.acf.nyu.edu> brnstnd@stealth.acf.nyu.edu (Dan Bernstein) writes:
>Syntax is less important than semantics, though of course a clean,
>simple syntax is necessary for a language programmers actually like.
>(ALPAL: A Language Programmers Actually Like. Naaah, too pretentious.)

Pretentious? How about D?

>For the moment, general principles are more important than specifics.
>There should be some number of macro (preprocessing) levels to handle
>trivial syntactic issues. I don't know what system would be best, or
>if there even is a best system.

I think you hint at this later, but I think it should be just as easy to extend or add control statements as it is to extend or add functions. Perhaps this needs a macro-processing stage that is more tightly interwoven with the language: i.e., a macro system in which you can say "I want an expression here", "I want this symbol here", etc.

>In article <8475@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
> [ lots of suggestions ]
>
>1, 2, 3. No semicolons. End-of-line comments. Block structure indicated
>   by indentation.
>
>These all relate to the syntax of simple statements and control
>structures. The most important general issue is whether structures
>should be explicitly terminated. The only advantage of C-ish failure to
>terminate is that single-statement structures are slightly shorter; and
>there are lots of syntactic disadvantages.
>Is there anyone out there who
>really wouldn't like loop ... end/endloop/pool, etc.?
>You propose letting indentation determine structure, and using newlines
>as statement terminators.

I didn't mean newlines to be statement terminators. If a statement needs to continue onto another line, that's fine. Statements should be terminated implicitly when they can no longer be parsed. This means we have to be very careful not to have identifiers that can be both operators and variables. Also, infix operators must not be shared with prefix or postfix operators. One problem we will have is with '-'. When you see:

    it = this + 5 - 10

does it mean it=this+5-10, or is the -10 a separate expression that gives the block's return value? I propose we let this problem stand and solve it with parentheses:

    it = this + 5 (- 10)

However, newlines could be used to terminate multi-statement lines (the single-statement problem you talked about). If the statement (really an expression; there's no reason to distinguish between the two) starts on the same line, right after the if expression, then it's a single-line statement:

    if expr expr expr expr expr \n

If the statement doesn't start after the if expression, then it's a multi-line block:

    if expr
        expr
        expr
        expr
        expr
        expr

which ends when the indentation level becomes lower.

>It's easy to convert between this and a more
>traditional syntax; in fact, it would be nice to have a macro facility
>good enough to do the job.

Let's not cop out too early...

>Anyway, I favor a syntax that doesn't depend
>on lines or indentation: otherwise it's too easy to make syntax errors.

I disagree. It's more of a pain when the indentation doesn't match the block symbols:

    if dfjhkjddf {{{{{{{ sdfjkhdf }}}} else {{{{{ }}}}}}

What people do with C makes things very confusing. (:-) YOU use the macro processor to make it your way. The language default will, of course, be my way.)

>A line-based syntax also feels very dirty: there are exceptions for
>multiple statements on a line, exceptions for single-statement
>structures, etc.

It's absolutely consistent. Only two rules are needed: deeper indentation means a new block, and when the body statement begins on the same line as the structure statement, a single-line block is indicated. Oh, and the end-of-line terminator shouldn't be "hard": instead, all statements beginning on the same line are part of the block, and the last statement should be able to continue onto the next line if it has to:

    if a==b a=d+    ; + means has to continue
            e       ; onto this line

>4. Overloadable and definable operators
>I think overloading should be just kept in mind until
>function calls and any object-oriented facilities are worked out.

Overloading is too convenient not to have built into the language at every level. All of the language intrinsics should be as unambiguous as possible. However, it will be possible for the user to screw up with definable operators. I think this is a style issue: don't overload unless you absolutely have to.

>5. All characters allowed in symbols.
>
>Would you really want to read a program with ?)*[! as an identifier?

Yes. And spaces should be allowed in symbols too (I hate those stupid underscores).

>I wouldn't mind a macro facility that could handle this, or the ability
>to partition the character set the way you want.

Sure, take it out of the language, why don't you.

>However, the basic
>language must have some namespace control to do any parsing at all.

No it doesn't. Operators and other symbols are all distinguished by what's in the symbol table, not by what characters they use. The lexer only finds words in the symbol table and passes them on to the parser. The lexer doesn't do anything else, except constants (if you're a real purist, put these in the symbol table too, all of them :)).

>Also, this language MUST be interoperable with other languages to be
>useful.

This, and the fact that you can create some very interesting ambiguities, are the downfall of this idea. But I think the language shouldn't be restrictive; people should just exercise self-control.
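The symbol-table-driven lexer described above can be sketched in C. This is a minimal illustration under stated assumptions, not a design from the thread: the table contents, the greedy longest-match rule, and the names `symtab`, `longest_match`, and `lex` are all hypothetical. It also shows where the "interesting ambiguities" come from: once `x-y` is itself a symbol, the input `x-y` can never lex as `x`, `-`, `y`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table: any characters, including punctuation and
   interior spaces, may appear in a symbol. */
static const char *symtab[] = { "x", "y", "+", "-", "x-y", "?)*[!", "big total" };
static const size_t nsyms = sizeof symtab / sizeof symtab[0];

/* Return the length of the longest table entry that is a prefix of s,
   storing its index in *which; return 0 if no entry matches. */
static size_t longest_match(const char *s, size_t *which)
{
    size_t best = 0;
    for (size_t i = 0; i < nsyms; i++) {
        size_t n = strlen(symtab[i]);
        if (n > best && strncmp(s, symtab[i], n) == 0) {
            best = n;
            *which = i;
        }
    }
    return best;
}

/* Split s into symbols, writing up to max table indices into out.
   Returns the number of symbols found, or -1 if some text matches no
   symbol (or out is full). Whitespace between symbols is skipped;
   whitespace inside a symbol such as "big total" is matched as part of it. */
static int lex(const char *s, size_t *out, int max)
{
    int count = 0;
    while (*s != '\0') {
        if (isspace((unsigned char)*s)) {
            s++;
            continue;
        }
        size_t idx = 0;
        size_t n = longest_match(s, &idx);
        if (n == 0 || count == max)
            return -1;
        out[count++] = idx;
        s += n;
    }
    return count;
}
```

With this table, `lex("x-y", toks, 16)` yields the single symbol `x-y`, while `lex("x - y", toks, 16)` yields three symbols. Greedy longest match is only one possible disambiguation rule; a real design would still need the separate handling of constants mentioned above.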
Which would annoy you more: the language not letting you use '$' in symbols, so that you can't access VAX's special assembler symbols, or using IBM's graphic characters and then discovering that doing so isn't very portable?

>The issue of defining your own character set relates strongly to the
>syntactic argument about overloading. Never force a reader to learn a
>new language.

I don't want to start a war here, but I'm more for writing than for reading and maintaining. Let the managers force rules on the programmers to make things maintainable.

>8. Eliminate arrays in favor of pointers and macros.
>
>Say what? You need some way to express the concept of a contiguous
>region of memory.

No, no, no. This should be an initializer issue:

    inline # int array = # 256 dup int

Left side: a non-addressable pointer to integers (an equate). Right side: the address of 256 uninitialized ints.

>That's what arrays are for. How do pointers cleanly
>express multidimensional arrays? The language should know something
>about arrays, even if just for efficiency.

What are you, some kind of math person? :) System languages don't need arrays. C doesn't even really have arrays: there's no way of passing multidimensional arrays without separately passing the size of each dimension. Efficiency is another problem, however.

>9. Constants: $hex, decimal, %binary, 'c', 'abc'
>
>This is again a matter of taste;

No it's not. It's just plain stupid to write hex constants as 0x... or 0...h (the C and Intel ways).

>I don't agree with the combined syntax for strings and
>characters: what do you do with single-character strings?

The language absolutely must do this. I find it very annoying that there are things I can do with strings in assembly language that I can't do with C's (namely, have a constant expression for each character). Single-character strings?
No problem:

    string = # 'A'
    string = # 65
    string = # 'ABCDEFG'
    string = # 65 \ 66 \ 67 \ 68 \ 'EFG'

Admittedly, having to put a '#' before each string to get its address is a pain.

>The language
>shouldn't have to know about strings; Pascal and Ada deal with strings
>poorly. (C's problem is that there isn't a good enough syntax to easily
>interface the language with different string-storage techniques.)

I agree. C's problem is solved with macros and overloadable operators.

>I also
>disagree with the idea of leaving out octal: finding a better syntax is
>a good idea but there's no reason to take the feature away.

OK. Let's make it the brain-damaged type: octal numbers end with 'O' (oh). (Actually I'm kidding. I know we have to support octal. Perhaps there should even be base-n constants.)

>10. Standard operators.
>As for =/== vs. :=/= vs. your :=/== vs. statements-ain't-expressions =/=
>vs. =/.EQ. vs. ... : I dunno. When I'm coding on paper I alternate
>between paper-only left-arrow/= and C's =/==. On the screen I've begun
>using preprocessors that can handle my terminal's extended characters.
>As many writers have observed, the problem is balancing paper tradition
>with ASCII's rather inexpressive character set.

Right. Allow all characters to be used in symbols.

>You mention that structures can disappear in favor of typedef and blocks.
>To me it doesn't look like you're simplifying anything; and it's a bad
>idea to confuse statement blocks with structure blocks.

What's the difference between this and what C does? I just don't see the need for C's 'struct' keyword. Every structure I ever make begins like this:

    typedef struct foo FOO;
    struct foo {
        FOO *next;
        /* etc... */
    };

>Unions can and should be reorderable. union { int a; float b; } and
>union { float b; int a; } must, of course, be compatible---except that
>in C they'd be initialized differently. (I hope you don't mind unions?)

Frankly, I wish there were some easier way to deal with unions.
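For what it's worth, C11 later standardized anonymous unions, which remove one level of member naming: the members of an unnamed union inside a struct are accessed as if they were members of the struct itself. The sketch below assumes a C11 (or later) compiler; `struct value`, `enum kind`, `set_int`, and `set_float` are hypothetical names for illustration and do not come from the post.

```c
/* Hypothetical tagged value: a kind field plus an anonymous union (C11). */
enum kind { KIND_INT, KIND_FLOAT };

struct value {
    enum kind kind;     /* records which union member is currently valid */
    union {             /* anonymous: the union itself has no member name */
        int i;
        float f;
    };
};

/* Store an int; with a named union member u this would be v->u.i = i. */
static void set_int(struct value *v, int i)
{
    v->kind = KIND_INT;
    v->i = i;
}

/* Store a float in the same storage, updating the tag to match. */
static void set_float(struct value *v, float f)
{
    v->kind = KIND_FLOAT;
    v->f = f;
}
```

The tag is still the programmer's responsibility: the anonymous union only shortens the access path, it does not make reading the wrong member any safer, and it does nothing for the reorderable-union question quoted above.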
As far as I'm concerned, unions are just as difficult to use as casts:

    x->thing.member=7       (union)
    (cast)x->thing=7        (cast)

Perhaps we should have overloadable variables?

>16. Basic types: int bits, uint bits
>
>I disagree. The basic types should be those types that the machine can
>handle quickly. The language must be efficient! It's perfectly fine to
>have a standard notation for ``a type long enough to handle N bits'' or
>``how many bits are in type X?'' but the language should not make
>restrictions on the size of basic types.

This isn't making any restrictions. The only types provided are the machine's primitive ones (char, short, long, etc.); this just provides a way of selecting the proper one for the machine being used.

>nobody feels the same way about the first. Then again, wouldn't a
>standard notation for your ``int 8'' be enough?)

Yes, use a header file.

>What about characters? What about floating-point types, which many
>machines support better than ints? What about Ada-like fixed-point
>types?

Lots more to do...

>ANSI C messed up in its restrictions on void. void should mean a 0-bit
>integer, aligned so that any pointer type can be safely converted back
>and forth to void *. So dereferencing a void always produces 0;
>sizeof(void) is 0; and so on.

Yes, perhaps there should be both 'void' and 'unspecified'. There might be some way to combine this with variable arguments.

>[only allow bit packing in structures]

I don't know. I think if the machine can handle chars and ints and also has alignment problems, char packing isn't that bad. The only time there's a problem with it is when you try to increment a pointer from a char to an int. Inside structures this isn't a problem, because I want to provide a special sizeof-like operator, 'base', which returns the distance between the structure's base address and one of its members.

>19. User-defined precedence.
>This is yet another syntactic preprocessing problem. Remember that the
>language should be readable!

This is really just part of definable operators.

-- 
"Come on Duke, let's do those crimes" - Debbie
"Yeah... Yeah, let's go get sushi... and not pay" - Duke