Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!usc!zaphod.mps.ohio-state.edu!think!yale!cmcl2!lanl!lambda!jlg From: jlg@lambda.UUCP (Jim Giles) Newsgroups: comp.lang.misc Subject: Re: Anyone want to design a language? Message-ID: <14245@lambda.UUCP> Date: 22 Feb 90 22:08:30 GMT References: <390@argosy.UUCP> Distribution: usa Lines: 188 From article <390@argosy.UUCP>, by kevin@argosy.UUCP (Kevin S. Van Horn): > In article <14242@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes: >>In languages with recursive data types, direct dynamic memory (like >>ALLOCATABLE in Fortran 90), and type coersion I've never seen the >>need for pointers _AT_ALL_!! So, rejecting something because it >>interferes with pointers is a null issue. > > Would you care to expand on this? I'm not sure what "direct dynamic memory" > is, for starters. Ok. I'll tackle each feature separately. I) Recursive data types. These are things like linked lists, graphs, trees, etc.. Internally, they are almost certainly implemented using pointers although, there are other implementations possible if you don't mind a constraint against cyclic structures - C.A.R. Hoare claims that cyclic data objects are unstructured monsters that shouldn't be allowed. (Functional languages usually don't allow cyclic data structures.) As an example, consider a possible declaration of a binary tree data type (all examples are in a Fortran/C-ish syntax - that is, the data types are on the left and a possible initialization is on the right of each declared item): type b_tree is recursive node value b_tree left b_tree right end type b_tree and the data type 'node' might be a discriminated union so that the tree could hold a variety of data. Note that the declaration is very like what C does, except that C requires the left and right objects to be pointers. So what is the advantage of this over pointers? Well, for one thing, after you get used to it, it's easier to use and to read. There are no 'dereferencing' operators and there is no distinction between '->' and '.' for field selectors. btree x = null btree y = null Recursive data structures can be empty, these declarations explicitly initialize 'x' and 'y' to be empty (which is probably the default, but it's better to be sure). btree a = {1,null,null} This declares 'a' to be a tree with one node already defined. It also assumes that data type 'node' is assignment conformable with the integer '1' (which the compiler should check). x = btree{2,a,null} y = btree{3,x,btree{4,null,null}} a.value = 7 These executable statements have built the tree: y:3 / \ x:2 :4 / a:7 Note that new nodes are allocated by a the type name applied to a conformable list. Nodes are deallocated when there are no references to them (for example: y.right = null will cause the node with the value '4' to be deallocated). Deallocation is done through reference counts or garbage collection. If overhead is a problem, an explicit deallocation could be added to the language. The implicit deallocation has the advantage of completely eliminating two common types of pointer errors: dangling pointers (which still point to memory that some other part of the code has deallocated) and orphanned data (which is not accessable through any pointer but was never returned to the memory manager). II) Direct dynamic memory. The concept behind this is simple. If you want an object to be dynamically allocated, you say so in the declaration and then you have to code an allocation statement which must be executed before the object can be used. The Fortran 90 syntax is used in the following example: real, allocatable :: x(:,:) ... allocate (x(100,1000)) ... code using 'x' deallocate (x) ... Between the execution of the ALLOCATE and DEALLOCATE statements, the object behaves (from the user's point of view) _exactly_ the same as a statically allocated object does. In particular, dynamically allocated objects are _NOT_ aliased with any other objects! This means that code can be fully optimized, etc.. Note that arrays can be dynamically allocated to any size. Although I didn't show it here, the dimensions of 'x' could have been supplied by any integer expression. For some reason, the fortran committee restricted the ALLOCATABLE attribute to arrays only. Obviously, there is no reason that scalar objects can't be dynamically allocated as well. The function ALLOCATED(x) returns whether the array is currently allocated or not. The object is automatically deallocated when control returns from the its scope unless the object also has the SAVE attribute. There are several advantages to this. The obvious one is that no aliasing is involved (or even possible - without pointers). Since ALLOCATE is a statement and not a function call, it is generic in its arguments: no error prone (and oft ommitted) type casting on the returned pointer (like C has), and no need to manually scale the memory request by sizeof() multiples. The ALLOCATE statement is more legible as are the uses of the allocated object (no dereferencing operator to mess with, no confusion between the object and its pointer, etc.). III) Run-time type coersion. There are two different activities that are both called 'coersion'. One is type conversion (like x = (double) i; in C). You can debate whether the language should be 'strict' and not do any such conversions automatically vs. a 'non-strict' language which allows 'mixed-mode' operations. This is _NOT_ the kind of coersion I am refering to. When doing system programming it is often necessary to 'ignore' the data type of an object in order to directly manipulate its internal structure. This requires the ability to alter the type-tag that the compiler sees for the object - _WITHOUT_ altering the data itself. This is the type of coersion I'm refering to in this discussion. 'Structured Programming' enthusiasts will claim this is ugly and you shouldn't do it. Unfortunately, it is often the only efficient way to get something done (or, would you rather your customers chose someone else's system?). I won't go into the 'ethical' question here. I am only going to talk about _how_ to do the deed - once you've decided that it's a useful feature. In C, the usual way is to use a pointer cast: double x; struct dbl_struct { /* structure of an IEEE double */ int sign_bit : 1; int expon : 11; int fraction : 54; } *p; ... p = (dbl_struct *) &x; ... Unfortunately, this won't work since C is allowed to 'pad' bit-fields in structs. So, C users usually just cast to a char pointer and shift&mask the fields they need. Either way, this is nothing more than (or less than) a run-time EQUIVALENCE statement. But, what is needed has nothing to do with aliasing or pointers! What is really needed is a way do the type-coersion directly. How about: double x; map x as struct { int sign_bit : 1; int expon : 11; int fraction : 54; } ... x = 1.0 /* x still works as a regular 'double' if no 'map fields are present */ x.sign_bit = 1; /* force x negative */ x.expon = x.expon-3; /* divide x by 8 */ ... This, together with a rule that 'map' structures are never padded, will accomplish what's needed without pointers. Your code will not take a performance hit from possible aliasing, etc.. Notice that all these mechanisms are much more precise than the pointer implementations are. Only the recursive data structures involve possible aliasing - but you could argue that you _want_ to allow aliasing here. Since they are more precise, they are easier to read and to write as well as being easier to compile. I have yet to find any algorithms for which the above kinds of features aren't sufficient (both functionally and for efficiency). So, I don't see the need for pointers at all! J. Giles