Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!usc!zaphod.mps.ohio-state.edu!think!yale!cmcl2!lanl!lambda!jlg
From: jlg@lambda.UUCP (Jim Giles)
Newsgroups: comp.lang.misc
Subject: Re: Anyone want to design a language?
Message-ID: <14245@lambda.UUCP>
Date: 22 Feb 90 22:08:30 GMT
References: <390@argosy.UUCP>
Distribution: usa
Lines: 188

From article <390@argosy.UUCP>, by kevin@argosy.UUCP (Kevin S. Van Horn):
> In article <14242@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>>In languages with recursive data types, direct dynamic memory (like 
>>ALLOCATABLE in Fortran 90), and type coersion I've never seen the 
>>need for pointers _AT_ALL_!!  So, rejecting something because it
>>interferes with pointers is a null issue.
> 
> Would you care to expand on this?  I'm not sure what "direct dynamic memory"
> is, for starters.

Ok.  I'll tackle each feature separately.

I)   Recursive data types.
        These are things like linked lists, graphs, trees, etc..  Internally,
        they are almost certainly implemented using pointers although, there
        are other implementations possible if you don't mind a constraint
        against cyclic structures - C.A.R. Hoare claims that cyclic data
        objects are unstructured monsters that shouldn't be allowed.
        (Functional languages usually don't allow cyclic data structures.)

        As an example, consider a possible declaration of a binary tree
        data type (all examples are in a Fortran/C-ish syntax - that is,
        the data types are on the left and a possible initialization
        is on the right of each declared item):

              type b_tree is recursive
                 node   value
                 b_tree left
                 b_tree right
              end type b_tree

        and the data type 'node' might be a discriminated union so that
        the tree could hold a variety of data.  Note that the declaration
        is very like what C does, except that C requires the left and right
        objects to be pointers.  So what is the advantage of this over
        pointers?  Well, for one thing, after you get used to it, it's
        easier to use and to read.  There are no 'dereferencing' operators
        and there is no distinction between '->' and '.' for field selectors.

              btree x = null
              btree y = null

        Recursive data structures can be empty, these declarations explicitly
        initialize 'x' and 'y' to be empty (which is probably the default,
        but it's better to be sure).

              btree a = {1,null,null}

        This declares 'a' to be a tree with one node already defined.  It
        also assumes that data type 'node' is assignment conformable with
        the integer '1' (which the compiler should check).

              x = btree{2,a,null}
              y = btree{3,x,btree{4,null,null}}
              a.value = 7

        These executable statements have built the tree:

                  y:3
                 /  \
               x:2   :4
              /  
            a:7

        Note that new nodes are allocated by a the type name applied
        to a conformable list.  Nodes are deallocated when there are
        no references to them (for example: y.right = null will cause
        the node with the value '4' to be deallocated).  Deallocation
        is done through reference counts or garbage collection.  If
        overhead is a problem, an explicit deallocation could be added
        to the language.  The implicit deallocation has the advantage
        of completely eliminating two common types of pointer errors:
        dangling pointers (which still point to memory that some other
        part of the code has deallocated) and orphanned data (which is
        not accessable through any pointer but was never returned to
        the memory manager).

II)  Direct dynamic memory.

        The concept behind this is simple.  If you want an object to
        be dynamically allocated, you say so in the declaration and
        then you have to code an allocation statement which must be
        executed before the object can be used.  The Fortran 90 syntax
        is used in the following example:

              real, allocatable :: x(:,:)
               ...
              allocate (x(100,1000))
               ... code using 'x'
              deallocate (x)
               ...

        Between the execution of the ALLOCATE and DEALLOCATE statements,
        the object behaves (from the user's point of view) _exactly_
        the same as a statically allocated object does.  In particular,
        dynamically allocated objects are _NOT_ aliased with any other
        objects!  This means that code can be fully optimized, etc..

        Note that arrays can be dynamically allocated to any size.
        Although I didn't show it here, the dimensions of 'x' could
        have been supplied by any integer expression.  For some reason,
        the fortran committee restricted the ALLOCATABLE attribute
        to arrays only.  Obviously, there is no reason that scalar
        objects can't be dynamically allocated as well.

        The function ALLOCATED(x) returns whether the array is currently
        allocated or not.  The object is automatically deallocated when
        control returns from the its scope unless the object also has
        the SAVE attribute.

        There are several advantages to this.  The obvious one is that
        no aliasing is involved (or even possible - without pointers).
        Since ALLOCATE is a statement and not a function call, it is
        generic in its arguments: no error prone (and oft ommitted)
        type casting on the returned pointer (like C has), and no
        need to manually scale the memory request by sizeof() multiples.
        The ALLOCATE statement is more legible as are the uses of the
        allocated object (no dereferencing operator to mess with, no
        confusion between the object and its pointer, etc.).

III) Run-time type coersion.

        There are two different activities that are both called 'coersion'.
        One is type conversion (like x = (double) i; in C).  You can
        debate whether the language should be 'strict' and not do any
        such conversions automatically vs. a 'non-strict' language
        which allows 'mixed-mode' operations.  This is _NOT_ the kind
        of coersion I am refering to.

        When doing system programming it is often necessary to 'ignore'
        the data type of an object in order to directly manipulate its
        internal structure.  This requires the ability to alter the
        type-tag that the compiler sees for the object - _WITHOUT_
        altering the data itself.  This is the type of coersion I'm
        refering to in this discussion.  'Structured Programming'
        enthusiasts will claim this is ugly and you shouldn't do it.
        Unfortunately, it is often the only efficient way to get
        something done (or, would you rather your customers chose
        someone else's system?).  I won't go into the 'ethical'
        question here.  I am only going to talk about _how_ to do
        the deed - once you've decided that it's a useful feature.

        In C, the usual way is to use a pointer cast:

              double x;
              struct dbl_struct {  /* structure of an IEEE double */
                 int sign_bit : 1;
                 int expon    : 11;
                 int fraction : 54;
              } *p;
               ...
              p = (dbl_struct *) &x; 
               ...

        Unfortunately, this won't work since C is allowed to 'pad'
        bit-fields in structs.  So, C users usually just cast to
        a char pointer and shift&mask the fields they need.  Either
        way, this is nothing more than (or less than) a run-time
        EQUIVALENCE statement.  But, what is needed has nothing
        to do with aliasing or pointers!  What is really needed is
        a way do the type-coersion directly.  How about:

              double x;
              map x as struct {
                 int sign_bit : 1;
                 int expon    : 11;
                 int fraction : 54;
              }
               ...
              x = 1.0 /* x still works as a regular 'double' if no 'map
                         fields are present                            */
              x.sign_bit = 1; /* force x negative */
              x.expon = x.expon-3; /* divide x by 8 */
               ...

        This, together with a rule that 'map' structures are never padded,
        will accomplish what's needed without pointers.  Your code will
        not take a performance hit from possible aliasing, etc..

Notice that all these mechanisms are much more precise than the pointer
implementations are.  Only the recursive data structures involve possible
aliasing - but you could argue that you _want_ to allow aliasing here.
Since they are more precise, they are easier to read and to write as well
as being easier to compile.  I have yet to find any algorithms for which
the above kinds of features aren't sufficient (both functionally and
for efficiency).  So, I don't see the need for pointers at all!

J. Giles