Path: utzoo!attcan!uunet!mcvax!steven
From: steven@cwi.nl (Steven Pemberton)
Newsgroups: comp.lang.misc
Subject: Re: What makes a language "easy" to program in?
Message-ID: <366@piring.cwi.nl>
Date: 10 Jun 88 15:17:27 GMT
References: <350@piring.cwi.nl> <711@cunixc.columbia.edu> <FELDMARK.88May31160241@hanako.stars.flab.Fujitsu.JUNET> <3799@pasteur.Berkeley.Edu> <712@cunixc.columbia.edu> <3880@pasteur.Berkeley.Edu>
Distribution: comp
Organization: CWI, Amsterdam
Lines: 112

In article <350@piring.cwi.nl>, guido@cwi.nl (Guido van Rossum) wrote:
> Remember that trees are often used as a way to implement other
> abstractions: associative arrays, sorted sets with easy insert/
> retrieve/delete operations, etc.  It might be more appropriate to
> add such "higher-level" data types to a language than pointers.

In article <3880@pasteur.Berkeley.Edu> faustus@ic.Berkeley.EDU (Wayne
A. Christopher) replied:
> The way you would add such data types to a language is to build them out
> of the primitives available, which are pointers (in the case of C), or
> arrays (in the case of APL).  Actually adding them as primitives would
> be a very bad idea -- you could never add enough to make everybody happy.

Actually, from the point of view of what's in the Subject: line - what
makes a program easy to program in - our experience here is that
there's a *big* win in making the high-level data-types the
'primitive' types of your language. You actually hardly ever want the
low-level types themselves, but only as a tool for creating the higher
ones. On the occasions that you really need the low-level types, then
you can use the high-level ones to simulate the lower-level ones.

You don't need to supply *all* the data-types, only the best set to
implement the others that you need.

In designing the type system for ABC, Lambert Meertens did something
like the following (I imagine Lambert reads this group, so if I get
it wrong, I'm sure he'll put me right):

- enumerate all the data-types you can think of: stacks, trees, lists,
  bags, sets, graphs, ...

- to each data-type assign an importance weight, based on experience,
  intuition, etc. For instance, sets are more important than deques.

- for each type, design a set of implementation schemes, based on
  other data-types. For instance, trees implemented as a special case
  of graphs. Associate with these usage efforts, based on how often
  you use the type, and how difficult it is to define in terms of the
  other types. 

- Write a program that takes as primitive types all subsets of the set
  of types, and computes the implementation complexity of the whole
  set of types, where the predefined types have complexity 1, and the
  other types have complexities calculated from the implementation
  schemes. 

- Scatter plot the results, with usage effort * importance on the one
  axis, and the total complexity on the other.

- Then consider all points close to the axes as candidates for the set
  of primitive types for your language. Select according to taste.

For ABC we came up with 5 high-level, primitive types: strings,
numbers, compounds(fixed-sized tuples), bags, tables.

> The sorts of things I use pointers for that I couldn't use arrays for
> are parse trees and graphs (both directed and undirected).  Often there
> is no "higher-level concept" I am trying to express.  Has anybody written
> code to deal with these things in APL?  Is it as ugly as it seems?

But 'parse-trees' and 'graphs' *are* the higher level concept.

Let me give an example, and use graphs as that example.
ABC has tables as a primitive type. These are generalised arrays: the
keys may be of *any* one type (including other tables), and so may the
elements. ABC also has bags (multisets).

To implement a graph, you use a table, mapping node names to bags of
node names, which represent the outward going arrows. E.g.
	graph[1] = {2; 3}
	graph[2] = {1; 3}
	graph[3] = {3}

As another example, define sets in terms of bags. here is set union
and difference:

HOW TO RETURN set1 union set2:
    FOR elem IN set1:
        IF elem not.in set2:
            INSERT elem IN set2
    RETURN set2

HOW TO RETURN set1 diff set2:
    FOR elem IN set1:
        IF elem IN set2:
            REMOVE elem FROM set2
    RETURN set2

Using these two definitions, I can now define garbage collection
(finding the minimum connected subgraph from a given node) in a dozen
lines:

HOW TO RETURN graph reachable.from root:
    PUT {root}, {} IN to.do, new
    WHILE to.do > {}:
        PUT min to.do IN node
        REMOVE node FROM to.do
        TREAT NODE
    RETURN new
TREAT NODE:
    IF node IN keys graph:
        PUT graph[node] IN new[node]
        PUT to.do union new.nodes IN to.do
new.nodes: RETURN new[node] diff (keys new)

More info on ABC? Send me mail.

One day I'll force Lambert to write an article on the type choosing
algorithm. I'll threaten to delete half his files from our overflowing
disk, or something like that :-)

Steven Pemberton, CWI, Amsterdam; steven@cwi.nl or try mcvax!steven.uucp