Path: utzoo!attcan!uunet!mcsun!ukc!dcl-cs!aber-cs!pcg
From: pcg@aber-cs.UUCP (Piercarlo Grandi)
Newsgroups: comp.lang.misc
Subject: Re: C's sins of commission
Summary: Identity should be by contents, not location.
Message-ID: <2062@aber-cs.UUCP>
Date: 21 Oct 90 17:59:41 GMT
Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Organization: Dept of CS, UCW Aberystwyth
	(Disclaimer: my statements are purely personal)
Lines: 104

In article <1990Oct15.175924.14455@maths.nott.ac.uk> anw@maths.nott.ac.uk
(Dr A. N. Walker) writes:

  In article <1990Oct10.195202.2340@d.cs.okstate.edu> norman@d.cs.okstate.edu
  (Norman Graham) writes:

  >[...] you may want to read this
  >Miranda definition of a 3-ary tree (just to be different).
  
  I keep trying to get interested in Miranda, but "they" want real
  money for it, which is bad news for a Maths Dept.

Try Haskell or SML. Not terribly different. Much the same technology.

  >  three_tree ::= Leaf num | Node int three_tree three_tree three_tree
  
  Well, that's fine for an acyclic structure [though not the way I usually
  *think* of trees] and a lazy evaluator, but it doesn't make sense to me in
  the form
  
  	three_graph ::= Node int three_graph three_graph three_graph
  
  Where does it all end, even lazily?

Ahem. It does not end. There is no problem though. It is just an incorrect
definition. Let's rewrite it like this:

	graph = NULL: empty | roots
	roots = node | roots node

	node  = NODE: info LINKS: nodes
	nodes = empty | nodes node

Observe that this is very much like a syntax. Indeed you can use a 2 level
grammar to model anything at all (book by Tzischritis and Uzgalis), and it
does generate infinite syntaxes (needed to model context dependency), not
just infinite syntax trees (needed to model graphs). This is not a
difficult, because you never need to actually expand the grammar in an
infinite way, given a finite data strcture or program.

  How do I express the identity of one node with another [ie, perhaps the
  same node, but reached via some chain of edges]?  How do I *draw* it?

This is a thing on which I have already commented, but I love repeating
myself. The correct way to implement identity is by contents, not location.
If you encode part of the contents in the location, that's an implementation
choice, not a necessary thing. It is also not very nice.

  What is *anyone*'s objection to
  
  	MODE NODE = STRUCT (INT info, REF NODE left, right, middle) ?

That instead of manipulating just values of type NODE you also have
to deal with values of type REF NODE. This complicates life considerably.
Consider the relational version in some pseudo notation:

	domain source,target: SOMETYPE

	left(source*,target)
	right(source*,target)
	middle(source*,target)

Now you say: but this means that I cannot have two nodes with the same info!
Precisely. We are *assuming* that info identifies a node. Put into info
whatever detail serves to identify the node.

Then you say: but this is inefficient! the same info is replicated up to six
times! the links between a node and its neichbours are spread in three
relations! This is matter for the implementation. It can keep a table of
infos, and keep the three relations clustered either by first or second
column, depending on which is thought ot be the preferred join column.

Note that in your pointer based *implementation* the join column is assumed
to be the first, because your NODE struct is clustered physically on
outgoing pointer, not incoming pointer; you would have to write

	MODE NODE = STRUCT(INT info, FLEX [] REF NODE incoming)

to implement the graph clustered by incoming pointer.

Indeed you can model any graph in relational database technology by saying
something like

	domain node : SOMETYPE
	domain link : OTHERTYPE

	source(node,link*)
	target(link*,node)

This also clearly models the notorious symmetry between nodes and links, and
that links are not just pointers in the general case.

It all depends on what you really want. Data design is more difficult and
less clear with pointers, but implementations are more efficient, all because
pointers require less levels of abstractions.

Note that nobody I know (except for me I mean :->) addresses the issue of
data design in programming languages, yet many programs deal with internal
databases with considerably more complicated implied schemas than those
of many DMBS applications.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk