Path: utzoo!attcan!uunet!mcsun!ukc!dcl-cs!aber-cs!pcg From: pcg@aber-cs.UUCP (Piercarlo Grandi) Newsgroups: comp.lang.misc Subject: Re: C's sins of commission Summary: Identity should be by contents, not location. Message-ID: <2062@aber-cs.UUCP> Date: 21 Oct 90 17:59:41 GMT Reply-To: pcg@cs.aber.ac.uk (Piercarlo Grandi) Organization: Dept of CS, UCW Aberystwyth (Disclaimer: my statements are purely personal) Lines: 104 In article <1990Oct15.175924.14455@maths.nott.ac.uk> anw@maths.nott.ac.uk (Dr A. N. Walker) writes: In article <1990Oct10.195202.2340@d.cs.okstate.edu> norman@d.cs.okstate.edu (Norman Graham) writes: >[...] you may want to read this >Miranda definition of a 3-ary tree (just to be different). I keep trying to get interested in Miranda, but "they" want real money for it, which is bad news for a Maths Dept. Try Haskell or SML. Not terribly different. Much the same technology. > three_tree ::= Leaf num | Node int three_tree three_tree three_tree Well, that's fine for an acyclic structure [though not the way I usually *think* of trees] and a lazy evaluator, but it doesn't make sense to me in the form three_graph ::= Node int three_graph three_graph three_graph Where does it all end, even lazily? Ahem. It does not end. There is no problem though. It is just an incorrect definition. Let's rewrite it like this: graph = NULL: empty | roots roots = node | roots node node = NODE: info LINKS: nodes nodes = empty | nodes node Observe that this is very much like a syntax. Indeed you can use a 2 level grammar to model anything at all (book by Tzischritis and Uzgalis), and it does generate infinite syntaxes (needed to model context dependency), not just infinite syntax trees (needed to model graphs). This is not a difficult, because you never need to actually expand the grammar in an infinite way, given a finite data strcture or program. How do I express the identity of one node with another [ie, perhaps the same node, but reached via some chain of edges]? How do I *draw* it? This is a thing on which I have already commented, but I love repeating myself. The correct way to implement identity is by contents, not location. If you encode part of the contents in the location, that's an implementation choice, not a necessary thing. It is also not very nice. What is *anyone*'s objection to MODE NODE = STRUCT (INT info, REF NODE left, right, middle) ? That instead of manipulating just values of type NODE you also have to deal with values of type REF NODE. This complicates life considerably. Consider the relational version in some pseudo notation: domain source,target: SOMETYPE left(source*,target) right(source*,target) middle(source*,target) Now you say: but this means that I cannot have two nodes with the same info! Precisely. We are *assuming* that info identifies a node. Put into info whatever detail serves to identify the node. Then you say: but this is inefficient! the same info is replicated up to six times! the links between a node and its neichbours are spread in three relations! This is matter for the implementation. It can keep a table of infos, and keep the three relations clustered either by first or second column, depending on which is thought ot be the preferred join column. Note that in your pointer based *implementation* the join column is assumed to be the first, because your NODE struct is clustered physically on outgoing pointer, not incoming pointer; you would have to write MODE NODE = STRUCT(INT info, FLEX [] REF NODE incoming) to implement the graph clustered by incoming pointer. Indeed you can model any graph in relational database technology by saying something like domain node : SOMETYPE domain link : OTHERTYPE source(node,link*) target(link*,node) This also clearly models the notorious symmetry between nodes and links, and that links are not just pointers in the general case. It all depends on what you really want. Data design is more difficult and less clear with pointers, but implementations are more efficient, all because pointers require less levels of abstractions. Note that nobody I know (except for me I mean :->) addresses the issue of data design in programming languages, yet many programs deal with internal databases with considerably more complicated implied schemas than those of many DMBS applications. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk