Path: utzoo!utgpu!watserv1!watmath!att!att!linac!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!usc!cs.utexas.edu!uunet!mcsun!ukc!edcastle!dcl-cs!aber-cs!athene!pcg
From: pcg@cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.lang.misc
Subject: Re: C's sins of commission
Message-ID: <PCG.90Nov3185931@athene.cs.aber.ac.uk>
Date: 3 Nov 90 18:59:31 GMT
References: <2062@aber-cs.UUCP> <1990Oct26.155937.29185@maths.nott.ac.uk>
	<PCG.90Oct30201101@teachk.cs.aber.ac.uk>
	<1990Nov2.172508.6393@maths.nott.ac.uk>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 144
In-reply-to: anw@maths.nott.ac.uk's message of 2 Nov 90 17:25:08 GMT

On 2 Nov 90 17:25:08 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said:

anw> In article <PCG.90Oct30201101@teachk.cs.aber.ac.uk> pcg@cs.aber.ac.uk
anw> (Piercarlo Grandi) writes:

pcgp> You shouldn't write "node := GREEN; ptr_to_node := node" because
pcgp> "node" then doesn't include its location.  You should have instead
pcgp> "node := { GREEN, node }" and then an associative search for "{
pcgp> GREEN, node }" will find node without the need for pointers.

anw> (pcgp means PierCarlo Grandi Paraphrase).

Not exactly. You should write "node := (GREEN,node id);", where "node
id" is anything you use to know a node from another. For example, its
list of incoming arcs. I would like to repeat my way to describe graphs
in a relation database:

	type node : node_info; # each node contains some info #
	type link : link_info; # each link contains some info #

	relation in(link*,node*); # * means this fields is part ... #
	relation out(node*,link*); # ... of the primary key #

Note that many links or nodes may have the same info, but a specific
link or node will be uniquely identified by context; if it cannot, e.g.
a two node graph like this:

		"B"
	   .-------------.
	  "A"		"A"
	   \_____________/
		"B"

then the graph is inherently ambiguous. It can be disambiguated by
noting that the "A" and "B" information content of the nodes is
incomplete, and should be really:

			("B",top)
	    .-------------------------------.
	("A",left)			("A",right)
	    \_______________________________/
			("B",bottom)

This representation also makes it obvious that each graph has a dual,
i.e. that it is an essentially symmetric structure, that the first
version of the example above is inherently ambiguous (and indeed a
relational database will not allow you to represent it), etc..., all
facts that are lost using pointers.

anw> In the following direct extract, "This" means the version that I
anw> *shouldn't* write.]

pcg> This adds a whole new world of complication, because we need no longer
pcg> just an algebra for objects, but also an algebra for locations, and we
pcg> must prove that a program not only works ok wrt to contents, but also to
pcg> locations, and that the encoding of parts of the contents in the
pcg> location is preserved correctly.

anw> But if content has to include location, then an algebra for objects
anw> has to include an algebra for location, so the algebra and the
anw> proofs already had to be supplied.

You got it the other way round: if pointers make a difference, this can
only mean that part of an object's content are implicit in its in-memory
location, not viceversa (what actually happens is that location includes
a bit of content). If the information encoded in the pointer value were
explicit in the object's contents, pointer values would become
irrelevant, and only object contents would matter; there would not be
any need for comparison operations on pointer values, only the usual
ones on contents.


Back to basics: objects in memory are representations for more abstract
entities. The representation is based on encoding, usually encoding onto
the integers (Hoare's essay in "Structured Programming").

The idea is that if two objects have the same content ("value" when seen
as an integer) they represent the same entity. You might say "no,
because they are at different locations, and I can change one but not
the other which makes them different".

This is pure nominalism; it simply implies that you have encoded part of
the representation in the storage occupied by the object and part in its
address. You never *need* to do so; you can always encode it explicitly,
and often it is better to do so, because it frees you from worry about
locations.

An example where pointers make a difference:

You have two lists, one of names of american states and another of
nuclear submarines. If you get a pointer to the string "Ohio" you don't
know if you are referring to a submarine or a a state, unless you
implement the two lists in two different memory areas of which you know
the bounds, and look at the pointer value to see in which area it falls.

You have encoded the type of the name, a boolean, in the address. You
could have encoded it in the strings, e.g. by writing "ohio" for
submarines and "OHIO" for states. If you had done this you would only
need to understand the encoding rules for strings to understand your
program; encoding the type of the name in its address may save you some
space, but means that you also have to deal with pointer values, not
just data values, because pointer values have become meaningful (laden
with information).

Now, in general the "ohio"/"OHIO" representation is safer, for example
because the fact that the Ohio in question is the submarine and not the
state is location independent, that means that I can copy the string
around and not lose any bits of information. This is what I mean by
saying that the algebra of locations is not dragged into the act,
because no information is encoded in locations.

anw> Furthermore, if I am denied access to the algebra of locations,
anw> then some useful ideas like anonymous objects, and like remembering
anw> locations, become inexpressible.

They are most definitely not useful at the application level, only at a
purely implementation oriented level. They add complications to
reasoning about representations, even if they save some resources. In
particular, in a value based systems all objects are anonymous; they are
only identified by contents, not by context.

You have a claim that pointers are necessary, if I understand correctly.
My observations are:

1) To represent information, pointers are never necessary and when used
as such are actually hazardous. From the application point of view all
information had better be represented explicitly with encoding in the
contents of an object, as in a relational database.

2) Pointers *can* be used to represent information, and the relative
hazard (extra complicated encoding rules) can be cost effective; the
more so the nearer we are to the machine, because it can conserve
storage (information bits are implied by the address) and speed (you
need not to fetch information bits from memory, you can just deduce them
from the address).

But, at the application level usually these concerns are irrelevant.
What you want to do is to represent entities, not be concerned about how
the relevant encoding is done. In building systems programs your
concerns are reversed, so pointers are useuful.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk