Path: utzoo!utgpu!watserv1!watmath!att!att!linac!pacific.mps.ohio-state.edu!zaphod.mps.ohio-state.edu!usc!cs.utexas.edu!uunet!mcsun!ukc!edcastle!dcl-cs!aber-cs!athene!pcg From: pcg@cs.aber.ac.uk (Piercarlo Grandi) Newsgroups: comp.lang.misc Subject: Re: C's sins of commission Message-ID: Date: 3 Nov 90 18:59:31 GMT References: <2062@aber-cs.UUCP> <1990Oct26.155937.29185@maths.nott.ac.uk> <1990Nov2.172508.6393@maths.nott.ac.uk> Sender: pcg@aber-cs.UUCP Organization: Coleg Prifysgol Cymru Lines: 144 In-reply-to: anw@maths.nott.ac.uk's message of 2 Nov 90 17:25:08 GMT On 2 Nov 90 17:25:08 GMT, anw@maths.nott.ac.uk (Dr A. N. Walker) said: anw> In article pcg@cs.aber.ac.uk anw> (Piercarlo Grandi) writes: pcgp> You shouldn't write "node := GREEN; ptr_to_node := node" because pcgp> "node" then doesn't include its location. You should have instead pcgp> "node := { GREEN, node }" and then an associative search for "{ pcgp> GREEN, node }" will find node without the need for pointers. anw> (pcgp means PierCarlo Grandi Paraphrase). Not exactly. You should write "node := (GREEN,node id);", where "node id" is anything you use to know a node from another. For example, its list of incoming arcs. I would like to repeat my way to describe graphs in a relation database: type node : node_info; # each node contains some info # type link : link_info; # each link contains some info # relation in(link*,node*); # * means this fields is part ... # relation out(node*,link*); # ... of the primary key # Note that many links or nodes may have the same info, but a specific link or node will be uniquely identified by context; if it cannot, e.g. a two node graph like this: "B" .-------------. "A" "A" \_____________/ "B" then the graph is inherently ambiguous. It can be disambiguated by noting that the "A" and "B" information content of the nodes is incomplete, and should be really: ("B",top) .-------------------------------. ("A",left) ("A",right) \_______________________________/ ("B",bottom) This representation also makes it obvious that each graph has a dual, i.e. that it is an essentially symmetric structure, that the first version of the example above is inherently ambiguous (and indeed a relational database will not allow you to represent it), etc..., all facts that are lost using pointers. anw> In the following direct extract, "This" means the version that I anw> *shouldn't* write.] pcg> This adds a whole new world of complication, because we need no longer pcg> just an algebra for objects, but also an algebra for locations, and we pcg> must prove that a program not only works ok wrt to contents, but also to pcg> locations, and that the encoding of parts of the contents in the pcg> location is preserved correctly. anw> But if content has to include location, then an algebra for objects anw> has to include an algebra for location, so the algebra and the anw> proofs already had to be supplied. You got it the other way round: if pointers make a difference, this can only mean that part of an object's content are implicit in its in-memory location, not viceversa (what actually happens is that location includes a bit of content). If the information encoded in the pointer value were explicit in the object's contents, pointer values would become irrelevant, and only object contents would matter; there would not be any need for comparison operations on pointer values, only the usual ones on contents. Back to basics: objects in memory are representations for more abstract entities. The representation is based on encoding, usually encoding onto the integers (Hoare's essay in "Structured Programming"). The idea is that if two objects have the same content ("value" when seen as an integer) they represent the same entity. You might say "no, because they are at different locations, and I can change one but not the other which makes them different". This is pure nominalism; it simply implies that you have encoded part of the representation in the storage occupied by the object and part in its address. You never *need* to do so; you can always encode it explicitly, and often it is better to do so, because it frees you from worry about locations. An example where pointers make a difference: You have two lists, one of names of american states and another of nuclear submarines. If you get a pointer to the string "Ohio" you don't know if you are referring to a submarine or a a state, unless you implement the two lists in two different memory areas of which you know the bounds, and look at the pointer value to see in which area it falls. You have encoded the type of the name, a boolean, in the address. You could have encoded it in the strings, e.g. by writing "ohio" for submarines and "OHIO" for states. If you had done this you would only need to understand the encoding rules for strings to understand your program; encoding the type of the name in its address may save you some space, but means that you also have to deal with pointer values, not just data values, because pointer values have become meaningful (laden with information). Now, in general the "ohio"/"OHIO" representation is safer, for example because the fact that the Ohio in question is the submarine and not the state is location independent, that means that I can copy the string around and not lose any bits of information. This is what I mean by saying that the algebra of locations is not dragged into the act, because no information is encoded in locations. anw> Furthermore, if I am denied access to the algebra of locations, anw> then some useful ideas like anonymous objects, and like remembering anw> locations, become inexpressible. They are most definitely not useful at the application level, only at a purely implementation oriented level. They add complications to reasoning about representations, even if they save some resources. In particular, in a value based systems all objects are anonymous; they are only identified by contents, not by context. You have a claim that pointers are necessary, if I understand correctly. My observations are: 1) To represent information, pointers are never necessary and when used as such are actually hazardous. From the application point of view all information had better be represented explicitly with encoding in the contents of an object, as in a relational database. 2) Pointers *can* be used to represent information, and the relative hazard (extra complicated encoding rules) can be cost effective; the more so the nearer we are to the machine, because it can conserve storage (information bits are implied by the address) and speed (you need not to fetch information bits from memory, you can just deduce them from the address). But, at the application level usually these concerns are irrelevant. What you want to do is to represent entities, not be concerned about how the relevant encoding is done. In building systems programs your concerns are reversed, so pointers are useuful. -- Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk