Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10    5/3/83; site cornell.UUCP
Path: utzoo!linus!decvax!microsoft!uw-beaver!cornell!Pavel.pa@PARC-MAXC.ARPA
From: Pavel.pa@PARC-MAXC.ARPA@cornell.UUCP (Pavel.pa@PARC-MAXC.ARPA)
Newsgroups: net.lang
Subject: Treating Data Abstractly
Message-ID: <5215@cornell.UUCP>
Date: Thu, 1-Sep-83 14:12:38 EDT
Article-I.D.: cornell.5215
Posted: Thu Sep  1 14:12:38 1983
Date-Received: Fri, 2-Sep-83 03:07:48 EDT
Sender: daemon@cornell.UUCP
Organization: Cornell Computer Science
Lines: 104

From: Pavel.pa@PARC-MAXC.ARPA
To: net-lang@CORNELL

Hal Perkins mentions that he knows of no clean solutions to the problem
of treating variables abstractly while maintaining independence from the
underlying implementation.

Hal, that seems funny, coming from you.  I think that I know of at least
four languages that seem to meet that criterion and I'm sure you know at
least two of them.  Please tell me what I'm missing, why you're
dissatisfied with these:

1. Smalltalk-80
	Since I started implementing my work here at PARC in Smalltalk, I have
really come to realise just how well it does this job.  For example, it
is interesting to watch the debate about I/O while sitting here and
using streams in Smalltalk.  I can write code which assumes a parameter
is a stream and make it work without knowing whether that stream is a
file, a byte-stream to another host on the network, or some sequenceable
collection, like a string, or array, or linked-list or whatever.
Furthermore, I \never/ need to make a decision about what is being used;
the same code could be used in the same application on different kinds
of streams at different times.  As long as there is some standard set of
messages which stream-like objects understand (such as nextPut: for a
single object, nextPutAll: for a collection, and perhaps cr and space as
abbreviations for certain nextPut: operations), I can use the protocol
in blissful ignorance of what I'm really talking to.
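	To make that concrete, here is a minimal sketch of the idea, written in
Python rather than Smalltalk only so that it is easy to run.  The names
write_report and ListStream are hypothetical, and write() stands in for
nextPutAll:.  The routine assumes nothing about its parameter except
that it understands write().

```python
import io


def write_report(stream, items):
    # Uses only the write() protocol; never asks what kind of stream it has.
    for item in items:
        stream.write(str(item))
        stream.write("\n")


class ListStream:
    """Hypothetical stream-like object that collects its output in a list."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)
        return len(text)


# The same routine, unchanged, driving two unrelated kinds of stream.
string_stream = io.StringIO()
list_stream = ListStream()
write_report(string_stream, [1, "two"])
write_report(list_stream, [1, "two"])
```

Only the two lines that create the streams know what they really are;
write_report itself could just as well be handed a file or a network
connection wrapped in an object that understands write().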
	As an example, there are two messages (actually many more than two, but
I'm only interested in these two) which are understood by all objects:
		printOn: aStream
meaning, 'please print a representation of yourself on the given
stream', and
		printString
meaning, 'please return a string which is a representation of yourself'.
Almost all classes of objects have a specialised version of the printOn:
message (the default is to simply print the name of the object's class
preceded by 'a' or 'an', as in 'an Array'; not a very useful
representation), but there is only one implementation of printString.
It appears at the top of the class hierarchy, in class Object:
	printString
	"Answer a String whose characters are a description of the receiver."

	| aStream |
	aStream _ String newWriteStream: 16.
	self printOn: aStream.
	^aStream contents

This routine simply makes a stream on a new String object and sends the
printOn: message to print the representation on that stream.  It then
returns whatever was printed on the stream.  In this way, each class
needs to define exactly one routine to print a representation of its
instances, whether that representation is to go on a file, a string or
across the network.
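	The same arrangement can be sketched outside Smalltalk.  What follows is
a rough Python analogue, not the Smalltalk-80 code: the classes
Printable, Point and Apple and the method names print_on and
print_string are hypothetical stand-ins for Object, its subclasses,
printOn: and printString.

```python
import io


class Printable:
    """Hypothetical stand-in for class Object."""

    def print_on(self, stream):
        # Default: the class name preceded by 'a' or 'an'.
        name = type(self).__name__
        article = "an" if name[0] in "AEIOU" else "a"
        stream.write(article + " " + name)

    def print_string(self):
        # The single implementation: print on a fresh string stream,
        # then return whatever was printed there.
        stream = io.StringIO()
        self.print_on(stream)
        return stream.getvalue()


class Point(Printable):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def print_on(self, stream):
        # Specialised representation; print_string picks it up for free.
        stream.write(f"{self.x}@{self.y}")


class Apple(Printable):
    pass  # Inherits the default print_on.


point_text = Point(3, 4).print_string()
apple_text = Apple().print_string()
```

Point overrides only print_on, yet print_string returns '3@4' for it,
while Apple falls back on the default and yields 'an Apple'.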
	Smalltalk gains a lot of advantage from this style of
information-hiding.  The only piece of code which needs to know what
kind of object is really being dealt with is the one that creates it, a
very reasonable point of view.


2. CLU/Mesa/Cedar
	These three languages, far more traditional in their philosophies than
Smalltalk, all take a similar approach to providing the ability to treat
data in an abstract manner, unconcerned with the implementation.  (It
should be pointed out that the main features of their approach also
appear in other languages, such as Alphard and, more recently and better
known, Modula-2.)
	In these languages, one creates a 'cluster' (in CLU terminology) or
'module' (in Mesa, Cedar and Modula-2) which is a collection of data and
functions that have complete access to one another but are protected
from the outside world.  Part of the specification of a module is the
explicit 'exportation' of certain of the functions for use by the
outside world.  These constitute the so-called 'interface' of the
module.  Frequently these modules represent the implementation of a
data-type, with the exported functions comprising the set of operations
that make sense on that type.  Since programs can only use the
operations on the type that are exported, one can set up a reasonable
interface specification, write much code which uses it and \then/ decide
upon an implementation.  CLU, in fact, uses a library of
implementations, one of which is selected for use only at program
linking time.  I believe Mesa and Cedar have similar mechanisms.
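	As a rough illustration of writing client code against an interface and
choosing the implementation afterwards, here is a sketch in Python (not
CLU or Mesa).  The interface IntSet, its two implementations and the
client count_distinct are all hypothetical, and Python picks the
implementation at run time, whereas CLU does so at link time.

```python
from abc import ABC, abstractmethod


class IntSet(ABC):
    """The exported interface: the only operations clients may use."""

    @abstractmethod
    def insert(self, x): ...

    @abstractmethod
    def member(self, x): ...

    @abstractmethod
    def size(self): ...


class ListIntSet(IntSet):
    """One implementation: an unsorted list without duplicates."""

    def __init__(self):
        self._items = []

    def insert(self, x):
        if x not in self._items:
            self._items.append(x)

    def member(self, x):
        return x in self._items

    def size(self):
        return len(self._items)


class HashIntSet(IntSet):
    """Another implementation: a hash-based set."""

    def __init__(self):
        self._items = set()

    def insert(self, x):
        self._items.add(x)

    def member(self, x):
        return x in self._items

    def size(self):
        return len(self._items)


def count_distinct(values, empty_set):
    # Client code: written against IntSet only, before any
    # implementation has been chosen.
    for v in values:
        empty_set.insert(v)
    return empty_set.size()
```

count_distinct gives the same answer whichever implementation it is
handed, e.g. count_distinct([1, 2, 2, 3], ListIntSet()) and
count_distinct([1, 2, 2, 3], HashIntSet()) both return 3.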


3. Conclusions
	The obvious similarity between the mechanisms used in these languages
is that the operations are part of the data objects (or their
descriptions).  This is in contrast to the approaches taken in Pascal
and C (and many other languages) in which a description of a data-type
only talks about the components and structure of that type.  To my mind,
the components and structure (i.e. the implementation of the type) are
exactly what should \not/ be visible to clients of the type.  (This is
certainly not a new point of view, just less well known than it
deserves to be.)

	So tell me, Hal, what are these languages missing in the way of data
abstraction that you'd like to see?  For their general approach (i.e. a
procedural specification of the program), they seem to do the job as
well as anything that springs to mind.

	Pavel Curtis
	Xerox PARC, Software Concepts Group
	{decvax | vax135 | allegra | ...}!cornell!pavel		(UUCP)
	Pavel@Cornell		(ARPA, CSNET)