Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site cornell.UUCP
Path: utzoo!linus!decvax!microsoft!uw-beaver!cornell!Pavel.pa@PARC-MAXC.ARPA
From: Pavel.pa@PARC-MAXC.ARPA@cornell.UUCP (Pavel.pa@PARC-MAXC.ARPA)
Newsgroups: net.lang
Subject: Treating Data Abstractly
Message-ID: <5215@cornell.UUCP>
Date: Thu, 1-Sep-83 14:12:38 EDT
Article-I.D.: cornell.5215
Posted: Thu Sep 1 14:12:38 1983
Date-Received: Fri, 2-Sep-83 03:07:48 EDT
Sender: daemon@cornell.UUCP
Organization: Cornell Computer Science
Lines: 104

From: Pavel.pa@PARC-MAXC.ARPA
To: net-lang@CORNELL

Hal Perkins mentions that he knows of no clean solutions to the problem of treating variables abstractly while maintaining independence from the underlying implementation. Hal, that seems funny, coming from you. I know of at least four languages that seem to meet that criterion, and I'm sure you know at least two of them. Please tell me what I'm missing, or why you're dissatisfied with these:

1. Smalltalk-80

Since I started implementing my work here at PARC in Smalltalk, I have really come to realise just how well it does this job. For example, it is interesting to watch the debate about I/O while sitting here using streams in Smalltalk. I can write code which assumes a parameter is a stream and make it work without knowing whether that stream is a file, a byte-stream to another host on the network, or some sequenceable collection such as a string, an array or a linked list. Furthermore, I \never/ need to decide what is being used; the same code can be used in the same application on different kinds of streams at different times.
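For readers more at home in a present-day language, here is a rough Python analogue of the stream point. It is a sketch, not a rendering of Smalltalk's own classes. Python's file-like objects play the part of streams. The hypothetical routine `write_greeting` relies only on a `write` method, so it works unchanged on an in-memory `io.StringIO`, on standard output, or on the hypothetical `ListStream` class. `ListStream` collects whatever is written into a Python list, standing in for a stream over a sequenceable collection.

```python
import io
import sys


class ListStream:
    """Hypothetical stream over a sequenceable collection: each write is appended to a list."""

    def __init__(self):
        self.items = []

    def write(self, text):
        # Record the text and report how much was written, as file objects do.
        self.items.append(text)
        return len(text)


def write_greeting(stream, name):
    """Write a short greeting using nothing but the stream's write method.

    The routine never asks what kind of stream it was given.
    """
    stream.write("Hello, ")
    stream.write(name)
    stream.write("\n")


# The same routine, three different kinds of stream.
in_memory = io.StringIO()
write_greeting(in_memory, "Hal")

collected = ListStream()
write_greeting(collected, "Hal")

write_greeting(sys.stdout, "Hal")  # a real output file
```

Nothing declares the protocol. As in Smalltalk, any object that answers `write` will do. Python's `typing.Protocol` can state such a protocol explicitly for static checkers, but it isn't needed at run time.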
As long as there is some standard set of messages which stream-like objects understand (such as nextPut: for a single object, nextPutAll: for a collection, and perhaps cr and space as abbreviations for certain nextPut: operations), I can use the protocol in blissful ignorance of what I'm really talking to.

As an example, there are two messages (actually many more than two, but I'm only interested in these two) which are understood by all objects:

    printOn: aStream
        meaning 'please print a representation of yourself on the given stream', and

    printString
        meaning 'please return a string which is a representation of yourself'.

Almost all classes of objects have a specialised version of the printOn: message. The default simply prints the name of the object's class preceded by 'a' or 'an', as in 'an Array', which is not a very useful representation. There is only one implementation of printString, however. It appears at the top of the class hierarchy, in class Object:

    printString
        "Answer a String whose characters are a description of the receiver."
        | aStream |
        aStream _ String newWriteStream: 16.
        self printOn: aStream.
        ^aStream contents

This routine simply makes a stream on a new String object and sends the printOn: message to print the representation on that stream. It then returns whatever was printed on the stream. In this way, each class needs exactly one routine to print a representation of its instances, whether that representation is to go to a file, to a string or across the network.

Smalltalk gains a lot from this style of information-hiding. The only piece of code which needs to know what kind of object is really being dealt with is the one that creates it, a very reasonable point of view.

2. CLU/Mesa/Cedar

These three languages, far more traditional in their philosophies than Smalltalk, all take a similar approach to treating data abstractly, unconcerned with the implementation. (It should be pointed out that the main features of their approach also appear in other languages, such as Alphard and, more recently and more widely known, Modula-2.)

In these languages, one creates a 'cluster' (in CLU terminology) or 'module' (in Mesa, Cedar and Modula-2). This is a collection of data and functions that have complete access to one another but are protected from the outside world. Part of the specification of a module is the explicit 'exportation' of certain of the functions for use by the outside world. These constitute the so-called 'interface' of the module.

Frequently these modules represent the implementation of a data-type, with the exported functions comprising the set of operations that make sense on that type. Since programs can only use the operations on the type that are exported, one can set up a reasonable interface specification, write much code which uses it, and \then/ decide upon an implementation. CLU, in fact, uses a library of implementations, one of which is selected for use only at program linking time. I believe Mesa and Cedar have similar mechanisms.

3. Conclusions

The obvious similarity between the mechanisms used in these languages is that the operations are part of the data objects (or of their descriptions). This is in contrast to the approaches taken in Pascal and C (and many other languages), in which a description of a data-type talks only about the components and structure of that type. To my mind, the components and structure (i.e. the implementation of the type) are exactly what should \not/ be visible to clients of the type. (This is certainly not a new point of view, just less well known than it deserves to be.)
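The cluster/module idea can be sketched loosely in Python, with a caveat. Python has no clusters, and its privacy is only a naming convention (a leading underscore), so the protection here is far weaker than in CLU or Mesa.

The sketch makes several assumptions, and all of its names are hypothetical:

* The abstract class `IntSet` plays the interface, exporting `insert`, `member` and `size`.
* `ListIntSet` and `HashIntSet` are two interchangeable implementations.
* `count_distinct` is client code written against the interface alone.
* The single assignment to `make_int_set` stands in for CLU's choice of implementation at link time.

```python
from abc import ABC, abstractmethod


class IntSet(ABC):
    """The 'interface': the only operations clients may use on an integer set."""

    @abstractmethod
    def insert(self, n): ...

    @abstractmethod
    def member(self, n): ...

    @abstractmethod
    def size(self): ...


class ListIntSet(IntSet):
    """One implementation: an unsorted list with no duplicates."""

    def __init__(self):
        self._elems = []  # representation; by convention only, clients leave it alone

    def insert(self, n):
        if n not in self._elems:
            self._elems.append(n)

    def member(self, n):
        return n in self._elems

    def size(self):
        return len(self._elems)


class HashIntSet(IntSet):
    """Another implementation: Python's built-in hash set."""

    def __init__(self):
        self._elems = set()

    def insert(self, n):
        self._elems.add(n)

    def member(self, n):
        return n in self._elems

    def size(self):
        return len(self._elems)


def count_distinct(numbers, make_set):
    """Client code: uses only the exported operations, so any implementation fits."""
    s = make_set()
    for n in numbers:
        s.insert(n)
    return s.size()


# The single binding point, standing in for CLU's selection at link time.
make_int_set = HashIntSet

numbers = [3, 1, 3, 2, 1]
print(count_distinct(numbers, make_int_set))  # 3
print(count_distinct(numbers, ListIntSet))    # 3, from the other representation
```

A Pascal record or C struct would instead publish the representation (the list or set behind `_elems`) and leave clients to manipulate it directly. Swapping representations would then mean rewriting the clients.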
So tell me, Hal, what are these languages missing in the way of data abstraction that you'd like to see? For their general approach (i.e. a procedural specification of the program), they seem to do the job as well as anything that springs to mind.

Pavel Curtis
Xerox PARC, Software Concepts Group
{decvax | vax135 | allegra | ...}!cornell!pavel (UUCP)
Pavel@Cornell (ARPA, CSNET)