Path: utzoo!mnetor!uunet!steinmetz!ge-dab!codas!pdn!alan
From: alan@pdn.UUCP (Alan Lovejoy)
Newsgroups: comp.lang.modula2
Subject: Re: union types
Message-ID: <2679@pdn.UUCP>
Date: 30 Mar 88 17:52:59 GMT
References: <8803221947.AA00248@nrl-iws6.ARPA> <4118@cup.portal.com> <850@vixie.UUCP>
Reply-To: alan@pdn.UUCP (0000-Alan Lovejoy)
Organization: Paradyne Corporation, Largo, Florida
Lines: 223

In article <850@vixie.UUCP> paul@vixie.UUCP (Paul Vixie Esq) writes:
>Sometimes you *want* that intervening obj_id. In C, it's harder (though
>possible) to make a variant record where this intervening member needn't
>be named in references to the variant fields; in M2, you can do it thus:
>
>TYPE ObjName = RECORD                          (* note 1 *)
>        ObjType: INTEGER;
>        ObjId: RECORD
>                CASE BOOLEAN OF                (* note 2 *)
>                TRUE:   id: INTEGER|
>                FALSE:  path: POINTER TO CHAR; (* note 3 *)
>                END
>        END
>     END;
>
>Note 1: we are creating a type in the C example, not a variable.

Who said otherwise? The Modula-2 examples I have seen in this discussion
were all type definitions, weren't they?

>Note 2: No ':' before the type as far as I know; [brackets] may be needed
>        (I don't recall), and the type could be enumerated if more than
>        two variants are needed -- BOOLEAN is convenient but not mandatory.

You are both wrong and right: the original syntax for Modula-2 did not
have a colon before the type of a tagless variant. Most compilers still
support this syntax (usually as the only option). However, Wirth changed
the syntax in the third edition of his book (PIM2e3), making the colon
required.

>Note 3: POINTER TO CHAR is one way to represent strings, but sometimes arrays
>        are used. Sure would be great if open arrays were allowed in places
>        other than a formal argument on a procedure...

POINTER TO CHAR is a TERRIBLE way to represent strings (unless you hide
this representation behind an opaque type). Why?

1) There is no guarantee that SIZE(aCharVariable) = SIZE(string[0])
   (assuming the declarations: VAR aCharVariable: CHAR; string: ARRAY
   [0..n] OF CHAR). This is not just theoretical. My 68k M2 compiler
   uses two bytes for a character variable but one byte for each
   character in a string. This breaks the following code:

     VAR cp, end: POINTER TO CHAR;
         string: ARRAY [0..n] OF CHAR;
     ...
     cp := ADR(string);
     end := ADDRESS(cp) + String.Length(string);
     WHILE ADDRESS(cp) < ADDRESS(end) DO
       Process(cp^);
       cp := ADDRESS(cp) + TSIZE(CHAR);
     END;

   Even if we replace TSIZE(CHAR) with Char.lengthInAString, we still
   run up against the problem that the compiler thinks cp^ is a
   reference to two bytes, not one. So it emits object code such as
   MOVE.W, ADD.W, CMP.W, etc, when it should be emitting MOVE.B, ADD.B,
   CMP.B, etc. Whether this results in erroneous behaviour depends on
   the byte sex of the CPU (and the byte sex assumed in the algorithm).
   On the 68k, this is even more serious BECAUSE WORD MEMORY ACCESSES
   MUST OCCUR ONLY FOR EVEN ADDRESSES. An odd effective address used
   with WORD or LONGWORD data results in a processor-generated ADDRESS
   ERROR.

   POINTER TO CHAR is not a portable way to represent strings.

2) When the programmer sees 'string: POINTER TO CHAR', there is vital
   information about this object which is completely missing:

   a) How big is the string?

   b) Has 'string' been properly initialized, either to NIL or to point
      to some string?

   c) Does 'string' point to an object on the heap (memory for the
      string was allocated using NEW or ALLOCATE), or does it point to
      an object on the stack (string := ADR(aStackVariable))? You
      wouldn't want to call DISPOSE or DEALLOCATE on 'string' if it
      points to a stack variable.

   d) How many other pointer variables reference the same object? You
      don't want to DEALLOCATE 'string' if there are still active
      references to it.

   POINTER TO CHAR is not a safe way to represent strings.

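The stride half of point 1 can be made concrete with a small sketch. It
is written in C rather than Modula-2 only so that it is easy to compile
and run, and it is an illustration under stated assumptions, not the
code my 68k compiler generates. The name walk_packed and its parameters
are hypothetical. The 'stride' argument plays the role of TSIZE(CHAR):
the number of bytes the pointer advances per step over a string whose
characters are packed one per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Walk a packed string of `len` bytes, advancing `stride` bytes per step
   (stride stands in for TSIZE(CHAR); the names here are hypothetical).
   Each visited character is copied to `out`, which must have room for
   len + 1 bytes. Returns the number of characters visited. */
size_t walk_packed(const char *s, size_t len, size_t stride, char *out)
{
    size_t visited = 0;

    assert(stride > 0);              /* a zero stride would never terminate */
    for (size_t off = 0; off < len; off += stride)
        out[visited++] = s[off];     /* byte-sized access only */
    out[visited] = '\0';
    return visited;
}
```

With the six-character string "Hello!", a stride of 1 visits all six
characters, while a stride of 2 visits only 'H', 'l' and 'o'. The sketch
models the stride mismatch only; it deliberately uses byte accesses, so
it does not reproduce the word-sized loads or the odd-address fault.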
3) Programmers normally expect to be able to reference the i'th
   character in a string using array-index syntax: string[i]. If string
   is POINTER TO CHAR, that's not possible.

   Better is 'VAR string: POINTER TO ARRAY [0..Char.maxArray] OF CHAR;'.
   'Char' is a definition module containing useful system-dependent
   parameters describing the properties of characters and arrays of
   characters. Char.maxArray is the highest zero-based index that the
   compiler will allow for an ARRAY OF CHAR. This permits access to the
   i'th element using traditional syntax: string^[i], yet still provides
   for pointer arithmetic and dynamic sizing. It also finesses the
   SIZE(CHAR) problem.

   Even better is:

     TYPE
       DynamicStringIndex = [0..Char.maxArray];
       DynamicString = RECORD
         size: DynamicStringIndex;
         base: POINTER TO ARRAY DynamicStringIndex OF CHAR;
       END;

   Best is:

     DEFINITION MODULE DynamicString;
     EXPORT QUALIFIED STRING, Index, ...; (* PRIVATE is NOT exported *)

     TYPE
       Index = [0..Char.maxArray];
       PRIVATE;
       STRING = RECORD
         size: Index; (* read only variable *)
         base: PRIVATE;
       END;

4) "Open arrays" that are not procedure parameters are possible but do
   not come cheaply. Assume the following declarations:

     VAR string10: ARRAY [0..9] OF CHAR;
         string80: ARRAY [0..79] OF CHAR;
         foo: Bar;
         dynamicString: ARRAY OF CHAR;
         i: CARDINAL;

   When the block in which these declarations reside is entered, the
   statically sized objects (everything but 'dynamicString') can easily
   be allocated on the stack. But the size of 'dynamicString' is
   undefined, so it cannot be allocated. What can be allocated is a
   hidden variable which will point to 'dynamicString', and a hidden
   variable which will specify the size of 'dynamicString'.

   Somewhere in the block, a value may be assigned to dynamicString:

     dynamicString := string10;

   It would be nice if we could allocate the memory for dynamicString on
   the stack at this point. If the usage of dynamicString is as simple
   as this case is so far, we can.

   The problem is how to allocate memory on the stack for multiple open
   arrays whose size changes more than once during execution of the
   block (open array procedure parameters don't have this problem
   because their size is known at block entry and cannot change until
   block exit). When the size of an open array changes, the value
   returned by ADR(anOpenArray) probably will have to change as well.
   Algorithms that are valid for static arrays will likely break if the
   static arrays are redefined to be dynamic open arrays.

   There is no general solution to this problem except to allocate
   memory on the heap and not the stack. So the only thing generic open
   arrays give us is the ability to write 'anOpenArray[index]' instead
   of writing 'aDynamicArrayAllocatedByTheProgrammer^[index]'. We could
   get the same effect by slightly changing the syntax of the language
   so that 'a[i]' is recognized as shorthand for 'a^[i]'.

   Oh yeah, the compiler also allocates and deallocates for us
   automatically, which completely hides from the programmer the fact
   that these arrays are heap objects. That has both its good and bad
   points.

   It's simpler (for the compiler writer) not to open this can of worms.
   If you feel you really need this functionality, I suggest you try
   Smalltalk, LISP or APL.

Personally, I'd like to see new syntax permitting variables to have
their initialization and termination processing defined as part of
their declaration.

Example:

  VAR i: CARDINAL := 0; (* initialize i to zero *)
      a: POINTER TO ARRAY [0..n] OF CHAR
           := NEW('Hello, world.')
              (* initialize a to NEW('Hello, world.'); NEW should be a
                 function which accepts the initial value of the
                 allocated object as its optional argument *)
           := DISPOSE(a);
              (* on termination of the block, assign DISPOSE(a) to a;
                 DISPOSE should also be a function *)
      x: REAL := 3.14159 (* initialize x to pi *)
           := circumference / (2.0 * radius);
              (* on block exit, set x to be the value of this
                 expression *)
      circumference: REAL := 0.0;
      radius: REAL := 1.0;

The block termination code would execute just before the expression
following a RETURN statement is evaluated, or else just before
executing a RETURN (if the block is not a function). Notice that this
can help to guarantee that functions don't return dangling pointers.

Another suggestion would be to change the meaning of pointer syntax so
that a reference to a pointer variable references its dynamic object
instead of the address of its dynamic object:

  VAR p: POINTER TO FooBar;
      a: ADDRESS;
  ....
  p := aFooBar;  (* old syntax: p^ := aFooBar *)
  a^ := ADR(p);  (* old syntax: a := p *)
  a^ := p^;      (* old syntax: a := p *)

This makes it possible to abstract over an algorithm so that it is
valid either for pointers or non-pointers. It's analogous to VAR and
VALUE parameters for procedures, which make it possible to abstract
procedure calls with respect to arguments being passed as addresses or
as values.

--Alan@pdn
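
A footnote on the initialization/termination proposal above: a rough
analogue can be sketched in C using the 'cleanup' variable attribute,
which is a GCC/Clang extension and not part of ISO C. This is only an
approximation of the idea, not an implementation of the proposed
Modula-2 syntax. The names dispose_string, release_count and
greeting_length are hypothetical. One difference is worth noting: with
this attribute the RETURN expression is evaluated first and the
termination action runs afterwards, the reverse of the order suggested
above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so a caller can observe that cleanup ran. */
int release_count = 0;

/* Termination action: receives a pointer to the variable that is going
   out of scope, much like the proposed ":= DISPOSE(a)" clause. */
void dispose_string(char **p)
{
    free(*p);            /* free(NULL) is a no-op, so a failed malloc is safe */
    *p = NULL;
    release_count++;
}

/* Returns the length of a heap-allocated greeting. The buffer is
   released automatically on every exit path from the block. */
size_t greeting_length(void)
{
    /* cleanup is a GCC/Clang extension, not ISO C. */
    __attribute__((cleanup(dispose_string))) char *a = malloc(14);

    if (a == NULL)
        return 0;
    strcpy(a, "Hello, world.");   /* 13 characters plus the terminator */
    return strlen(a);             /* value is computed before cleanup runs */
}
```

Calling greeting_length() once should return 13 and leave
release_count at 1, which shows that the termination action ran without
the caller ever seeing the pointer.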