Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!rpi!batcomputer!munnari.oz.au!yoyo.aarnet.edu.au!sirius.ucs.adelaide.edu.au!fang!se2!sxc
From: sxc@se2.dsto.oz (Stephen Crawley)
Newsgroups: comp.arch
Subject: Re: Swizzling (very RISC) instead of 64 bits (was Re: 64 bit addresses)
Message-ID: <1411@fang.dsto.oz>
Date: 21 Feb 91 02:48:32 GMT
References: <cbiAekK00VpINR8nEq@andrew.cmu.edu> 	<1991Feb13.170045.16864@uicbert.eecs.uic.edu> 	<3341@sequent.cs.qmw.ac.uk> <PCG.91Feb15155454@odin.cs.aber.ac.uk>
Sender: news@fang.dsto.oz
Lines: 70
Followups-to: comp.lang.misc

pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

[... about how going from 32 to 64 bit pointers hits Smalltalk etc]

>Actually these large expansion factors (not quite 2, but as you remark
>fairly large) need not apply everywhere. Smalltalk regrettably is (like
>virtually every other OO or 'AI' language) reference based. If it
>*allowed* object composition by contiguity instead as well, as most less
>advanced languages do, then many less pointers would be needed. Some
>statistics show that sharing of subobjects is fairly rare in some major
>applications, so they could be profitably inlined.

Adding explicit inlining of data structures to a typical
reference-based language (RBL) would destroy many of the properties
that make it so advanced!

If you add constructs that allow the programmer to declare static
(non-reference) variables and fields:

  1) You need an "address-of" operator:

     o if you don't have one, statics are not first-class

     o the address-of operator may be implicit e.g. use
       call-by-reference when passing statics as function args

     o if you have address-of (implicit or explicit) you have
       to allocate local frames containing statics on the heap.
       If you don't, a reference to a static can outlive its
       frame and the language will have dangling pointer
       problems ... baaaad!

  2) The programmer is encouraged to spend too much effort on
     storage efficiency when this is irrelevant in most cases.
     [What costs more: memory or programmer time?]

  3) The computation model is more complex:

     o address-of gives an extra source of aliasing; another 
       source of errors => less reliable programs

     o programs are harder to reason about.

  4) The language syntax is larger and messier.
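
To make the aliasing in point 3 concrete, here is a minimal sketch
using Python's ctypes module, which stores nested Structure fields
inline.  The Point and Rect types and the demo() function are made up
for illustration; treat this as an analogy for what address-of on an
inlined field gives you, not as a design for any particular RBL.

```python
import ctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

class Rect(ctypes.Structure):
    # Both corners are stored inline (by contiguity), not by reference.
    _fields_ = [("top_left", Point), ("bottom_right", Point)]

def demo():
    r = Rect()
    # Reading a nested field yields an object that shares r's memory,
    # which is in effect an implicit address-of.
    corner = r.top_left
    corner.x = 7  # writes through the alias into r
    return r.top_left.x, ctypes.sizeof(Rect), corner._b_base_ is r
```

Here `corner` and `r.top_left` name the same storage, so an update
through one is visible through the other.  The `_b_base_` attribute
shows that the interior reference also holds on to the enclosing
object.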


The alternative is to have the compiler/linker infer which variables
and fields in a RBL program can be inlined without changing the
program's semantics.  Unfortunately this optimisation would need to be
done globally to get much benefit, since whether a field can be
inlined depends on every reference to it anywhere in the program.
Hence we can't use it in an incremental or persistent environment.
[That covers most "advanced" languages ... sigh]  At least this
approach puts no extra burden on the programmer.
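
A rough sketch of why the inference has to be global.  It assumes a
made-up snapshot of an object graph (the `heap` dictionary and all the
names in it are hypothetical) and applies one necessary condition
only: a field is a candidate for inlining if nothing else refers to
its target.  A real compiler would have to prove this statically for
every execution, and would have to check more than sharing.

```python
from collections import Counter

def inlinable_fields(heap):
    """heap maps object id -> {field name: referenced object id}.

    Returns the (owner, field) pairs whose target has exactly one
    referrer in the whole graph.
    """
    referrers = Counter(
        target for fields in heap.values() for target in fields.values()
    )
    return sorted(
        (owner, field)
        for owner, fields in heap.items()
        for field, target in fields.items()
        if referrers[target] == 1
    )

heap = {
    "window": {"frame": "rect1", "title": "str1"},
    "dialog": {"frame": "rect2", "title": "str1"},  # str1 is shared
    "rect1": {}, "rect2": {}, "str1": {},
}

candidates = inlinable_fields(heap)
# Loading one more (hypothetical) module that refers to rect1
# invalidates a decision already made for "window".
after_load = inlinable_fields({**heap, "plugin": {"target": "rect1"}})
```

With the original graph both `frame` fields are candidates; after the
extra module is added only the one in `dialog` is.  That is the
problem for incremental or persistent systems: code that arrives later
can undo the answer.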

In addition, inlining has hidden performance penalties:

  1) You have to allocate local frames on the heap.  This costs
     CPU time on procedure entry / exit, plus extra GC overhead and
     extra memory.  Clever optimisation can avoid some of this, at
     the cost of compilation time and compiler complexity.  [Of
     course many RBLs start out with local frames on the heap!]

  2) The garbage collector must understand pointers into the middle
     of objects.  This costs CPU time and maybe memory too.  

  3) You can end up with uncollectable garbage.  E.g. if you pass
     a reference to a static field of an object, the entire object 
     must be retained by the GC.  [OK, so the programmer / optimiser
     shouldn't have inlined that field.  But that's just making
     his / her / its job harder!]
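
Points 2 and 3 can be sketched with a toy model of the root-resolution
step of a collector.  Everything here is an assumption for
illustration: blocks are (start, size) pairs, roots are raw addresses,
and mark_roots() is a made-up name.  It does no tracing beyond the
roots.

```python
import bisect

def mark_roots(objects, roots):
    """objects: (start, size) heap blocks, sorted by start, non-overlapping.
    roots: raw addresses, possibly pointing into the middle of a block.

    Returns the set of block start addresses that must be retained.
    """
    starts = [start for start, _ in objects]
    live = set()
    for addr in roots:
        # Extra work compared with pointers that always hit an object
        # header: search for the block that encloses the address.
        i = bisect.bisect_right(starts, addr) - 1
        if i >= 0:
            start, size = objects[i]
            if start <= addr < start + size:
                live.add(start)
    return live

objects = [(0, 16), (16, 4096), (4112, 32)]
# The only surviving reference points at one field inside the big block.
live = mark_roots(objects, roots=[16 + 100])
retained_bytes = sum(size for start, size in objects if start in live)
```

The search is the CPU cost in point 2.  The result is point 3: one
reference to a small field keeps all 4096 bytes of its enclosing
object alive.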

-- Steve