Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!usc!brutus.cs.uiuc.edu!uakari!xanth!ames!ames.arc.nasa.gov!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: 64-bit addresses
Message-ID: <43688@ames.arc.nasa.gov>
Date: 26 Feb 90 03:10:43 GMT
References: <1662@aber-cs.UUCP>
Sender: usenet@ames.arc.nasa.gov
Organization: NASA - Ames Research Center
Lines: 97

In article <1662@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>In article <43367@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>  In article <6998@celit.fps.com> ps@fps.com (Patricia Shanahan) writes:
>  >In article <29718@brunix.UUCP> phg@cs.brown.edu (Peter H. Golde) writes:
>  >tables. If you must use page tables, design them as sparse arrays.
>  1) Several posters have mentioned that there is some unspecified but obvious
>  (to them) major problem with using inverted page tables together with

>to fill the TLB you have to scan the reverse map until you find the wanted
>virtual addr.

With hashing, the very thing that causes the problem (large amounts of
physical memory) also provides the solution.  In practice, this overhead appears
to be small.
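
As a rough illustration of why a hashed lookup avoids scanning the whole
reverse map, here is a minimal sketch of a hashed inverted page table.  The
table sizes, the hash function, and all names (Frame, map_page, lookup) are
assumptions made up for this example, not taken from any particular machine.

```python
# Hypothetical sketch of a hashed inverted page table (IPT).
# There is one entry per physical frame; a hash anchor table maps
# hash(pid, vpn) to the head of a short chain of frame numbers.

NUM_FRAMES = 8      # assumed, tiny physical memory for illustration
NUM_BUCKETS = 16    # assumed anchor table size (twice the frame count)

class Frame:
    def __init__(self):
        self.pid = None    # owning process, None while the frame is free
        self.vpn = None    # virtual page number mapped to this frame
        self.next = None   # next frame number on the same hash chain

frames = [Frame() for _ in range(NUM_FRAMES)]
anchors = [None] * NUM_BUCKETS

def bucket(pid, vpn):
    # Arbitrary illustrative hash, not a recommended one.
    return (pid * 31 + vpn) % NUM_BUCKETS

def map_page(pid, vpn, frame_no):
    """Record that (pid, vpn) lives in frame_no and link it into its chain."""
    f = frames[frame_no]
    f.pid, f.vpn = pid, vpn
    b = bucket(pid, vpn)
    f.next = anchors[b]
    anchors[b] = frame_no

def lookup(pid, vpn):
    """Return (frame_no, probes); frame_no is None on a page fault."""
    probes = 0
    i = anchors[bucket(pid, vpn)]
    while i is not None:
        probes += 1
        if frames[i].pid == pid and frames[i].vpn == vpn:
            return i, probes
        i = frames[i].next
    return None, probes
```

The probe count returned by lookup is the cost of a TLB fill in this model: it
grows with the chain length, not with the number of physical frames.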

>B) You have difficulty supporting shared pages. A physical page cannot
>contain an arbitrarily long list of virtual addresses mapped to it.

Well, I suppose not, but is this a problem?  In particular, most uses of shared
memory will limit the number of virtual addresses mapped to it to the number
of processes :-)  since it would normally be unusual for a process to map an
address more than once.  Realistically, the question is: how does an IPT scheme
compare to the alternatives for the top half of the kernel and shared libraries?
Other uses will generally not involve every process in the system.

*If* you can demonstrate that you can support:
A) A large number of processes sharing a limited amount of memory (kernel and
shared libraries), and
B) A small number of processes sharing a large memory (the other usual 
applications) then the scheme is adequate for the present.
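
A back-of-the-envelope way to compare cases A and B is to count the extra
mapping entries that sharing needs beyond one entry per frame.  This is only a
sketch: the function names are hypothetical, and it assumes each process maps
each shared page exactly once.

```python
def alias_entries(num_processes, shared_pages):
    """Mappings needed if each process maps each shared page exactly once."""
    return num_processes * shared_pages

def overhead_ratio(num_processes, shared_pages, total_frames):
    """Extra mapping entries, relative to one entry per physical frame."""
    extra = alias_entries(num_processes, shared_pages) - shared_pages
    return extra / total_frames
```

With assumed figures of 32768 frames, 200 processes sharing 128 pages (case A)
and 4 processes sharing 8192 pages (case B) both come out below one extra entry
per frame in this model.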

How much overhead is incurred when accessing shared libraries, for example?
And how does the overhead compare with *not* using shared libraries?

>  There
>are two workarounds, to forbid shared memory entirely (the best solution, as
>I have already argued, also from a logical point of view),

The best solution only if you can demonstrate that *not* providing shared
memory provides better performance on the set of problems which the capability
is intended to address.  It isn't obvious that reading through a file
sequentially, when the algorithm doesn't need to, costs less than spending a
certain percentage of time reloading a TLB with somewhat increased overhead.
Sequential I/O is *very general* but some algorithms may require other
capabilities for performance reasons...

> and sharing
>*segments* using indirect segment capabilities (which slows down things a
>bit on every reference).

Page table support of shared memory sections (I hate to use the word *segment*
since so many people think *Intel*...) makes perfect sense.  No argument there.

>Page size ought to be *only* determined by what makes the working set
>smallest, and this points to very small page sizes, indeed it points to
>object memory, like the Burroughs.

I was intending to address an entirely different question.  Why did the
Cyber 205 have two page sizes?  Because it had a small TLB combined with
a fairly large TLB load time.  At vector processing speeds, the TLB reloads
on small pages consumed too much time.  So, the overhead was reduced with
large pages.  The pages need to be big enough that the overhead of doing
a TLB reload is amortized over enough memory accesses.  One way to do this
is to use dual page sizes.  Another way is to make TLB reload faster; then
you can keep one size of pages.
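
The amortization argument can be put in numbers with a very simple model.
This sketch assumes a purely sequential sweep that takes one TLB miss per page
and otherwise streams at a fixed rate; the function name and every parameter
value used with it are assumptions for illustration, not Cyber 205 figures.

```python
def tlb_overhead(page_bytes, bytes_per_cycle, reload_cycles):
    """Fraction of cycles spent reloading the TLB during a sequential sweep
    that misses once per page (assumed worst case for streaming access)."""
    useful_cycles = page_bytes / bytes_per_cycle
    return reload_cycles / (reload_cycles + useful_cycles)
```

In this model the overhead falls either by making page_bytes larger or by
making reload_cycles smaller, which are the two remedies described above.  For
example, a page that streams in exactly as many cycles as one reload costs
gives an overhead of one half.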

Now, what size should those pages be? The goals are to support:
A) large numerical simulations and databases,
B) large numbers of small Unix processes accessing small files, and
C) object-oriented environments where every object may have its own attributes,
etc.  Unfortunately, a trade-off is required...

It is interesting that you mentioned Burroughs.  I wonder how viable such
a scheme is when memory sizes get to the Gigabyte range?  It seems to me
that management of memory fragmentation would start to incur a tremendous
overhead.  The best method known so far is to use fixed-size pages, and
require every object to occupy a whole number of them.  Dual-size pages also
take very little extra work.  It is probably worth exploring how far you could go with
multiple page sizes, all powers of two.  Arbitrary object lengths, on the
other hand, look like an unsolved problem to me.
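
To get a feel for what multiple power-of-two page sizes buy, here is a small
sketch that covers an object greedily with the largest pages that fit and
rounds the tail up to the smallest page.  The function name and the page sizes
used with it are assumptions, and alignment constraints are ignored.

```python
def cover(size, page_sizes):
    """Greedily cover `size` bytes with pages drawn from `page_sizes`
    (assumed to be powers of two).  Returns (pages, wasted_bytes)."""
    pages = []
    remaining = size
    sizes = sorted(page_sizes, reverse=True)
    for p in sizes:
        n, remaining = divmod(remaining, p)
        pages += [p] * n
    if remaining:
        pages.append(sizes[-1])   # round the tail up to the smallest page
    return pages, sum(pages) - size
```

For a 70000-byte object, small pages alone (4096 bytes) need 18 entries, large
pages alone (65536 bytes) waste most of a second page, and the two sizes
together need 3 entries with the same waste as the small-page case.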

BTW, why is "making the working set smallest" a primary goal?  Working set
is a performance question, and the overall best performance is the goal.

>A large page size implies coarse, static grouping of objects, and
>approximates the correct policy, and poorly, only in the case of sequential
>access, 

It also approximates correct policy on array-dominated simulations, where some
systems may need to touch 24 bytes or more of data on every CPU cycle.


  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117