Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!usc!brutus.cs.uiuc.edu!uakari!xanth!ames!ames.arc.nasa.gov!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: 64-bit addresses
Message-ID: <43688@ames.arc.nasa.gov>
Date: 26 Feb 90 03:10:43 GMT
References: <1662@aber-cs.UUCP>
Sender: usenet@ames.arc.nasa.gov
Organization: NASA - Ames Research Center
Lines: 97

In article <1662@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>In article <43367@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
> In article <6998@celit.fps.com> ps@fps.com (Patricia Shanahan) writes:
> >In article <29718@brunix.UUCP> phg@cs.brown.edu (Peter H. Golde) writes:
> >tables. If you must use page tables, design them as sparse arrays.
> 1) Several posters have mentioned that there is some unspecified but obvious
> (to them) major problem with using inverted page tables together with
>to fill the TLB you have to scan the reverse map until you find the wanted
>virtual addr.

With hashing the problem - large amounts of physical memory, also provides
the solution. In practice, this overhead appears to be small.

>B) You have difficulty supporting shared pages. A physical page cannot
>contain an arbitrarily long list of virtual addresses mapped to it.

Well, I suppose not, but is this a problem? In particular, most uses of
shared memory will limit the number of virtual addresses mapped to it to
the number of processes :-) since it would normally be unusual for a
process to map an address more than once. Realistically, the question is:
how does an IPT scheme compare to the alternatives for the top half of the
kernel and shared libraries? Other uses will generally not involve every
process in the system.
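
As a minimal sketch of the hashing point above (not any particular machine's design): an inverted page table keeps one entry per physical frame, and a hash on (process, virtual page) selects a short chain of frames to search instead of scanning the whole reverse map. Every name here (`InvertedPageTable`, `map`, `lookup`, the 8-frame size) is hypothetical and chosen only for illustration. Because each frame records exactly one virtual address, the sketch also shows where objection B comes from: a second mapping of the same frame is refused.

```python
# Hypothetical sketch of a hashed inverted page table (IPT).
# One entry per physical frame; a hash anchor table maps
# hash(pid, vpn) to the head of a chain of frames.

NFRAMES = 8  # illustrative only; real systems have far more frames


class InvertedPageTable:
    def __init__(self, nframes=NFRAMES):
        self.nframes = nframes
        self.owner = [None] * nframes   # (pid, vpn) held by each frame
        self.next = [-1] * nframes      # link to next frame in same bucket
        self.anchor = [-1] * nframes    # hash bucket -> first frame in chain

    def _bucket(self, pid, vpn):
        return hash((pid, vpn)) % self.nframes

    def map(self, pid, vpn, frame):
        # A frame stores a single virtual address, so sharing is refused.
        if self.owner[frame] is not None:
            raise ValueError("frame already has a virtual address")
        b = self._bucket(pid, vpn)
        self.owner[frame] = (pid, vpn)
        self.next[frame] = self.anchor[b]   # push onto the bucket's chain
        self.anchor[b] = frame

    def lookup(self, pid, vpn):
        """Return (frame, probes), or (None, probes) on a miss."""
        probes = 0
        f = self.anchor[self._bucket(pid, vpn)]
        while f != -1:
            probes += 1
            if self.owner[f] == (pid, vpn):
                return f, probes
            f = self.next[f]
        return None, probes
```

With one mapping installed, `lookup` touches a single chain entry rather than every frame. Supporting shared pages would need some extra mechanism layered on top, which is the cost the questions that follow are asking about.
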
*If* you can demonstrate that you can support:
A) A large number of processes sharing a limited amount of memory (kernel
and shared libraries), and
B) A small number of processes sharing a large memory (the other usual
applications)
then the scheme is adequate for the present. How much overhead is incurred
when accessing shared libraries, for example? And how does the overhead
compare with *not* using shared libraries?

> There
>are two workarounds, to forbid shared memory entirely (the best solution, as
>I have already argued, also from a logical point of view),

The best solution only if you can demonstrate that *not* providing shared
memory provides better performance on the set of problems which the
capability is intended to address. It isn't obvious that reading through a
file sequentially, when that is otherwise unnecessary, costs less than
spending a certain percentage of time reloading a TLB with somewhat
increased overhead. Sequential I/O is *very general* but some algorithms
may require other capabilities for performance reasons...

> and sharing
>*segments* using indirect segment capabilities (which slows down things a
>bit on every reference).

Page table support of shared memory sections (I hate to use the word
*segment* since so many people think *Intel*...) makes perfect sense. No
argument there.

>Page size ought to be *only* determined by what makes the working set
>smallest, and this points to very small page sizes, indeed it points to
>object memory, like the Burroughs.

I was intending to address an entirely different question. Why did the
Cyber 205 have two page sizes? Because it had a small TLB combined with a
fairly large TLB load time. At vector processing speeds, the TLB reloads on
small pages consumed too much time. So, the overhead was reduced with large
pages. The pages need to be big enough that the overhead of doing a TLB
reload is amortized over enough memory accesses. One way to do this is to
use dual page sizes.
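
The amortization argument can be put in rough numbers. The sketch below assumes the worst case for a small TLB: a streaming access pattern that takes one TLB miss per page. The 24 bytes per cycle is the rate quoted later in this post; the 100-cycle reload cost and the two page sizes are hypothetical values picked for illustration, not Cyber 205 figures.

```python
def tlb_overhead_fraction(page_bytes, bytes_per_cycle, reload_cycles):
    """Fraction of time spent on TLB reloads while streaming through memory.

    Assumes exactly one TLB miss per page touched (worst case, small TLB).
    """
    cycles_per_page = page_bytes / bytes_per_cycle   # useful work per page
    return reload_cycles / (reload_cycles + cycles_per_page)


# Hypothetical parameters: 100-cycle reload, 4 KB vs. 512 KB pages.
small = tlb_overhead_fraction(4 * 1024, 24, 100)
large = tlb_overhead_fraction(512 * 1024, 24, 100)
print(f"small pages: {small:.1%} of time in TLB reload")
print(f"large pages: {large:.1%} of time in TLB reload")
```

Under these assumed numbers the small pages lose roughly a third of the time to reloads while the large pages lose well under one percent. Either a larger page or a cheaper reload shrinks the same ratio.
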
Another way is to make TLB reload faster; then you can keep one size of
pages. Now, what size should those pages be? The goals are to support:
A) Large numerical simulations and databases, and
B) Large numbers of small Unix processes accessing small files, and
C) Object-oriented environments where every object may have its own
attributes, etc.
Unfortunately, a trade-off is required...

It is interesting that you mentioned Burroughs. I wonder how viable such a
scheme is when memory sizes get to the Gigabyte range? It seems to me that
management of memory fragmentation would start to incur a tremendous
overhead. The best method known so far is to use fixed-size pages, and
require all objects to be a multiple of them. Dual-size pages are also
very little work. It is probably worth exploring how far you could go with
multiple page sizes, all powers of two. Arbitrary object lengths, on the
other hand, look like an unsolved problem to me.

BTW, why is "making the working set smallest" a primary goal? Working set
is a performance question, and the overall best performance is the goal.

>A large page size implies coarse, static grouping of objects, and
>approximates the correct policy, and poorly, only in the case of sequential
>access,

It also approximates correct policy on array-dominated simulations where
you can, on some systems, require touching 24 bytes or more of data for
every CPU cycle.

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035
  Phone:  (415)694-6117