Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!wuarchive!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.unix.cray
Subject: Re: Summary for Protection in Cray
Message-ID:
Date: 10 Jan 91 23:55:03 GMT
References: <1991Jan10.230715.404@agate.berkeley.edu>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 156
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: chiueh@sprite.Berkeley.EDU's message of 10 Jan 91 23:07:15 GMT

> On 10 Jan 91 23:07:15 GMT, chiueh@sprite.Berkeley.EDU (Tzi-cker Chiueh) said:

chiueh> chiueh@sprite.Berkeley.EDU (Tzi-cker Chiueh) writes:
> So why does Cray get rid of virtual memory altogether ? Or does anybody
> know how much performance improvement can we gain from getting rid of VM

chiueh> I suggest you see the IEEE proceedings from the Supercomputing
chiueh> conference that was held last month in New York.  Cray
chiueh> published an article in these proceedings that describes their
chiueh> memory architecture and gives clock timings for current and
chiueh> future memory architectures.

chiueh> Summary of what follows:
chiueh>  - Memory speed is THE supercomputing bottleneck.
chiueh>  - Cray can fetch from memory in 17 cycles.  Demand paging
chiueh>    would lengthen this time significantly.
chiueh>  - Virtual memory trades speed for money.  Supercomputers do
chiueh>    not compromise on speed.
chiueh>  - Cray Y-MP/8s have 4 gigabyte per second memory bandwidths.
chiueh>  - Supercomputing working sets and problem sizes tend to be
chiueh>    equal.
chiueh>  - Demand paging would complicate an already very complicated
chiueh>    instruction scheduler.

chiueh> Memory speed is THE bottleneck in supercomputing.  It is what
chiueh> makes Cray king of the hill.  The Japanese have faster peak
chiueh> CPU speeds, but their memory bandwidths are inferior.
chiueh> This is a key reason why Cray machines are the fastest computers
chiueh> available for most production benchmarks (with notable
chiueh> exceptions).

chiueh> The number of cycles needed to transfer the first word from
chiueh> memory to a register is one of the most critical timings in
chiueh> the supercomputer.  Cray can do this in 17 cycles.  An SX3
chiueh> requires 70 cycles.  An ETA 10 needed hundreds of cycles.
chiueh> Adding demand paging will significantly lengthen this cycle
chiueh> time.  If you can add demand paging without adding cycles to
chiueh> this memory fetch time, then I am sure Cray will make you a
chiueh> rich person.

chiueh> Supercomputers with virtual memories have been tried.  The CDC 205 and the
chiueh> ETA10 are examples.  When these machines ran codes where the problem size
chiueh> exceeded the RAM size (paging), they ran 10 times slower than when paging did
chiueh> not occur.

chiueh> Virtual memory is a technique of trading time for money.  Virtual memory
chiueh> costs less than real memory, but is slower.  Slower memory is not an
chiueh> option for supercomputing.  Witness the success of Cray and the demise of
chiueh> ETA.

chiueh> The Cray achieves two words read and one word written per clock per CPU.
chiueh> On a Y-MP/8 this is a memory bandwidth of 4 gigabytes per second.  Disk
chiueh> bandwidths are not adequate to keep up with this type of demand.

chiueh> The theory of virtual memory depends on the working set being smaller than
chiueh> the problem size.  In most supercomputer applications the working set is the
chiueh> problem size.  I am sure the architecture of these applications was
chiueh> influenced by programming for real-memory machines, so this is somewhat of
chiueh> a circular argument.  However, for the status quo, this is true.

chiueh> Crays are vector machines with extremely sophisticated instruction
chiueh> schedulers.  The Cray often has several instructions issued at once in the
chiueh> same CPU.
chiueh> X-MPs and Y-MPs scoreboard conflicts between instructions and
chiueh> are able to compensate for bank and section memory delays.  These delays
chiueh> tend to be one to four cycles.  The instruction scheduler architecture
chiueh> would be even more difficult if it had to account for page-fault delays of
chiueh> many thousands of cycles.  An approach to this problem would be to require
chiueh> the compilers to never allow a vector sub-section to cross a page
chiueh> boundary.

chiueh> -- Kent

chiueh> --------------------------------------------------------------------------------

chiueh> Saw your information request about Crays, and thought that I might be
chiueh> able to point you towards some useful information:

chiueh> I suggest that you check up on Control Data's Cyber 180-series
chiueh> (currently Cyber 2000-series) machines - they are a full hardware
chiueh> Multics implementation, and have some truly "unique" virtual memory
chiueh> hardware.  I can personally vouch that the address translation
chiueh> hardware, which also is doing access control checking, is VERY fast,
chiueh> and it has several more levels of indirection than most
chiueh> other folks' virtual memory architectures.  Cyber 180 is such a
chiueh> complete Multics that there is actually NO REAL MEMORY ADDRESSING
chiueh> MODE.  It is NOT POSSIBLE to access memory by real memory address, the
chiueh> hardware doesn't have the capability!

chiueh> It is also interesting that when a Cyber 180 is emulating Cyber 170
chiueh> mode, it ALSO has base/limit register hardware in operation, since the
chiueh> 170 architecture is real-memory, and only has base/limit restrictions.
chiueh> When a Cyber 180 is running in 170 mode, it really is running a
chiueh> virtual real-memory machine on its virtual memory hardware (just
chiueh> saying this makes my mind feel like a pretzel).

chiueh> If nothing else, the CDC stuff should make interesting counter-culture
chiueh> reading material for you.
chiueh> It was/is truly different.

chiueh> I also suspect that in the Crays (although I have never read the
chiueh> hardware prints of a Cray, only the CDC machines), the bounds checking
chiueh> is being done on the VIRTUAL address, as it were, not the real memory
chiueh> address.  This method allowed the old CDC machines (the ones Seymour
chiueh> Cray designed) to do their access checking in the CPU, not the memory
chiueh> controller, and thus kill off the references earlier in the
chiueh> instruction.
chiueh>
chiueh> -- Gregory

chiueh> ----------------------------------------------------------------------------

> Furthermore, this check is done for EVERY reference.
>If this is indeed the case, this protection check process should be as
>expensive as address mapping in machines that have VM.

chiueh> Why do you assume this?  Given that the latency of Cray memory is 4
chiueh> cycles or so, the check can be done after the address is sent off to
chiueh> memory and can generate a fault before the data gets back.

>So why does Cray get rid of virtual memory altogether ?

chiueh> Well, many supercomputer applications can't page and have to swap.  In
chiueh> that case, why provide VM?

chiueh> -- greg

chiueh> In article <1990Dec19.181343.10365@agate.berkeley.edu> you write:
> The kind of protection I have in mind is access right control (e.g., read-only).
> "Normal virtual memory systems" perform this kind of protection check while
> doing logical-physical address mapping. The protection bits are either in page
> tables or TLB. Now, since Cray doesn't have virtual memory, the question is
> does it provide access control, if so, where does it put this check ?

chiueh> The Cray does not provide extensive access control.  For each running program
chiueh> a (contiguous) part of actual memory is mapped to the logical address space
chiueh> of the program (which starts at 0).
chiueh> With each reference the logical address
chiueh> is compared to the logical bounds register, and the base register is added
chiueh> to it before going to memory.

> From the previous responses, it seemed that Cray only provides out-of-bound
> protection check. Furthermore, this check is done for EVERY reference.
> If this is indeed the case, this protection check process should be as
> expensive as address mapping in machines that have VM.

chiueh> Clearly this is much less expensive than true VM; only two registers are needed
chiueh> to do everything (address translation and bound checking), and those two
chiueh> registers reside directly in the CPU.

> So why does Cray get rid of virtual memory altogether ? Or does anybody
> know how much performance improvement can we gain from getting rid of VM ?

chiueh> This is much less expensive because check and translation go on in parallel
chiueh> within a single clock cycle.

chiueh> -- dik
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.      J.MCCALPIN/OMNET
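
For anyone who wants to see the base/bounds scheme from dik's reply spelled out, here is a minimal sketch in Python. It is only an illustration of the idea as described in the quoted text (compare the logical address against a bounds register, add a base register), not a description of the actual Cray hardware. The names `translate`, `BoundsFault`, `base` and `limit` are hypothetical, and the sequential check-then-add stands in for what the post says happens in parallel within one clock.

```python
class BoundsFault(Exception):
    """Hypothetical fault raised when a reference falls outside the program's region."""


def translate(logical_addr, base, limit):
    """Map a logical address to a physical address with two registers.

    base  -- assumed start of the program's contiguous block of real memory
    limit -- assumed size of that block; logical addresses start at 0

    The post says check and translation happen in parallel in hardware;
    here they are simply written one after the other.
    """
    # Bounds check: every reference must fall inside [0, limit).
    if not 0 <= logical_addr < limit:
        raise BoundsFault(f"logical address {logical_addr} outside [0, {limit})")
    # Translation: a single addition, no page tables or TLB involved.
    return base + logical_addr


if __name__ == "__main__":
    # A program loaded at real address 1000 with a 64-word region.
    print(translate(5, base=1000, limit=64))  # 1005
    try:
        translate(64, base=1000, limit=64)
    except BoundsFault as fault:
        print("fault:", fault)
```

The point of the sketch is that the whole mechanism is one comparison and one addition per reference, which is why the respondents consider it much cheaper than a page-table lookup. It also shows what is missing: there are no per-page protection bits, so nothing like read-only access control can be expressed.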