Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!samsung!spool2.mu.edu!uwm.edu!bionet!agate!sprite.Berkeley.EDU!chiueh
From: chiueh@sprite.Berkeley.EDU (Tzi-cker Chiueh)
Newsgroups: comp.unix.cray
Subject: Summary for Protection in Cray
Message-ID: <1991Jan10.230715.404@agate.berkeley.edu>
Date: 10 Jan 91 23:07:15 GMT
Sender: usenet@agate.berkeley.edu (USENET Administrator)
Reply-To: chiueh@sprite.Berkeley.EDU (Tzi-cker Chiueh)
Organization: U.C. Berkeley Sprite Project
Lines: 246

Here is a summary of responses I got regarding the protection facilities in Cray.

Questions: How does Cray provide protection

I am currently investigating methods of minimizing virtual memory overheads in high-performance computers. I think I have a way of solving logical-physical address translation, but I can't think of an efficient way of providing protection. I figure that since Cray doesn't have virtual memory, maybe it can teach me something about achieving protection inexpensively. Can anybody enlighten me about how Cray provides the kind of protection that normal virtual memory systems offer, or tell me where I can find a useful description of it?

Thank you.

-- tzi-cker
--------------------------------------------------------------------------

CRAYs (at least X and Y models) use base and limit register pairs (one set for code, one for data) for each process. The base register is added to the logical address (as generated by the program) to give the physical memory address. It is, I think, the physical address that is checked against the limit register, and, if out of bounds, a program or operand range exception is generated. With separate code/data base/limit registers code sharing is possible, but I don't think it is much used.
-- david
-------------------------------------------------------------------------------

The kind of protection I have in mind is access right control (e.g., read-only)
"Normal virtual memory systems" perform this kind of protection check while
doing logical-physical address mapping. The protection bits are either in page
tables or TLB. Now, since Cray doesn't have virtual memory, the question is
does it provide access control, if so, where does it put this check ?

From the previous responses, it seemed that Cray only provides out-of-bound
protection check. Furthermore, this check is done for EVERY reference.
If this is indeed the case, this protection check process should be as
expensive as address mapping in machines that have VM.
So why does Cray get rid of virtual memory altogether ? Or does anybody
know how much performance improvement can we gain from getting rid of VM ?

-- tzi-cker
------------------------------------------------------------------------------

I'm not exactly sure of your question but I'll try to give you some insight.

Cray uses a pair of base and limit address registers. There is a base and limit address for instruction memory and a base and limit address for data memory. When a program makes a reference to logical location zero, the base address is added to it to get the physical address. If the physical address is larger than the limit address, then a "program range" error interrupt is generated. The limit address is your protection mechanism. That's about as simple a memory management h/w as you can get.

You might also want to take a look at the memory management scheme in an old Digital PDP-8 or Data General Eclipse. The DEC used memory segments and the DG used memory banks, as far as I can remember.

--Larry
------------------------------------------------------------------------------

The Cray protection scheme is so simple that it is confusing. A bounds check is done on every single address generated.
But, it is easy, and takes place in the few cycles necessary, because it is just a simple integer add and compare. Period.

The reason that it can do this, is because memory is *contiguous* !!!

Doesn't that cause a lot of problems with memory fragmentation? Yes. The Cray kernel does a lot of copying to compact memory. And, it can only do it at certain times. This does cause poor memory utilization compared with a system with VM.

Every design is a compromise. The Cray 1/X/Y are upward compatible, and the Cray-1 didn't have VM because it was viewed as an unnecessary complication at the time it was designed. (Early 1970's).

-- Hugh
-----------------------------------------------------------------------------

1. On a machine of this speed, a comparison with a bounds value is _much_ faster than a second memory reference (i.e., to the page table).

2. The page table lookup has to be done before the real memory reference. The bounds check can be done in parallel with the real memory reference.

3. On a vector operation, it is possible to do the bounds checking for just the first and final addresses. Potentially, the page table lookup would have to be done for each address.

4. Cray crams as much circuitry as possible on their boards. Adding circuitry to handle the paging probably would have meant giving up something else.

-- Kurt
-----------------------------------------------------------------------------

chiueh@sprite.Berkeley.EDU (Tzi-cker Chiueh) writes:
> So why does Cray get rid of virtual memory altogether ? Or does anybody
> know how much performance improvement can we gain from getting rid of VM ?

I suggest you see the IEEE proceedings from the Supercomputing conference that was held last month in New York. Cray published an article in these proceedings that describes their memory architecture and gives clock timings for current and future memory architectures.

Summary of what follows:
- Memory speed is THE supercomputing bottleneck.
- Cray can fetch from memory in 17 cycles. Demand paging would lengthen this time significantly.
- Virtual memory trades speed for money. Supercomputers do not compromise on speed.
- Cray Y-MP/8s have 4 gigabyte per second memory bandwidths.
- Supercomputing working sets and problem sizes tend to be equal.
- Demand paging would complicate an already very complicated instruction scheduler.

Memory speed is THE bottleneck in supercomputing. It is what makes Cray king of the hill. The Japanese have faster peak CPU speeds, but their memory bandwidths are inferior. This is a key reason why Cray machines are the fastest computers available for most production benchmarks (with notable exceptions.)

The number of cycles needed to transfer the first word from memory to a register is one of the most critical timings in the supercomputer. Cray can do this in 17 cycles. An SX3 requires 70 cycles. An ETA 10 needed hundreds of cycles. Adding demand paging will significantly lengthen this cycle count. If you can add demand paging without adding cycles to this memory fetch time, then I am sure Cray will make you a rich person.

Supercomputers with virtual memories have been tried. The CDC 205 and the ETA10 are examples. When these machines ran codes where the problem size exceeded the RAM size (paging), they ran 10 times slower than when paging did not occur.

Virtual memory is a technique of trading time for money. Virtual memory costs less than real memory, but is slower. Slower memory is not an option for supercomputing. Witness the success of Cray and the demise of ETA.

The Cray achieves two words read and one word written per clock per CPU. On a Y-MP/8 this is a memory bandwidth of 4 gigabytes per second. Disk bandwidths are not adequate to keep up with this type of demand.

The theory of virtual memory depends on the working set being smaller than the problem size. In most supercomputer applications the working set is the problem size.
I am sure the architecture of these applications was influenced by programming for real-memory machines, so this is somewhat of a circular argument. However, for the status quo, this is true.

Crays are vector machines with extremely sophisticated instruction schedulers. The Cray often has several instructions issued at once in the same CPU. X-MPs and Y-MPs scoreboard conflicts between instructions and are able to compensate for bank and section memory delays. These delays tend to be one to four cycles. The instruction scheduler architecture would be even more difficult if it had to account for page-fault delays of many thousands of cycles. An approach to this problem would be to require the compilers to never allow a vector sub-section to cross a page boundary.

-- Kent
--------------------------------------------------------------------------------

Saw your information request about Crays, and thought that I might be able to point you towards some useful information:

I suggest that you check up on Control Data's Cyber 180-series (currently Cyber 2000-series) machines - they are a full hardware Multics implementation, and have some truly "unique" virtual memory hardware. I can personally vouch that the address translation hardware, which also is doing access control checking, is VERY fast, and it has several more levels of indirection than most other folks' virtual memory architectures.

Cyber 180 is such a complete Multics that there is actually NO REAL MEMORY ADDRESSING MODE. It is NOT POSSIBLE to access memory by real memory address, the hardware doesn't have the capability!

It is also interesting that when a Cyber 180 is emulating Cyber 170 mode, it ALSO has base/limit register hardware in operation, since the 170 architecture is real-memory, and only has base/limit restrictions.
When a Cyber 180 is running in 170 mode, it really is running a virtual real-memory machine on its virtual memory hardware (just saying this makes my mind feel like a pretzel).

If nothing else, the CDC stuff should make interesting counter-culture reading material for you. It was/is truly different.

I also suspect that in the Crays (although I have never read the hardware prints of a Cray, only the CDC machines), the bounds checking is being done on the VIRTUAL address, as it were, not the real memory address. This method allowed the old CDC machines (the ones Seymour Cray designed) to do their access checking in the CPU, not the memory controller, and thus kill off the references earlier in the instruction.

-- Gregory
----------------------------------------------------------------------------

> Furthermore, this check is done for EVERY reference.
>If this is indeed the case, this protection check process should be as
>expensive as address mapping in machines that have VM.

Why do you assume this? Given that the latency of Cray memory is 4 cycles or so, the check can be done after the address is sent off to memory and can generate a fault before the data gets back.

>So why does Cray get rid of virtual memory altogether ?

Well, many supercomputer applications can't page and have to swap. In that case, why provide VM?

-- greg

In article <1990Dec19.181343.10365@agate.berkeley.edu> you write:
> The kind of protection I have in mind is access right control (e.g., read-only)
> "Normal virtual memory systems" perform this kind of protection check while
> doing logical-physical address mapping. The protection bits are either in page
> tables or TLB. Now, since Cray doesn't have virtual memory, the question is
> does it provide access control, if so, where does it put this check ?

The Cray does not provide extensive access control. For each running program a (consecutive) part of actual memory is mapped to the logical address space of the program (which starts at 0).
With each reference the logical address is compared to the logical bounds register, and the base register is added to it before going to memory.

> From the previous responses, it seemed that Cray only provides out-of-bound
> protection check. Furthermore, this check is done for EVERY reference.
> If this is indeed the case, this protection check process should be as
> expensive as address mapping in machines that have VM.

Clearly this is much less expensive than true VM; only two registers are needed to do everything (address translation and bound checking), and those two registers reside directly in the CPU.

> So why does Cray get rid of virtual memory altogether ? Or does anybody
> know how much performance improvement can we gain from getting rid of VM ?

This is much less expensive because check and translation go on in parallel within a single clock cycle.

-- dik
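
The base-and-limit scheme described in the replies above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cray's actual hardware logic. It assumes the variant in which the logical address is compared against the limit and the base is then added; the replies disagree on whether the logical or the physical address is the one checked. The names Segment, translate, translate_vector and RangeError are made up for the example. The vector routine follows the remark that a vector operation needs a bounds check on only its first and final addresses.

```python
from dataclasses import dataclass


class RangeError(Exception):
    """Stands in for the hardware "program range" / "operand range" interrupt."""


@dataclass(frozen=True)
class Segment:
    base: int   # physical address where the contiguous region starts (assumed)
    limit: int  # region size in words; valid logical addresses are 0..limit-1


def translate(seg, logical):
    """Bounds-check one logical address and relocate it: one compare, one add."""
    if not 0 <= logical < seg.limit:
        raise RangeError(f"logical address {logical} outside 0..{seg.limit - 1}")
    return seg.base + logical


def translate_vector(seg, start, stride, count):
    """Relocate a strided vector reference, checking only its two end points."""
    if count <= 0:
        return []
    last = start + stride * (count - 1)
    # Addresses move monotonically with the stride, so the first and last
    # elements are the extremes; if both are in bounds, all elements are.
    translate(seg, start)
    translate(seg, last)
    return [seg.base + start + stride * i for i in range(count)]


if __name__ == "__main__":
    data = Segment(base=4096, limit=1000)
    print(translate(data, 0))                # 4096
    print(translate_vector(data, 10, 2, 3))  # [4106, 4108, 4110]
    try:
        translate(data, 1000)                # one word past the region
    except RangeError as err:
        print("range error:", err)
```

Nothing here consults a table in memory: the whole check is a comparison against a value already held in a register, which is the cost argument several of the replies make against a page-table lookup.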