Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!usc!samsung!spool2.mu.edu!uwm.edu!bionet!agate!sprite.Berkeley.EDU!chiueh
From: chiueh@sprite.Berkeley.EDU (Tzi-cker Chiueh)
Newsgroups: comp.unix.cray
Subject: Summary for Protection in Cray
Message-ID: <1991Jan10.230715.404@agate.berkeley.edu>
Date: 10 Jan 91 23:07:15 GMT
Sender: usenet@agate.berkeley.edu (USENET Administrator)
Reply-To: chiueh@sprite.Berkeley.EDU (Tzi-cker Chiueh)
Organization: U.C. Berkeley Sprite Project
Lines: 246

Here is a summary of responses I got regarding the protection facilities in Cray.

Question: How does Cray provide protection?

I am currently investigating methods of minimizing virtual memory overheads
in high-performance computers. I think I have a way of handling
logical-to-physical address translation, but I can't think of an efficient way
of providing protection. Since Cray doesn't have virtual memory, I figure
it might teach me something about achieving protection inexpensively.
Can anybody enlighten me about how Cray provides the kind of protection that
normal virtual memory systems offer, or point me to a useful description?
Thank you.

-- tzi-cker

--------------------------------------------------------------------------
Crays (at least the X-MP and Y-MP models) use base and limit register pairs (one
set for code, one for data) for each process.  The base register is added
to the logical address (as generated by the program) to give the physical
memory address.  It is, I think, the physical address that is checked
against the limit register, and, if out of bounds, a program or operand
range exception is generated.
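
The scheme described above can be sketched in a few lines. This is only an
illustration, not Cray's actual logic: the names translate and RangeError are
made up here, and treating the limit as an exclusive upper bound is an
assumption.

```python
class RangeError(Exception):
    """Hypothetical stand-in for a program/operand range exception."""


def translate(logical, base, limit):
    """Add the base register to a logical address and check the result.

    base and limit are physical addresses. The limit is treated as
    exclusive (an assumption; the post does not say).
    """
    if logical < 0:
        raise RangeError("negative logical address")
    physical = base + logical          # relocation: one integer add
    if physical >= limit:              # protection: one integer compare
        raise RangeError("address beyond limit register")
    return physical
```

A process loaded at physical 4096 with limit 8192 sees its logical address 0
as physical 4096, and any logical address of 4096 or more faults.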

With separate code/data base/limit registers code sharing is possible, but
I don't think it is much used.

-- david

-------------------------------------------------------------------------------
The kind of protection I have in mind is access right control (e.g., read-only)
"Normal virtual memory systems" perform this kind of protection check while 
doing logical-physical address mapping. The protection bits are either in page 
tables or TLB.  Now, since Cray doesn't have virtual memory, the question is 
does it provide access control, if so, where does it put this check ?
From the previous responses, it seemed that Cray only provides out-of-bound
protection check. Furthermore, this check is done for EVERY reference. 
If this is indeed the case, this protection check process should be as 
expensive as address mapping in machines that have VM. 
So why does Cray get rid of virtual memory altogether ?  Or does anybody 
know how much performance improvement can we gain from getting rid of VM ?

-- tzi-cker

------------------------------------------------------------------------------
I'm not exactly sure of your question but I'll try to give you some
insight.  Cray uses a pair of base and limit address registers.
There is a base and limit address for instruction memory and a base
and limit address for data memory.  When a program makes a reference
to logical location zero, the base address is added to it to get
the physical address.  If the physical address is larger than the
limit address, then a "program range" error interrupt is generated.
The limit address is your protection mechanism.  That's about as
simple as memory management hardware can get.

You might also want to take a look at the memory management scheme 
in an old Digital PDP-8 or Data General Eclipse.  As far as I can
remember, the DEC used memory segments and the DG used memory banks.
 
  --Larry


------------------------------------------------------------------------------
The Cray protection scheme is so simple that it is confusing.  A bounds
check is done on every single address generated.  But it is easy, and takes
only the few cycles necessary, because it is just a simple integer add and
compare.  Period.  The reason it can be this simple is that each program's
memory is *contiguous*!

Doesn't that cause a lot of problems with memory fragmentation?  Yes.  The
Cray kernel does a lot of copying to compact memory.  And, it can only do it
at certain times.  This does cause poor memory utilization compared with a
system with VM.  Every design is a compromise.  The Cray-1, X-MP, and Y-MP are
upward compatible, and the Cray-1 didn't have VM because it was viewed as an
unnecessary complication at the time it was designed (the early 1970s).
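
To make the compaction step concrete, here is a minimal sketch of sliding
contiguous regions together to squeeze out the holes. It is not the Cray
kernel's algorithm; the function name compact and the (name, base, size)
tuples are hypothetical. Since programs only ever see logical addresses,
moving one amounts to copying its memory and reloading its base (and limit)
register.

```python
def compact(regions):
    """Pack contiguous regions toward address 0.

    regions: list of (name, base, size) tuples in any order.
    Returns (packed, next_free), where packed lists (name, new_base, size)
    in address order and next_free is the first address past the last region.
    """
    packed = []
    next_free = 0
    for name, _old_base, size in sorted(regions, key=lambda r: r[1]):
        # A real kernel would copy `size` words from _old_base to next_free
        # here, then update the process's base/limit registers.
        packed.append((name, next_free, size))
        next_free += size
    return packed, next_free
```

After compaction all free memory sits in one block starting at next_free,
which is what lets a new contiguous program fit.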


-- Hugh 

-----------------------------------------------------------------------------
1. On a machine of this speed, a comparison with a bounds value is _much_
   faster than a second memory reference (i.e., to the page table).

2. The page table lookup has to be done before the real memory reference.  The
   bounds check can be done in parallel with the real memory reference.

3. On a vector operation, it is possible to do the bounds checking for just
   the first and final addresses.  Potentially, the page table lookup would
   have to be done for each address.

4. Cray crams as much circuitry as possible on their boards.  Adding circuitry
   to handle the paging probably would have meant giving up something else.
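
Point 3 can be illustrated with a small sketch. The addresses of a strided
vector reference are monotonic, so checking the two end addresses covers every
element in between. The function name and the exclusive bound are assumptions
made for this example.

```python
def vector_in_bounds(start, stride, count, bound):
    """Return True if every address start + i*stride, for 0 <= i < count,
    lies in [0, bound).

    Only the first and final addresses are examined; because the sequence
    is monotonic, every other element falls between them.
    """
    if count <= 0:
        return True                      # an empty vector touches nothing
    first = start
    last = start + (count - 1) * stride  # stride may be negative or zero
    return min(first, last) >= 0 and max(first, last) < bound
```

The cost is two comparisons per vector instruction instead of one lookup per
element.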

-- Kurt 

-----------------------------------------------------------------------------
chiueh@sprite.Berkeley.EDU (Tzi-cker Chiueh) writes:
> So why does Cray get rid of virtual memory altogether ?  Or does anybody 
> know how much performance improvement can we gain from getting rid of VM ?


I suggest you see the IEEE proceedings from the Supercomputing conference 
that was held last month in New York.  Cray published an article in these 
proceedings that describes their memory architecture and gives clock 
timings for current and future memory architectures.

Summary of what follows:
- Memory speed is THE supercomputing bottleneck.
- Cray can fetch from memory in 17 cycles.  Demand paging would
  lengthen this time significantly.
- Virtual memory trades speed for money.  Supercomputers do not
  compromise on speed.
- Cray Y-MP/8s have a memory bandwidth of 4 gigabytes per second.
- Supercomputing working sets and problem sizes tend to be equal.
- Demand paging would complicate an already very complicated
  instruction scheduler.

Memory speed is THE bottleneck in supercomputing.  It is what makes Cray 
king of the hill.  The Japanese have faster peak CPU speeds, but their 
memory bandwidths are inferior.  This is a key reason why Cray machines 
are the fastest computers available for most production benchmarks (with 
notable exceptions).

The number of cycles needed to transfer the first word from memory to a 
register is one of the most critical timings in the supercomputer.  Cray 
can do this in 17 cycles.  An SX-3 requires 70 cycles.  An ETA10 needed 
hundreds of cycles.  Adding demand paging would significantly lengthen this 
time.  If you can add demand paging without adding cycles to this 
memory fetch time, then I am sure Cray will make you a rich person.

Supercomputers with virtual memories have been tried.  The CDC 205 and the 
ETA10 are examples.  When these machines ran codes where the problem size 
exceeded the RAM size (that is, when they paged), they ran 10 times slower 
than when paging did not occur.  

Virtual memory is a technique of trading time for money.  Virtual memory 
costs less than real memory, but is slower.  Slower memory is not an 
option for supercomputing.   Witness the success of Cray and the demise of 
ETA.

The Cray achieves two words read and one word written per clock per CPU.  
On a Y-MP/8 this is a memory bandwidth of 4 gigabytes per second.  Disk 
bandwidths are not adequate to keep up with this type of demand.
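
The 4 gigabyte figure can be checked with a little arithmetic. The sketch
below assumes a 6 ns clock period and 64-bit (8-byte) words, neither of which
is stated above; only the three words per clock come from the text.

```python
WORDS_PER_CLOCK = 3   # two reads plus one write per clock per CPU (from the text)
WORD_BYTES = 8        # 64-bit word (assumption)
CLOCK_NS = 6.0        # clock period in nanoseconds (assumption)


def bandwidth_gb_per_s(cpus=1):
    """Peak memory bandwidth in gigabytes (10**9 bytes) per second."""
    bytes_per_clock = WORDS_PER_CLOCK * WORD_BYTES * cpus
    return bytes_per_clock / CLOCK_NS   # bytes per nanosecond equals GB/s


print(bandwidth_gb_per_s(1))   # 4.0
print(bandwidth_gb_per_s(8))   # 32.0
```

Under those assumptions, 4 gigabytes per second is what one CPU can move, and
eight CPUs together would reach 32, so the quoted number reads most naturally
as a per-CPU figure.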

The theory of virtual memory depends on the working set being smaller than 
the problem size.  In most supercomputer applications the working set is the 
problem size.  I am sure the architecture of these applications was 
influenced by programming for real-memory machines, so this is somewhat of 
a circular argument.  However, for the status quo, this is true.

Crays are vector machines with extremely sophisticated instruction 
schedulers.  A Cray often has several instructions issued at once in the 
same CPU.  X-MPs and Y-MPs scoreboard conflicts between instructions and 
are able to compensate for bank and section memory delays.  These delays 
tend to last one to four cycles.  The instruction scheduler would be even 
more difficult to design if it had to account for page-fault delays of 
many thousands of cycles.  One approach to this problem would be to require 
the compilers never to let a vector sub-section cross a page 
boundary.  
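
What such a compiler rule might look like can be sketched as follows. This is
a hypothetical illustration for a unit-stride vector only; the function name
and the page_words parameter are invented here, and no real compiler is being
described.

```python
def split_at_page_boundaries(start, count, page_words):
    """Split a unit-stride vector of `count` words starting at word address
    `start` into (address, length) pieces, none of which crosses a page
    boundary. page_words is the page size in words.
    """
    pieces = []
    addr = start
    remaining = count
    while remaining > 0:
        room = page_words - (addr % page_words)  # words left in this page
        n = min(room, remaining)
        pieces.append((addr, n))
        addr += n
        remaining -= n
    return pieces


print(split_at_page_boundaries(1000, 100, 512))   # [(1000, 24), (1024, 76)]
```

Each piece touches exactly one page, so a page fault could only occur at the
start of a piece and never partway through a vector operation.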

-- Kent 


--------------------------------------------------------------------------------
Saw your information request about Crays, and thought that I might be
able to point you towards some useful information:

I suggest that you check up on Control Data's Cyber 180-series
(currently Cyber 2000-series) machines - they are a full hardware
Multics implementation, and have some truly "unique" virtual memory
hardware. I can personally vouch that the address translation
hardware, which also does the access control checking, is VERY fast,
even though it has several more levels of indirection than most
other folks' virtual memory architectures. Cyber 180 is such a
complete Multics that there is actually NO REAL MEMORY ADDRESSING
MODE. It is NOT POSSIBLE to access memory by real memory address, the
hardware doesn't have the capability!

It is also interesting that when a Cyber 180 is emulating Cyber 170
mode, it ALSO has base/limit register hardware in operation, since the
170 architecture is real-memory, and only has base/limit restrictions.
When a Cyber 180 is running in 170 mode, it really is running a
virtual real-memory machine on its virtual memory hardware (just
saying this makes my mind feel like a pretzel).

If nothing else, the CDC stuff should make interesting counter-culture
reading material for you. It was/is truly different.

I also suspect that in the Crays (although I have never read the
hardware prints of a Cray, only the CDC machines), the bounds checking
is being done on the VIRTUAL address, as it were, not the real memory
address. This method allowed the old CDC machines (the ones Seymour
Cray designed) to do their access checking in the CPU, not the memory
controller, and thus kill off the references earlier in the
instruction.
 
-- Gregory 


----------------------------------------------------------------------------
> Furthermore, this check is done for EVERY reference. 
>If this is indeed the case, this protection check process should be as 
>expensive as address mapping in machines that have VM. 

Why do you assume this? Given that the latency of Cray memory is 4
cycles or so, the check can be done after the address is sent off to
memory and can generate a fault before the data gets back.

>So why does Cray get rid of virtual memory altogether ?

Well, many supercomputer applications can't page and have to swap. In
that case, why provide VM?

-- greg

----------------------------------------------------------------------------


In article <1990Dec19.181343.10365@agate.berkeley.edu> you write:
 > The kind of protection I have in mind is access right control (e.g., read-only)
 > "Normal virtual memory systems" perform this kind of protection check while 
 > doing logical-physical address mapping. The protection bits are either in page 
 > tables or TLB.  Now, since Cray doesn't have virtual memory, the question is 
 > does it provide access control, if so, where does it put this check ?
The Cray does not provide extensive access control.  For each running program
a contiguous part of physical memory is mapped to the logical address space
of the program (which starts at 0).  On each reference the logical address
is compared to the logical bounds register, and the base register is added
to it before going to memory.
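
Some responses in this summary describe the check as being made on the
logical address, others on the physical address. The sketch below (function
names hypothetical, bounds treated as exclusive by assumption) shows that the
two give the same outcome whenever the physical limit equals base plus the
logical bound. In the logical variant the compare does not need the result of
the add, so the two can proceed side by side.

```python
def check_logical(logical, base, bound):
    """Compare the logical address with the bound, then add the base."""
    if not 0 <= logical < bound:       # independent of the add below
        raise IndexError("range error")
    return base + logical


def check_physical(logical, base, limit):
    """Add the base first, then compare the physical address with the limit."""
    physical = base + logical
    if not base <= physical < limit:
        raise IndexError("range error")
    return physical


def outcome(fn, *args):
    """Return the translated address, or the string 'fault' on a range error."""
    try:
        return fn(*args)
    except IndexError:
        return "fault"
```

For example, with base 100 and bound 8 (so limit 108), both variants map
logical 7 to physical 107 and both fault on logical 8.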
 > From the previous responses, it seemed that Cray only provides out-of-bound
 > protection check. Furthermore, this check is done for EVERY reference. 
 > If this is indeed the case, this protection check process should be as 
 > expensive as address mapping in machines that have VM. 
Clearly this is much less expensive than true VM; only two registers are needed
to do everything (address translation and bound checking), and those two
registers reside directly in the CPU.
 > So why does Cray get rid of virtual memory altogether ?  Or does anybody 
 > know how much performance improvement can we gain from getting rid of VM ?
This is much less expensive because check and translation go on in parallel
within a single clock cycle.

-- dik