Path: utzoo!attcan!uunet!mcvax!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.arch
Subject: Re: Cray & Amdahl (Really: VM on vector processors) (Was: ...)
Message-ID: <7588@boring.cwi.nl>
Date: 21 Jul 88 23:46:22 GMT
References: <4232@cbmvax.UUCP> <76700035@p.cs.uiuc.edu> <9a0K/cbluk1010IHSPc@amdahl.uts.amdahl.com> <228@sdeggo.UUCP> <5342@june.cs.washington.edu>
Organization: CWI, Amsterdam
Lines: 67

In article <5342@june.cs.washington.edu> pardo@uw-june.UUCP (David Keppel) writes:
> More to the point, I also believe that the Crays don't have virtual
> memory (because it slows down the computer!) while the Amdahls do
> (have virtual memory).
>
> Relevant (really?) question: Does it make more sense to buy a little
> bit of very fast memory and slow it down with virtual memory, or to
> buy a whole bunch of fast physical memory and slow it down by putting
> it farther away? (Assume: $ is no problem). Obviously the answer
> depends on the access patterns (and dataset size) of the programs
> being run. I wonder if anybody has insight on this?
>

The major problem with virtual memory on vector machines is that you
get paging interrupts during the execution of an instruction. The
CDC 205 has virtual memory, and there are problems.

Let me explain a bit how it works on the 205. The machine (of course)
maintains a page table, mapping virtual to real memory. Of course you
do not want to interrupt a vector instruction every time a memory
access crosses a page boundary, so the machine has 16 associative
registers that hold the mapping entries for the 16 pages most recently
accessed. Whenever a vector instruction crosses a page boundary to a
page whose mapping information is in the associative registers, the
next page of real memory is easily found, and the instruction
continues without interrupt (all translation etc. is done during the
buffering and unbuffering that the 205 performs in its pipes).
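
To make the lookup concrete, here is a toy model of the associative
registers in Python. It is only a sketch under stated assumptions, not
a description of the real hardware: the class and function names are
made up for illustration, the page table is a plain dict, and the
replacement policy is taken to be strict least-recently-used, which is
one reading of "the 16 pages most recently accessed".

```python
from collections import OrderedDict

NUM_ASSOC_REGS = 16  # from the post: 16 associative registers


class AssociativeRegisters:
    """Toy model: mappings for the most recently accessed pages (LRU assumed)."""

    def __init__(self, size=NUM_ASSOC_REGS):
        self.size = size
        self.entries = OrderedDict()  # virtual page -> real page, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_page, page_table):
        """Return the real page, recording whether the registers held it."""
        if virtual_page in self.entries:
            self.entries.move_to_end(virtual_page)  # mark as most recent
            self.hits += 1
        else:
            # Not in the registers: the entry has to come from the page table.
            self.misses += 1
            if len(self.entries) >= self.size:
                self.entries.popitem(last=False)  # drop least recently used
            self.entries[virtual_page] = page_table[virtual_page]
        return self.entries[virtual_page]


def stream_vector(start_word, length, page_words, regs, page_table):
    """Translate once per page touched by a contiguous vector operand.

    Returns the number of pages the operand spans.
    """
    first = start_word // page_words
    last = (start_word + length - 1) // page_words
    for vpage in range(first, last + 1):
        regs.translate(vpage, page_table)
    return last - first + 1
```

With a hypothetical page table such as `{v: v + 100 for v in range(64)}`,
streaming an 8192-word operand twice through 512-word pages touches 16
pages, so the second pass finds every mapping in the registers.
Lengthen the operand to 17 pages and, under the LRU assumption, each
pass evicts exactly the entry the next pass needs first, so every
lookup misses.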

However, if the crossing is to a page whose information is not in the
associative registers, the mapping entry has to be found in memory.
This involves interrupting the instruction, draining the pipes, saving
state, reading the mapping info and restarting the instruction. That
takes a lot of time.

The 205 has two different page sizes: large pages of 65536 words
(8 bytes/word) and small pages of (site selectable) 512, 2048 or 8192
words. The number of associative registers is 16, and these are shared
amongst the jobs on a system. It appears that the selection of the
small page size is very critical. I have a small (~10 lines) program
that runs twice as fast on a 1-pipe 205 with small pages of 2048 words
as on the same machine with small pages of 512 words. This is all due
to the page boundary crossing. (Oh, we also have a single instruction
that takes 90 seconds to complete; too long for the timer to handle.)

So what this amounts to is that virtual memory needs address
translation. This in turn requires page tables, and part of these (or
all?) need to be in very fast (associative) registers. Associative,
because there is no time to do a search if you do not want to drain
the pipes.

The Cray, on the other hand, addresses all its memory directly, so no
address translation is needed and no vector instruction is ever
interrupted. Strangely enough, the Cray has a maximal vector length of
64, and all instructions except load/store go through registers. The
205, on the other hand, has only vector instructions that go from
memory to memory, and the maximal vector size is 65535. So my
arguments above would imply that you could have a Cray with virtual
memory, but not a 205!

Another point about VM on vector processors: it makes you think the
machine is large enough to handle your problem, while it will mostly
be thrashing pages. A theoretical example for a 205 with 1 Mwords of
memory: try to multiply two 1024*1024 matrices to get a third.
The program will be accepted, and CP time will be something like 60
seconds. The paging alone, however, will take about 1 year (counting
disk access only, and the disks are fast).
--
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax
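
The post does not say how the one-year paging figure was derived. As a
back-of-the-envelope check, the Python sketch below shows one set of
assumptions that lands in the same range: a naive multiply in which
every inner-loop step causes one page fault, and about 30 ms of disk
time per fault. Both numbers, and all names in the code, are
assumptions made for illustration and are not taken from the post.

```python
N = 1024                              # matrix dimension from the post
MEMORY_WORDS = 1 << 20                # "1 Mwords", read here as 2**20 words
SECONDS_PER_YEAR = 365 * 24 * 3600

ASSUMED_FAULTS_PER_INNER_STEP = 1     # assumption: worst-case access pattern
ASSUMED_SECONDS_PER_FAULT = 0.030     # assumption: one disk access per fault


def working_set_words(n):
    """Words needed to hold two n*n operands and the n*n result."""
    return 3 * n * n


def paging_time_years(n, faults_per_inner_step, seconds_per_fault):
    """Total fault-service time for the n**3 inner-loop steps, in years."""
    faults = n ** 3 * faults_per_inner_step
    return faults * seconds_per_fault / SECONDS_PER_YEAR


if __name__ == "__main__":
    ratio = working_set_words(N) / MEMORY_WORDS
    years = paging_time_years(
        N, ASSUMED_FAULTS_PER_INNER_STEP, ASSUMED_SECONDS_PER_FAULT
    )
    print(f"working set is {ratio:.0f}x physical memory")
    print(f"estimated paging time: {years:.2f} years")
```

Under these assumptions the three matrices need three times the
physical memory, and the fault-service time comes out at roughly one
year. A blocked multiply or a different fault cost would change the
figure by orders of magnitude, so treat this as an illustration of the
scale of the problem rather than a reconstruction of the original
estimate.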