Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!caip!clyde!burl!ulysses!mhuxr!mhuxt!houxm!ihnp4!inuxc!pur-ee!j.cc.purdue.edu!h.cc.purdue.edu!aeh
From: aeh@h.cc.purdue.edu (Dale Talcott)
Newsgroups: net.arch
Subject: Cyber 205 (Was: Re: VERY LARGE main memories)
Message-ID: <2993@h.cc.purdue.edu>
Date: Mon, 8-Sep-86 04:32:11 EDT
Article-I.D.: h.2993
Posted: Mon Sep 8 04:32:11 1986
Date-Received: Tue, 9-Sep-86 06:22:29 EDT
References: <1130@bu-cs.bu-cs.BU.EDU> <7144@lanl.ARPA> <7148@lanl.ARPA>
Reply-To: aeh@h.cc.purdue.edu.UUCP (Dale Talcott)
Organization: Purdue University Computing Center
Lines: 89
Keywords: virtual memory, LRU, Cyber 205
Summary: some cost figures

This is not directly related to the discussion about large main memories, but I thought I would try to supply some information about the CDC Cyber 205, since we have one of these beasties. The context is the comparison between the Cray, which does not use virtual memory, and the 205, which does.

The hardware part of virtual memory on the 205 is implemented using a two level lookup table to translate virtual page addresses to physical pages. The first level is a 16 entry set of "associative registers" (ARs) which can do this mapping in 1 cycle. These are similar to the "translation lookaside buffer" on other virtual memory machines, except smaller. The second level is the "space table", which has an entry for each occupied real memory page (plus some dummy entries used by the system). This table resides at a fixed address in real memory and its first 16 entries are loaded into the associative registers. That is, the ARs cache the first 16 entries of the space table.
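
As a rough illustration only (not taken from CDC documentation), the arrangement can be modeled in Python as a single list kept in most-recently-used-first order, with its first 16 entries standing in for the ARs. The sketch also covers the miss path described in the next paragraph. The class and method names are hypothetical, and the choice to start the miss search just past the 16 entries the ARs were holding is an assumption; the 16 ARs, the 80-cycle overhead, and the two-entries-per-cycle search rate are the figures given in this article.

```python
AR_COUNT = 16          # associative registers (from the article)
MISS_OVERHEAD = 80     # cycles to stop, save, reload, restart (from the article)
ENTRIES_PER_CYCLE = 2  # space table search rate (from the article)


class SpaceTableModel:
    """Toy model: one list of resident pages, most recently used first.

    The first AR_COUNT entries stand in for the associative registers.
    """

    def __init__(self, resident_pages):
        self.table = list(resident_pages)
        self.misses = 0
        self.miss_cycles = 0

    def reference(self, page):
        """Translate one reference; return True on an AR hit."""
        position = self.table.index(page)  # ValueError here models a page fault
        hit = position < AR_COUNT
        if not hit:
            # Assumption: the search begins just past the entries the ARs held.
            searched = position - AR_COUNT + 1
            search_cycles = -(-searched // ENTRIES_PER_CYCLE)  # ceiling division
            self.misses += 1
            self.miss_cycles += MISS_OVERHEAD + search_cycles
        # Hit or miss, the page ends up first: LRU ordering in the ARs on a
        # hit, move-to-front in the space table on a miss.
        self.table.insert(0, self.table.pop(position))
        return hit


if __name__ == "__main__":
    model = SpaceTableModel(range(64))  # 64 resident pages, e.g. 4 Mwords of 64k-word pages
    print(model.reference(3))           # True: page 3 is among the first 16 entries
    print(model.reference(40))          # False: page 40 sits deeper in the table
    print(model.miss_cycles)            # 93 = 80 overhead + 13 search cycles
```

Keeping one list rather than separate AR and table structures is a simplification: it folds the "store the ARs back, then reload them" steps into the list ordering itself.
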

When a memory reference cannot be mapped using the associative registers, the affected instruction is suspended, the ARs are stored into the start of the space table, the space table is searched for the mapping at a rate of two entries per cycle until the match is found, the match is moved to the first entry in the space table, the ARs are reloaded from the space table, and the instruction is resumed. (If a mapping is not found, this constitutes a page fault, which I am ignoring.)

The ARs need to be stored and reloaded because they are kept in LRU order by the hardware, and thus rapidly get out of sync with the part of the space table they are caching. The overhead for stopping, saving, reloading, and restarting is about 80 cycles. (Choke!)

When running in monitor mode, all memory references are physical references, but take just as long to execute. I suspect, but have not checked the prints to be sure, that monitor mode just forces the ARs to always respond with the identity map.

With that as background,

1) Whatever programming technique you use on a Cray to fit ten pounds of data into five pounds of real memory will also work on the 205. That is, if your program can be run on a Cray with X Mwords of memory, it will run on a 205 with X Mwords of memory WITHOUT FAULTING.

2) Monitor mode on the 205 disables external interrupts, so I seriously doubt that any sites are running their 205s "with virtual memory disabled".

3) Nonetheless, I ran some timing tests to determine the cost of space table searches (which would not happen in monitor mode, due to the faked identity map). Using as a test case a vector add of two arrays into a third array, each array one third the size of available real memory, I came up with a cost of 0.48% with 4 Mwords and 1.5% with 32 Mwords (both assuming a page size of 64k words, which is the largest available on the 205). Over the course of a day, the latter amounts to about 22 minutes.
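
For readers who want to see where figures of this size could come from, here is a back-of-envelope sketch in Python. It is not the measurement method used above. It rests on three assumptions that are not stated in the article and were chosen because they land close to the quoted numbers: every first touch of a page misses and finds its match at the very end of the space table, the search skips the 16 entries the ARs were already holding, and the vector add otherwise produces one result per cycle. It also takes 1 Mword as 2^20 words and 64k as 65536. The function names are hypothetical.

```python
PAGE_WORDS = 64 * 1024  # largest 205 page size (from the article)
AR_COUNT = 16           # associative registers (from the article)
MISS_OVERHEAD = 80      # cycles per AR miss, excluding the search (from the article)
ENTRIES_PER_CYCLE = 2   # space table search rate (from the article)


def miss_cycles(memory_mwords):
    """Estimated cycles for one AR miss whose match is the last table entry."""
    pages = memory_mwords * 1024 * 1024 // PAGE_WORDS
    searched = pages - AR_COUNT  # assumption: entries held by the ARs are skipped
    return MISS_OVERHEAD + searched // ENTRIES_PER_CYCLE


def vector_add_overhead(memory_mwords):
    """Fractional slowdown for c = a + b, assuming one result per cycle."""
    # Each page's worth of results touches one new page in each of three arrays.
    return 3 * miss_cycles(memory_mwords) / PAGE_WORDS


for mwords in (4, 32):
    print(f"{mwords:2d} Mwords: {100 * vector_add_overhead(mwords):.2f}%")
# prints " 4 Mwords: 0.48%" and "32 Mwords: 1.50%"

print(f"minutes per day at 1.5%: {0.015 * 24 * 60:.1f}")
# prints "minutes per day at 1.5%: 21.6"
```

Under these assumptions a single miss on a 32 Mword machine costs 328 cycles, which is also consistent with the one-word-per-page figure quoted in the next paragraph.
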

The vector add test above has about the best AR hit rate for code which still references every word of real memory just once. If the code were a gather/scatter which touched only one word per page, the cost would be a whopping 32800% for a 32 Mword system. Fortunately, most codes have better locality of reference than either of these, so there is little incentive to run a 205 as a single-user, real-memory system.

4) Note that the previous test compares the 205 with itself. That is, from the results we cannot determine how much faster the 205 would run if it had no virtual memory at all, only how much faster it would run if we did not use what is there.

The 205, like the Cray, is very parallel, so it is possible that the AR mapping takes place at the same time as other functions and ends up adding nothing to the total execution time of memory reference instructions which find a hit in the ARs. (Again, I haven't checked this against the prints or microcode.)

5) The 205's ancestor, the STAR-100, had at most 1 Mword of memory, and thus all real memory could be mapped with just the ARs (if you used only the large page size). It would have been nice if CDC could have kept this property as they increased the size of central memory.

(My .signature is copied from someone else's [Hi Tom] so I can't guarantee I am really reachable where it says, nor even that it gets included. So far as I can tell, news is voodoo.)

-- 
Dale Talcott  Systems programmer      ARPANET: aeh@j.cc.Purdue.EDU
Purdue University Computing Center    BITNET:  AEH@PURCCVM
Mathematical Sciences Bldg.           USENET:  aeh@pucc-j.UUCP
West Lafayette, IN 47907              Phone:   (317) 494-1787