Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!gatech!akgua!akguc!codas!peora!jer
From: jer@peora.UUCP (J. Eric Roskos)
Newsgroups: net.arch
Subject: Re: How Many ... (really NYU Ultracomputer)
Message-ID: <2141@peora.UUCP>
Date: Mon, 5-May-86 09:17:51 EDT
Article-I.D.: peora.2141
Posted: Mon May 5 09:17:51 1986
Date-Received: Thu, 8-May-86 07:20:14 EDT
References: <2089@peora.UUCP> <5100058@ccvaxa> <2120@peora.UUCP> <5653@cmcl2.UUCP>
Organization: Concurrent Computer Corporation, Orlando, Fl
Lines: 67
Summary: oops... confused by mode of hypothetical case

I wrote:

>On the other hand, you can eliminate this problem by putting the translation
>hardware out at the memory (which I believe is what was done by
>Gottlieb et. al. in their supercomputer project, along with also putting
>some adders and so on out there) ...

Allan Gottlieb replied:

> This is not quite right. ... The latency grows as log
> #PE and it is not trivial to do enough prefetching to mask the
> latency. For this reason it is essential to put the cache on the
> PE side of the network. The memory management of our current
> prototype (8PEs, bus-based) is on the PE side of the network ... we
> consider it important to have the TLB on the PE side of the
> network; the design "locks" TLB resident pages in the MMs.

Oops... I apologise about that... in your paper you wrote:

    To prevent the cluster controller from becoming a bottleneck,
    each MM has a directory with entries for the pages it currently
    contains.... When a page fault is detected by an MM, an
    appropriate message is sent to the cluster controller, inducing
    it to perform a page swap and update the directory of each MM
    in its cluster. ... Performing address translation at the MM
    is not without cost: virtual addresses augmented with address
    space identifiers, which may be longer than physical addresses,
    must be transmitted across the network, increasing traffic.
    Also, the optimal cluster size for paging may be bad for general
    I/O. We are, therefore, investigating the second option of
    locating the translation mechanism at the PEs.

Since the description of the hypothetical case was in the present tense,
with no subjunctives, I misread the last sentence as meaning "we attempted
the former and are now investigating the possibility of using the latter
approach for the next prototype." I apologise for any confusion that may
have resulted from my misinterpretation.

There is another consideration in the placement of the address translation
hardware, however, aside from the above factors: the situation in which
heterogeneous processors serve as the PEs. It is not often discussed in
the research literature, but it is a problem in the real world.

The problem is that with differing processor types, the same MMU may not
be available for all processors. This might be the case, for example, in
a machine using both 8086 and 68000 CPUs sharing a common memory. Two
problems come up:

(1) The 8086's MMU and the 68000's MMU require substantially different
    translation/protection tables. Keeping them consistent is a problem.

(2) Proving that the protection is correct is much harder.

Problem (2) arises even with homogeneous processors. It would be much
easier, especially to do the proofs required by things such as
CSC-STD-001-83 and similar existing security standards, if the memory were
self-protecting: that is, if illegal accesses were impossible so long as
the integrity of the memory system itself was not violated. Unfortunately,
practical considerations presently prevent this.

------
Disclaimer: the above comments reflect my own ideas, and do not necessarily
reflect any work presently being done at Concurrent.
--
E. Roskos