Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!apple!baum
From: baum@Apple.COM (Allen J. Baum)
Newsgroups: comp.arch
Subject: Re: Late, Lamented E&S-1 -- whats it look like?
Message-ID: <36804@apple.Apple.COM>
Date: 27 Nov 89 20:31:27 GMT
References: <36652@apple.Apple.COM> <324@svcs1.UUCP> <36725@apple.Apple.COM> <329@svcs1.UUCP>
Reply-To: baum@apple.UUCP (Allen Baum)
Organization: Apple Computer, Inc.
Lines: 49

[]
>In article <329@svcs1.UUCP> andy@svcs1.UUCP (Andy Piziali) writes:
>In article <36725@apple.Apple.COM> Allen Baum asked:
> What are the architectural features that permitted this system to be used
> effectively?
>
>In the case of the ES-1, the compiler has an intimate knowledge of the CU
> pipeline

Sorry, but an optimizing compiler that knows about the pipeline is not an
architectural feature; it's a necessity (these days) for good performance.

>On top of the necessary compiler technology, there must then be architectural
>support for coordinating the multiple threads of control created by the
>compiler. In the ES-1, there are three mechanisms for inter-thread
>synchronization: atomic memory accesses, signals, and interrupts.
>
>The signal mechanism is a means for threads to asynchronously communicate. A
>hardware control block is constructed by the thread(A) specifying what signals
>the thread is expecting. When another thread (B) sends thread A a signal, the
>receipt of the signal is recorded in the control block & if the thread is not
>currently active(running on a CU), a processor running a lower priority thread
>is interrupted.
>
>The CUs in an ES-1 may send interrupts to one another for use in asynchronous
>event signalling.

OK, those are architectural features.

>> What was the latency through the crossbar (ie. how many delay slots were
>> there after a load?)
>I feel more comfortable answering how load latency is hidden in general in the
>ES-1 than citing specific machine parameters.
>The integer and floating point
>registers are independently scoreboarded and are always non-blocking. There is
>no fixed number of delay slots after loads. Instruction issue is not stalled
>until a load destination register is specified as a subsequent source register.

Well, I understand that you may not be comfortable answering the question that
I asked, but I asked for a reason. Crossbars, as nice as they are, exact a
penalty in access latency. This penalty is sometimes great enough to cancel the
benefit of having the crossbar in the first place. If the penalty is great
enough, you may as well have a local-remote architecture.

So... what is the penalty?

--
		  baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum
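
A rough illustration of the scoreboarding behaviour described in the quoted
text, and of why the raw load latency still matters: the sketch below models a
simple in-order machine that issues one instruction per cycle and stalls only
when a source register is still pending on the scoreboard. Everything in it is
an assumption made for illustration (the function name issue_cycles, the
register names, the single-cycle ALU latency, and the two load latencies); none
of it is an ES-1 parameter, and write-after-write hazards are ignored.

```python
def issue_cycles(program, load_latency):
    """Return the cycle at which each instruction issues.

    Hypothetical model: in-order issue, at most one instruction per cycle,
    non-blocking loads. `program` is a list of (op, dest, sources) tuples.
    """
    ready_at = {}  # scoreboard: register -> cycle at which its value is available
    cycle = 0      # earliest cycle at which the next instruction may issue
    issued = []
    for op, dest, sources in program:
        # Issue stalls only if a source register is still pending.
        start = max([cycle] + [ready_at.get(reg, 0) for reg in sources])
        issued.append(start)
        # Assumed latencies: loads take `load_latency` cycles, everything else 1.
        latency = load_latency if op == "load" else 1
        ready_at[dest] = start + latency
        cycle = start + 1
    return issued


program = [
    ("load", "r1", []),            # the load itself never blocks issue
    ("add", "r2", ["r3", "r4"]),   # independent of the load
    ("add", "r5", ["r6", "r7"]),   # independent of the load
    ("add", "r8", ["r1", "r2"]),   # first use of the load result
]

short = issue_cycles(program, load_latency=2)
long = issue_cycles(program, load_latency=10)
print(short)  # [0, 1, 2, 3]
print(long)   # [0, 1, 2, 10]
```

In this toy model the two independent adds fully cover a 2-cycle load, but with
a 10-cycle load the dependent add waits until cycle 10. Scoreboarding removes
the fixed delay slots; it does not remove the latency once the independent work
runs out, which is why the size of the penalty is worth asking about.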