Path: utzoo!attcan!uunet!wuarchive!cs.utexas.edu!rice!uw-beaver!zephyr.ens.tek.com!orca.wv.tek.com!leia!opus!johnt
From: johnt@opus.WV.TEK.COM (John Theus;685-2564;61-183;625-6654;hammer)
Newsgroups: comp.arch
Subject: Re: VME Bus Standard (RISC bus suggestions)
Message-ID: <277@leia.WV.TEK.COM>
Date: 19 Dec 89 08:52:13 GMT
References: <11759@phoenix.Princeton.EDU> <112400016@uxa.cso.uiuc.edu>
Sender: johnt@leia.WV.TEK.COM
Reply-To: johnt@opus.WV.TEK.COM (John Theus)
Organization: Tektronix, Inc., Wilsonville, OR
Lines: 107

In article <112400016@uxa.cso.uiuc.edu> afgg6490@uxa.cso.uiuc.edu writes:
>
>OK, let me make some suggestions for a RISC bus:
>
>(1) All transactions are disconnected or split.
>    Possibly an arbitration preemption line if the response is
>    immediately available. (IE. you don't assume connected
>    and then change over to split depending on ACK. You assume split.
>    Connected = split with immediate response separate).
>
>(2) Throw out all the fancy synchronization operations.
>    Provide (i) a LOCK signal that can be applied only to a single
>    resource of less than bus width. Let software protocols handle
>    multiple resource locking - don't require the bus interfaces
>    to track it.
>    If you feel adventurous, provide (ii) a remote load-store-fixed
>    or compare-and-swap, or (ii) a remote fetch-and-add. These
>    because they possibly permit combining.
>    Probably only provide one of them.

I think this is a good example of why designing a RISC bus is difficult.
If you did item (1) and item (2)(i) you would have a flawed lock
operation. A split transaction interconnect requires at least one of
the operations in item (2)(ii) to have a true atomic operation. Which
one? That's why both Futurebus+ and SCI (Scalable Coherent Interface)
have these lock operations.

Before I launch into a long lock discussion, I want to point out that
making the basic decision to use split transactions makes the bus
interface logic several times more complex than a connected protocol.
So even if the bus protocols are RISC, the interface implementation is
very complex.

For those of you that haven't thought about this first problem, let me
try to explain locks in a split transaction environment.

A split transaction consists of a request transaction, e.g. a processor
(the requester) requests a read from memory, followed eventually by a
response transaction in which memory (the responder) returns the
requested data. Only writes are performed on the bus, since memory
becomes a bus master as the responder. Several other transactions can
occur on the bus between a request and its response.

For the typical processor generated semaphore of a read followed by a
write (swap, test-and-set, etc.), the processor might simply make a read
request followed by a write request with its accompanying data. The
responder would return the requested read data as the first response,
and a write acknowledge as the second response. Besides being very
inefficient, this gives no guarantee that the responder will not receive
a request from another party between the two semaphore requests. If
this occurred, the semaphore would not be atomic.

The lock protocols might require the "bus" to prevent another request
from being issued, or they might prevent the responder from acting on
another request, but this would largely defeat the purpose of using
split transactions, especially in a switch environment. Split response
transactions require a different technique to lock the read and write
operations.

The solution is to perform the lock operation with a single request.
Accompanying this request is a command for the responder to execute.
For example, to perform a swap operation, the requester becomes master,
addresses the responder with a swap lock transaction command, and sends
the data to be written. Then the master disconnects. When the
responder acts upon the request, it executes the command by first
reading the addressed data and storing it in a temporary buffer.
The responder then writes to memory the data that was sent along with
the request. The responder executes the read and write memory
operations atomically. The buffered data is sent back to the requester
in the form of a response transaction.

The fetch-and-add command is executed by the requester sending the value
to add in the request. The responder returns the original unmodified
value to the requester, and then stores the sum in the addressed
location.

The compare-and-swap is executed by the requester sending the compare
value and the swap value in the request. The responder returns the
original unmodified value to the requester, and if the compare value is
equal to the original value, it then stores the swap value in the
addressed location.

So, the next problem for the bus designer is deciding which lock
operations to support. Most processors can generate a swap of some
form, but the load occurs first, followed by the store, which is great
for a connected bus but backwards for a split environment. A lot of
system designers would like a fetch-and-add since it is a more powerful
operation than swap, but as far as I know, only Intel has produced a
mainstream processor with this instruction. Right now I know of no
processor that can directly generate a lock operation for a split
transaction interconnect.

Fetch-and-add allows combining in switch environments and allows the
return of many (one per bit of data width) unique values in a single
transaction on a bus with broadcast.

Futurebus+ provides 3 bits for lock command encoding. Four codes are
reserved for the future, and the others are nop, swap, compare-and-swap,
and fetch-and-add.

SCI provides 4 bits for lock command encoding. Eight codes are reserved
for the future, and the 3 bits that make up the other 8 are used to
directly control the hardware facilities that would be required if you
implemented all of the above Futurebus+ commands.
With this approach you can generate all the lock permutations that the
hardware could support. In my opinion this goes beyond CISC, since no
one knows how to use most of the lock operations SCI implements.

On your split transaction RISC bus, how would you do locks?

John Theus
johnt@opus.wv.tek.com
Futurebus+ Parallel Protocol Coordinator

Tektronix, Inc.
Interactive Technologies Div.
 - shipping the Futurebus-based XD88 workstations
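
A rough sketch of the lock discussion above, as a toy software model of
a responder on a split transaction interconnect. It only illustrates
the idea that a lock command carried in a single request is executed to
completion by the responder, while a read request followed by a separate
write request can be interleaved with another party's requests. The
Responder class, its method names, the command strings, and the 32-bit
width are all assumptions made for illustration; none of this is taken
from the Futurebus+ or SCI specifications.

```python
from collections import deque


class Responder:
    """Hypothetical memory module acting as responder (illustrative only)."""

    def __init__(self, size=16, width=32):
        self.mem = [0] * size
        self.mask = (1 << width) - 1   # assumed data width
        self.requests = deque()        # queued request transactions
        self.responses = []            # response transactions, in order sent

    def request(self, requester, cmd, addr, *operands):
        # The requester sends one request transaction, then disconnects.
        self.requests.append((requester, cmd, addr, operands))

    def service_one(self):
        # One request is executed to completion before the next is started,
        # so the read and write of a lock command cannot be separated.
        requester, cmd, addr, ops = self.requests.popleft()
        old = self.mem[addr]           # read into a temporary buffer
        if cmd == "read":
            pass                       # response carries the data read
        elif cmd == "write":
            self.mem[addr] = ops[0] & self.mask
            old = None                 # response is only a write acknowledge
        elif cmd == "swap":
            self.mem[addr] = ops[0] & self.mask
        elif cmd == "fetch_and_add":
            self.mem[addr] = (old + ops[0]) & self.mask
        elif cmd == "compare_and_swap":
            compare, new = ops
            if old == compare:
                self.mem[addr] = new & self.mask
        else:
            raise ValueError(f"unknown command: {cmd}")
        # Lock commands return the original, unmodified value.
        self.responses.append((requester, old))

    def run(self):
        # Service every queued request and hand back the responses.
        while self.requests:
            self.service_one()
        out, self.responses = self.responses, []
        return out


# Semaphore built from two requests: B's read lands between A's read
# and A's write, so both parties read 0 and both think they own the lock.
racy_bus = Responder()
racy_bus.request("A", "read", 0)
racy_bus.request("B", "read", 0)
racy_bus.request("A", "write", 0, 1)
racy_bus.request("B", "write", 0, 1)
racy = racy_bus.run()

# Semaphore built from one swap request each: only A sees the old value 0.
swap_bus = Responder()
swap_bus.request("A", "swap", 0, 1)
swap_bus.request("B", "swap", 0, 1)
atomic = swap_bus.run()
```

In this model the ordering guarantee comes entirely from the responder
servicing one queued request at a time, which is the property the single
request lock commands rely on; nothing has to be held on the "bus"
between the read and the write.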