Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!mailrus!tut.cis.ohio-state.edu!ukma!gatech!hubcap!intelisc!joel
From: intelisc!joel@uunet.UU.NET (Joel Clark)
Newsgroups: comp.parallel
Subject: Wormhole Routing (long 90+ lines)
Keywords: iPSC/2 routing
Message-ID: <3404@hubcap.UUCP>
Date: 1 Nov 88 17:58:13 GMT
Sender: fpst@hubcap.UUCP
Lines: 97
Approved: parallel@hubcap.clemson.edu

Wormhole Routing on the iPSC/2

Extracts from "The iPSC/2 Direct-Connect Communications Technology"
by Steven F. Nugent, as reprinted in "A Technical summary of the
iPSC/2 Concurrent Supercomputer".

(This publication is available from Intel Scientific Computers Sales
Literature, 15201 S.W. Greenbrier Parkway, Beaverton, OR 97006,
(503) 629-7600. Steven's article and others also appear in the
"Proceedings of the Third Hypercube Conference", copyright ACM 1988.)

The Direct-Connect Module (DCM) router is a hardware-controlled
message-passing system. The DCM routers form a circuit-switched
network that dynamically creates a synchronous path from a source
node to a destination node; the path remains open for the duration
of a message. The DCM router supports eight full-duplex, bit-serial
channels that connect nearest-neighbor nodes in a hypercube, and
routers can be interconnected to form networks of up to seven
dimensions containing 128 nodes. Each of the eight channels is routed
independently, allowing up to eight messages to be routed
simultaneously. Routing is based on the e-cube routing algorithm
(references 1,2), which guarantees a deadlock-free network.

Paths are dynamically constructed for each message prior to its
transmission. A complete path is built step by step, with arbitration
for an additional path segment at each router. The channels that
constitute a path are held for the duration of the message. When the
destination node is ready to accept a message, transmission begins.
A channel is released when the tail of the message passes between the
routers connected by that channel.
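
As a rough illustration of the hypercube topology described above, the
sketch below lists the nearest neighbors of a node. It assumes that
channel k links two nodes whose binary addresses differ only in bit k,
which is the usual hypercube convention; the names neighbors,
DIMENSIONS and NODE_COUNT are made up for this example and do not come
from Nugent's article.

```python
from __future__ import annotations

DIMENSIONS = 7                 # largest network described in the article
NODE_COUNT = 1 << DIMENSIONS   # 2**7 = 128 nodes


def neighbors(node: int, dimensions: int = DIMENSIONS) -> list[int]:
    """Return the nearest neighbors of `node`, indexed by channel number.

    Assumption: channel k connects nodes whose addresses differ in bit k,
    so each neighbor is found by flipping one address bit.
    """
    return [node ^ (1 << channel) for channel in range(dimensions)]


if __name__ == "__main__":
    print(NODE_COUNT)
    print(neighbors(0, 3))
```

In a seven-dimensional cube every node therefore has seven neighbors,
one per hypercube channel.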

Direct-Connect routing is a variation of wormhole routing (references
3,4). The primary difference is that with Direct-Connect the message
is transmitted only after the route has been built. This eliminates
the need for flow-control buffering in the intermediate routers.

A routing operation can be broken into four phases: establishing a
path, acknowledgement, message transmission, and releasing
connections.

To initiate the routing of a message, the source node must transfer
at least one 32-bit word to its DCM. The low-order 8 bits of this
word are a channel routing mask, calculated by taking the XOR of the
binary addresses of the source and destination nodes. The DCM
serializer establishes the first segment of the path via the channel
corresponding to the lowest-order bit set in the channel routing
mask. To avoid deadlock, messages in the hypercube are routed over
increasingly higher-numbered channels until the destination is
reached.

The local DCM arbitrates requests for the same channel and grants
them one at a time in "round robin" fashion. When the channel is
granted, the channel routing mask is forwarded via that channel to
the next DCM. That DCM then scans the channel routing mask upward
from the arriving channel's bit to find the next channel to request.
When that channel is available, the routing request (the channel
routing mask) is forwarded, and so on until it reaches the
destination node, establishing the "message path" from source to
destination node.

If the destination DCM can accept a message, it acknowledges with the
RDY signal. This establishes a "status path" that carries
flow-control information along the same route as the message path,
but in the opposite direction. The status path, like the message
path, maintains its connection for the duration of the message. Once
the RDY signal is received at the source node, message transmission
begins.
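
The path-establishment phase described above can be sketched in a few
lines. This is only a software model of the channel selection order,
not the DCM hardware logic: it ignores arbitration, RDY signalling and
timing, and it assumes that crossing channel k flips bit k of the node
address. The function names routing_mask and ecube_path are invented
for the example.

```python
from __future__ import annotations


def routing_mask(src: int, dst: int) -> int:
    """Channel routing mask: XOR of the two node addresses, low 8 bits."""
    return (src ^ dst) & 0xFF


def ecube_path(src: int, dst: int) -> list[tuple[int, int]]:
    """Return the hops as (channel, next_node) pairs.

    The mask is scanned from the lowest-order bit upward, so the
    channel numbers along a path are strictly increasing.
    Assumption: taking channel k flips bit k of the node address.
    """
    mask = routing_mask(src, dst)
    node = src
    hops = []
    channel = 0
    while mask >> channel:            # stop once no higher bits are set
        if mask & (1 << channel):
            node ^= 1 << channel      # cross this dimension
            hops.append((channel, node))
        channel += 1
    return hops


if __name__ == "__main__":
    for channel, node in ecube_path(0b0000101, 0b0110000):
        print(f"channel {channel} -> node {node:07b}")
```

For source 0000101 and destination 0110000 the mask is 0110101, so the
request travels over channels 0, 2, 4 and 5 in that order. Because
every message climbs through the channel numbers and never steps back
down, no cycle of routers can be waiting on one another, which is the
basis of the deadlock-freedom claim.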

The source DCM can transmit data continuously into the network until
the End of Message is reached or a not-ready indication is received
over the status path. When RDY is again detected on the status path,
message transmission resumes. When a DCM sees an End of Message, it
frees the channel for other requests.

There is no buffering of messages and no need for CPU cycles on the
intermediate nodes. On the source and destination nodes, a DMA
transfers the message in 32-bit words directly between the DCM and
memory.

References:

(1) C. R. Lang Jr., "The Extension of Object-Oriented Languages to a
    Homogeneous, Concurrent Architecture", Dept. of Computer Science,
    Calif. Inst. of Technology, Tech. Report 5014, May 1982.

(2) H. Sullivan & T. R. Bashkow, "A Large Scale Homogeneous Machine",
    Proc. 4th Annual Symposium on Computer Architecture, 1977,
    pp. 105-124.

(3) C. L. Seitz, et al., "The Hypercube Communications Chip", Dept. of
    Comp. Sci., Calif. Inst. of Technology, Display File 5182,
    March 1985.

(4) W. J. Dally, "A VLSI Architecture for Concurrent Data Structures",
    Ph.D. Thesis, Dept. of Comp. Sci., Calif. Inst. of Technology,
    Technical Report 5209, March 1986.

Joel Clark                      joel@intelisc.intel.COM
Intel Scientific Computers      {tektronix}!ogcvax!intelisc!joel
Beaverton, OR                   (503) 629-7732