Path: utzoo!attcan!uunet!husc6!bloom-beacon!gatech!hubcap!mark From: mark@hubcap.UUCP (Mark Smotherman) Newsgroups: comp.arch Subject: Re: Lisp "future" instruction in 88k hardware. Summary: historical note on Gamma-60 Message-ID: <3543@hubcap.UUCP> Date: 14 Nov 88 23:17:56 GMT References: <3401@geaclib.UUCP> <3410@geaclib.UUCP> <6410@june.cs.washington.edu> Organization: Clemson University, Clemson, SC Lines: 275 In article <6410@june.cs.washington.edu>, david@june.cs.washington.edu (David Callahan) writes: > In article <3410@geaclib.UUCP> daveb@geaclib.UUCP (David Collier-Brown) writes: > > > > The principle is "start it, and only wait for it if you have to", > > > future function,returnVector2 > > ... > > ... > > await r4,returnVector2 > > > > join r1 ! wait for incoming token > add r1 = r1 + r2 > fork r3 ! generante extra token > An early machine, the Gamma-60, implemented fork and join. A brief sketch of the machine follows: Bull Gamma 60 (1959) History Compagnie des Machines Bull of France announced the Gamma 60 in 1958. Twenty systems were built. The logical designers were Pierre Chanus, Jean Bosset, and J.P. Cottet (the ones credited in the Datamation article). The machine is composed of multiple processing units with exclusive processing differentiation (e.g. BCD ALU, binary ALU, comparison unit, code translation unit) and multiple I/O processing units (e.g. drum, tape, etc.), each of which has an identical instruction request and data request interface to a single supervisory control unit and to main memory. Each unit is capable of independent, concurrent operation once it is selected and initiated by the supervisory control unit. Highlights Noteworthy - Multiprocessing. - Hardware provision for asynchronous parallel processes (i.e. FORK and JOIN operations). - Synchronization by queueing for both hardware and software resources. - Processing unit status word. Pecularities - Separation of comparison and arithmetic operations into distinct processing units. - Multiple word, variable length instructions with storage of control information in instruction sequence. - Process scheduling (which is apparently FIFO) is wired into the supervisory control unit. Instruction Formats A complete instruction is composed of a series of 24-bit words. There may be 1-4 addresses depending on the processing unit. Word types are: A - address (includes opcode and 15-bit address field) B - branch (one appears to be SIMU - i.e. FORK), (15-bit address field) C - unit selection (misleadingly called an interrupt), (15-bit field used for unit number, another field contains number of calls required to this unit before processing can begin - i.e. used as JOIN) D - operation specification (15-bit field used for length) An address word can: load a DAR (data address register); change its value by indexing or indirection; or, cause it to be stored in memory. A unit selection word loads the PAR (program address word) of the unit with the next program word address (i.e. the address immediately following the address of the unit selection word). postulated example (complete instructions are labeled): A: branch SIMU to address D => FORK empty word used for queueing link word for SIMU? B: unit select arithmetic, 1 call to proceed empty word used for call count and queueing link word address location of operand 1 address location of operand 2 address location of result operation add C: branch unconditional to address E D: unit select drum empty word address source address address destination address operation transfer drum to memory, N number of words E: unit select translate, 2 calls to proceed => JOIN empty word address source address address destination address operation block move, N number of words to transfer thus the data flow is: A +----+----+ fork V V BC D +---+ +---+ join V V E Spaces and Addressing 24-bit word, 4-bit BCD digits, 6-bit characters Working and control storage The first 128 locations contain sets of registers for each unit. Each set has: PAR (program address register) that also contains the status of the unit (busy/available) unit queue head pointer unit queue tail pointer one or more DAR's (data address registers) depending on unit Memory name-space and hierarchy 32K words, first 128 locations reserved as special registers. Address calculation and addressing modes Provision for relative and indirect addressing, 4 index registers in aritmetic unit, autoindexing by incrementing or decrementing depending on operation. Address mapping None Data Formats Character - 6-bit code similar to ASCII Logical - 24-bit binary word Fixed-point apparently: 24-bit binary word (for machinehood - addresses and loop counts) BCD fixed-point - 10 digits with decimal point location specified +-+-------+--------------+-------------------+ |s|pt.loc.| number ... (2nd word) | +-+-------+--------------+-------------------+ # bits: 1/ 7 / 16 / 24 # bcd digits: 2 / 4 / 6 Floating-point BCD digits, similar to fixed point, exponent in point location field (0-79 with bias of 40), double precision possible with 11- 19 digits in four words (point location field of third word is apparently used as length). Operations Data handling - translate, edit, block transfer Logic Fixed-point - includes loop counts in binary Floating-point - 13 operations for arithmetic unit Sequencing Normal sequencing is provided by one PAR for each processing unit and a queue of waiting processes (i.e. addresses of instruction streams). The supervisory control unit advances the PAR's and moves the process (i.e. instruction address stream) to the queue of the appropriate processing unit whenever a unit selection word is encountered. The SIMU branch instruction causes the current program sequence to be queued on the supervisory control unit (in much the same manner as queueing is done for the processing units) for later execution. The supervisory control unit then starts the execution of a new program sequence at the branch address. This can occur several times, so that an indefinite number of parallel processes may be initiated. The supervisory control unit will perform unit selection (and if the unit is free, then the DAR loading/modification and command initiation) for each created process in turn. Decision - conditional branch on unit status Iteration - apparently some form of automatic loop counting Delegation Shared subroutines are treated as virtual units to provide mutual exclusion. A subroutine call then appears as a unit selection word to a virtual unit number, and queueing is used just as in the case of physical units. Supervision Control switching On errors, the currently executing program is queued (at front?), and a diagnosis program is automatically initiated. Integrity - no memory protection Process interaction, multiprocessing The empty word after each unit selection word is used to hold the queue links to the waiting unit selection words; the queue has FIFO handling (apparently the provision for waiting on a certain number of calls, i.e. unit selections, means that the queue must be completely searched for any previously queued calls at the same instruction address). Input/Output Implementation Notes Each processing unit (including each I/O device) has a control unit for interfacing to the busses for instruction requests, data requests, data distribution, and data collection. Each unit requests instructions and data over these special busses, which are priority arbitrated. Apparently, normal operation is: request an instruction over the inst.req bus, receive a command at some point on the inst.dist bus, request input data over the data.req bus, accept input data from the data.dist bus (note that the control unit registers have the addresses), perform the operation, request output data transfer, send output data over the data.collect bus, and then repeat the cycle. Comments As mentioned in section 7.4 of the text, the "granularity" of concurrent process specification in the Gamma 60 is too fine. Each instruction, rather than each logical unit of work, is defined to be a separate process that the machine must initiate and coordinate. This results in lots of overhead. The cause of this can be traced to the memory-oriented design, as opposed to a traditional CPU-oriented design; everything possible was done to use every memory cycle. Unfortunately, by attempting to meet this goal by using concurrent operation of multiple processing units at such a low level, it appears that many memory cycles were wasted due to the supervisory overhead of processes switching back and forth between units or went idle due to congestion at critical, bottleneck units. References "France's Gamma 60: A step forward in data processing?" Datamation 4, 3 (May-June 1958), 34-35. P. Dreyfus, "System design of the Gamma 60," Proc. WJCC 1958, LA, CA, pp. 130-133. P. Dreyfus, "Programming design features of the Gamma 60 computer," Proc. EJCC 1959, pp. 174-181. M. Bataille, "The Gamma 60: The computer that was ahead of its time," Honeywell Computer Journal 5, 3 (1971) 99-105. [This is the best reference, contains refs to French and German papers as well as a Univ. Mich. report by Dreyfus] -- Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634 INTERNET: mark@hubcap.clemson.edu UUCP: gatech!hubcap!mark