Path: utzoo!attcan!uunet!husc6!bloom-beacon!gatech!hubcap!mark
From: mark@hubcap.UUCP (Mark Smotherman)
Newsgroups: comp.arch
Subject: Re: Lisp "future" instruction in 88k hardware.
Summary: historical note on Gamma-60
Message-ID: <3543@hubcap.UUCP>
Date: 14 Nov 88 23:17:56 GMT
References: <3401@geaclib.UUCP> <3410@geaclib.UUCP> <6410@june.cs.washington.edu>
Organization: Clemson University, Clemson, SC
Lines: 275

In article <6410@june.cs.washington.edu>, david@june.cs.washington.edu (David Callahan) writes:
> In article <3410@geaclib.UUCP> daveb@geaclib.UUCP (David Collier-Brown) writes:
> >
> >  The principle is "start it, and only wait for it if you have to",
> 
> >	future function,returnVector2
> >	...
> >	...
> >	await r4,returnVector2
> >
> 
> 	join  r1		! wait for incoming token
> 	add r1 =  r1 + r2 
> 	fork r3			! generante extra token
> 

An early machine, the Gamma-60, implemented fork and join.  A brief
sketch of the machine follows:


Bull Gamma 60 (1959)


History

Compagnie des Machines Bull of France announced the Gamma 60 in
1958.  Twenty systems were built.  The logical designers were
Pierre Chanus, Jean Bosset, and J.P. Cottet (the ones credited in
the Datamation article).

The machine is composed of multiple processing units with
exclusive processing differentiation (e.g. BCD ALU, binary ALU,
comparison unit, code translation unit) and multiple I/O
processing units (e.g. drum, tape, etc.), each of which has an
identical instruction request and data request interface to a
single supervisory control unit and to main memory.  Each unit is
capable of independent, concurrent operation once it is selected
and initiated by the supervisory control unit.  


Highlights

Noteworthy
 -   Multiprocessing.
 -   Hardware provision for asynchronous parallel processes (i.e.
     FORK and JOIN operations).
 -   Synchronization by queueing for both hardware and software
     resources.
 -   Processing unit status word.

Pecularities
 -   Separation of comparison and arithmetic operations into
     distinct processing units.
 -   Multiple word, variable length instructions with storage of
     control information in instruction sequence.
 -   Process scheduling (which is apparently FIFO) is wired into
     the supervisory control unit.


Instruction Formats

A complete instruction is composed of a series of 24-bit words. 
There may be 1-4 addresses depending on the processing unit.  

Word types are:
     A - address (includes opcode and 15-bit address field)
     B - branch (one appears to be SIMU - i.e. FORK), (15-bit
          address field)
     C - unit selection (misleadingly called an interrupt),
          (15-bit field used for unit number, another field
          contains number of calls required to this unit before
          processing can begin - i.e. used as JOIN)
     D - operation specification (15-bit field used for length)
An address word can: load a DAR (data address register); change
its value by indexing or indirection; or, cause it to be stored
in memory.  A unit selection word loads the PAR (program address
word) of the unit with the next program word address (i.e. the
address immediately following the address of the unit selection
word).

postulated example (complete instructions are labeled):

 A:  branch         SIMU to address D => FORK
     empty word     used for queueing link word for SIMU?

 B:  unit select    arithmetic, 1 call to proceed
     empty word     used for call count and queueing link word
     address        location of operand 1
     address        location of operand 2
     address        location of result
     operation      add

 C:  branch         unconditional to address E

 D:  unit select    drum
     empty word
     address        source address
     address        destination address
     operation      transfer drum to memory, N number of words

 E:  unit select    translate, 2 calls to proceed => JOIN
     empty word
     address        source address
     address        destination address
     operation      block move, N number of words to transfer

          thus the data flow is:        A
                                   +----+----+ fork
                                   V         V
                                   BC        D
                                   +---+ +---+ join
                                       V V
                                        E


Spaces and Addressing

24-bit word, 4-bit BCD digits, 6-bit characters

Working and control storage
The first 128 locations contain sets of registers for each unit.
Each set has:
     PAR (program address register) that also contains the status
          of the unit (busy/available)
     unit queue head pointer
     unit queue tail pointer
     one or more DAR's (data address registers) depending on unit

Memory name-space and hierarchy
32K words, first 128 locations reserved as special registers.

Address calculation and addressing modes
Provision for relative and indirect addressing, 4 index registers
in aritmetic unit, autoindexing by incrementing or decrementing
depending on operation.

Address mapping
None


Data Formats

Character - 6-bit code similar to ASCII

Logical - 24-bit binary word

Fixed-point
apparently: 24-bit binary word (for machinehood - addresses and
loop counts)

BCD fixed-point - 10 digits with decimal point location specified
               +-+-------+--------------+-------------------+
               |s|pt.loc.|   number    ...   (2nd word)     |
               +-+-------+--------------+-------------------+
  # bits:       1/  7    /         16   /         24
  # bcd digits:     2    /         4    /         6

Floating-point
BCD digits, similar to fixed point, exponent in point location
field (0-79 with bias of 40), double precision possible with 11-
19 digits in four words (point location field of third word is
apparently used as length).


Operations

Data handling - translate, edit, block transfer
Logic
Fixed-point - includes loop counts in binary
Floating-point - 13 operations for arithmetic unit


Sequencing

Normal sequencing is provided by one PAR for each processing unit
and a queue of waiting processes (i.e. addresses of instruction
streams).  The supervisory control unit advances the PAR's and
moves the process (i.e. instruction address stream) to the queue
of the appropriate processing unit whenever a unit selection word
is encountered.

The SIMU branch instruction causes the current program sequence
to be queued on the supervisory control unit (in much the same
manner as queueing is done for the processing units) for later
execution.  The supervisory control unit then starts the
execution of a new program sequence at the branch address.  This
can occur several times, so that an indefinite number of parallel
processes may be initiated.  The supervisory control unit will
perform unit selection (and if the unit is free, then the DAR
loading/modification and command initiation) for each created
process in turn.

Decision - conditional branch on unit status

Iteration - apparently some form of automatic loop counting

Delegation
Shared subroutines are treated as virtual units to provide mutual
exclusion.  A subroutine call then appears as a unit selection
word to a virtual unit number, and queueing is used just as in
the case of physical units. 


Supervision

Control switching
On errors, the currently executing program is queued (at front?),
and a diagnosis program is automatically initiated.

Integrity - no memory protection

Process interaction, multiprocessing
The empty word after each unit selection word is used to hold the
queue links to the waiting unit selection words; the queue has
FIFO handling (apparently the provision for waiting on a certain
number of calls, i.e. unit selections, means that the queue must
be completely searched for any previously queued calls at the
same instruction address).


Input/Output


Implementation Notes

Each processing unit (including each I/O device) has a control
unit for interfacing to the busses for instruction requests, data
requests, data distribution, and data collection.  Each unit
requests instructions and data over these special busses, which
are priority arbitrated.  Apparently, normal operation is:
request an instruction over the inst.req bus, receive a command
at some point on the inst.dist bus, request input data over the
data.req bus, accept input data from the data.dist bus (note that
the control unit registers have the addresses), perform the
operation, request output data transfer, send output data over
the data.collect bus, and then repeat the cycle.


Comments

As mentioned in section 7.4 of the text, the "granularity" of
concurrent process specification in the Gamma 60 is too fine. 
Each instruction, rather than each logical unit of work, is
defined to be a separate process that the machine must initiate
and coordinate.  This results in lots of overhead.  The cause of
this can be traced to the memory-oriented design, as opposed to a
traditional CPU-oriented design; everything possible was done to
use every memory cycle.  Unfortunately, by attempting to meet
this goal by using concurrent operation of multiple processing
units at such a low level, it appears that many memory cycles
were wasted due to the supervisory overhead of processes
switching back and forth between units or went idle due to
congestion at critical, bottleneck units.


References

"France's Gamma 60: A step forward in data processing?"
Datamation 4, 3 (May-June 1958), 34-35.

P. Dreyfus, "System design of the Gamma 60," Proc. WJCC 1958, LA,
CA, pp. 130-133.

P. Dreyfus, "Programming design features of the Gamma 60
computer," Proc. EJCC 1959, pp. 174-181.

M. Bataille, "The Gamma 60: The computer that was ahead of its
time," Honeywell Computer Journal 5, 3 (1971) 99-105.
     [This is the best reference, contains refs to French and
     German papers as well as a Univ. Mich. report by Dreyfus]
-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark