Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!umd5!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Wulf's WM
Message-ID: <7992@pur-ee.UUCP>
Date: 25 Apr 88 17:34:08 GMT
References: <28200131@urbsdc> <1508@pt.cs.cmu.edu> <50669@sun.uucp>
Organization: Purdue University Engineering Computer Network
Lines: 41
Summary: CSPI also uses FIFO memory interface

In article <50669@sun.uucp>, ram%shukra@Sun.COM (Renu Raman (Sun Microsystems)) writes:
> In article <1508@pt.cs.cmu.edu> agn@UNH.CS.CMU.EDU (Andreas Nowatzyk) writes:
> >Of particular appeal to me was the introduction of fifo's into the
> >load/store instructions (decoupling the time when an address is issued
> >from the time when the data is accessed) as it has the *potential*  of
> >allowing more latency in the memory system without degrading the throughput.
>    This is not new.  Read about the ZS-1 in the previous ASPLOS conference

Actually, it has been common for a while now.  I don't remember the model,
but I know at least one of CSPI's array processors used FIFO interfaces not
only to decouple memory references, but also to decouple interactions
between control, address generation, and arithmetic hardware.  You are right
about the ZS-1... it looks very similar to WM.  Besides, for the past 6 or 7
years, at least a few hardware people I know have used "dataflow
microarchitecture" (i.e., FIFO-interconnected functional units) in building
conventional-looking special-purpose machines.
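
To make the decoupling argument concrete, here is a toy cycle count in
Python.  It is only a sketch under assumptions of my own choosing (one
address issued per cycle, a fixed memory latency, a bounded FIFO); the
function names and parameters are hypothetical and model neither WM, the
ZS-1, nor any CSPI machine.

```python
from collections import deque

def run_coupled(n_loads, latency):
    """Baseline: every load stalls the consumer for the full latency."""
    return n_loads * latency

def run_decoupled(n_loads, latency, fifo_depth):
    """Address unit runs ahead of the consumer through a bounded FIFO.

    Assumptions (illustrative only): one address issued per cycle,
    fixed memory latency, one datum consumed per cycle when available.
    Returns the cycle in which the last datum is consumed.
    """
    in_flight = deque()  # arrival cycles of loads still in the memory system
    ready = 0            # data sitting in the FIFO, waiting for the consumer
    issued = consumed = cycle = 0
    while consumed < n_loads:
        cycle += 1
        # Memory delivers every load whose latency has elapsed.
        while in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            ready += 1
        # Consumer takes at most one datum per cycle.
        if ready:
            ready -= 1
            consumed += 1
        # Address unit issues only if a FIFO slot is free for the result.
        if issued < n_loads and len(in_flight) + ready < fifo_depth:
            in_flight.append(cycle + latency)
            issued += 1
    return cycle
```

With a FIFO at least as deep as the latency, the latency is paid once
rather than once per load; with a FIFO of depth 1 the model degenerates to
roughly the coupled case.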

There is a catch, however: FIFOs to shared resources start showing the
usual dataflow problems, namely operand-to-operation matching and high bus
bandwidth requirements.  These are non-trivial problems to solve dynamically.
WM solves them by forcing operations of each type to execute in their
original program order, although operations of different types can execute
in a variable order relative to each other.  This is quite a strong
restriction.
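
The restriction can be stated as a check on execution orders.  The Python
below is a sketch of the rule as described here, not of WM's hardware; the
(type, name) tuple representation and both function names are my own
assumptions.

```python
from collections import defaultdict

def per_type_sequences(ops):
    """Group operation names by type, preserving the order they appear in."""
    seqs = defaultdict(list)
    for kind, name in ops:
        seqs[kind].append(name)
    return dict(seqs)

def is_legal_order(program, executed):
    """True if 'executed' keeps every type's operations in program order.

    Operations of different types may interleave freely; only the
    per-type subsequences must match the original program.
    """
    return per_type_sequences(program) == per_type_sequences(executed)

# Hypothetical four-operation program: two loads and two ALU operations.
program = [("load", "a"), ("alu", "x"), ("load", "b"), ("alu", "y")]

# Loads run ahead of the ALU operations: allowed.
loads_first = [("load", "a"), ("load", "b"), ("alu", "x"), ("alu", "y")]

# The two loads swap places: not allowed.
loads_swapped = [("load", "b"), ("load", "a"), ("alu", "x"), ("alu", "y")]
```

Because each type's order is fixed, a result can be matched to its
consumer by position in the FIFO alone, with no tags to compare.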

Personally, I'd rather see these problems solved by static scheduling at
compile time: a la VLIW (but not a VLIW machine).  Burton Smith has a design
which uses static information to control dynamic out-of-sequence evaluation
(he's been telling me about this for some years now, but he's building it,
not publishing on it), and I've been involved in a couple of designs with
similar properties (e.g., SBMs -- Static Barrier MIMDs -- papers/references
available upon request).  Statically scheduled, but not necessarily fixed
sequence, machines seem to have the benefit of very simple hardware
supporting a very general execution mechanism, and the compiler technology
to get good results is easy enough (for us compiler gurus :-).
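
For readers who have not seen compile-time scheduling, a generic greedy
list scheduler is sketched below in Python.  It is a textbook-style
illustration only: it is not Burton Smith's design and not the SBM
mechanism, and the operation names, latencies, and the "width" parameter
are all made up for the example.

```python
def static_schedule(ops, width):
    """Assign an issue cycle to each operation at "compile time".

    ops:   dict mapping name -> (latency, [names it depends on]).
           Assumed to be an acyclic graph with latencies >= 1.
    width: maximum operations issued per cycle (hypothetical machine limit).
    Returns a dict mapping name -> issue cycle.
    """
    start = {}
    remaining = dict(ops)
    cycle = 0
    while remaining:
        # An operation is ready once all of its inputs have completed.
        ready = [name for name, (_, deps) in remaining.items()
                 if all(d in start and start[d] + ops[d][0] <= cycle
                        for d in deps)]
        # Issue up to 'width' ready operations, in program order.
        for name in ready[:width]:
            start[name] = cycle
            del remaining[name]
        cycle += 1
    return start

# Hypothetical fragment: two 3-cycle loads feeding an add, then a store.
example_ops = {
    "ld_a": (3, []),
    "ld_b": (3, []),
    "add":  (1, ["ld_a", "ld_b"]),
    "st":   (1, ["add"]),
}
```

The point of the sketch is that the issue cycles are fixed before the
program runs, so the hardware never has to match operands to operations
dynamically.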

     __         /|
  _ |  |  __   / |  Compiler-oriented
 /  |--| |  | |  |  Architecture
/   |  | |__| |_/   Researcher from
\__ |  | | \  |     Purdue
    \    |  \  \
	 \      \   hankd@ee.ecn.purdue.edu