Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!xanth!mcnc!ece-csc!ncrcae!hubcap!mark
From: mark@hubcap.clemson.edu (Mark Smotherman)
Newsgroups: comp.arch
Subject: Re: MicroVAX emulation (really : DEC about-face)
Summary: Clark & Strecker article, INDEX inst. example, HPS impl. of VAX
Keywords: RISC, CISC, HPS
Message-ID: <5064@hubcap.clemson.edu>
Date: 11 Apr 89 14:43:10 GMT
References: <807@microsoft.UUCP> <92634@sun.uucp> <13322@steinmetz.ge.com> <573@loligo.cc.fsu.edu>
Organization: Clemson University, Clemson, SC
Lines: 44

In article <573@loligo.cc.fsu.edu>, bauer@loligo.cc.fsu.edu (Jeff Bauer) writes:
> Boy, all things do come around again...and again.
> I have a copy of a paper from grad school days by Clark and Strecker of DEC

   Douglas Clark and William Strecker, "Comments on 'The Case for the
   Reduced Instruction Set Computer,' by Patterson and Ditzel," Computer
   Architecture News, vol. 8, no. 6, October 15, 1980, pp. 34-38.

I've always wondered why they seem to take a swipe at their own designers.
In discussing why the INDEX function ran faster on the 780 when implemented
as a sequence of simple instructions, they say:

  "Anecdotal accounts of irrational implementations are certainly
                         ^^^^^^^^^^ (my emphasis)
   interesting.  Is it *typical*, however, that composite instructions
   run more slowly than equivalent sequences of simple instructions?
   The paper reports that a sequence of several simple instructions
   can replace the VAX INDEX instruction with a 45% speed gain on
   the 780.  This is a problem of implementation, not architecture.
   Fundamentally, after all, the implementation of the INDEX
   *function* with more than one instruction simply cannot take less
   time than the one-instruction version, assuming equal hardware in
   both cases.  The explanation of this anomaly is that the 780's
   Floating Point Accelerator speeds up the multiply in the
   multi-instruction implementation, but doesn't see the INDEX at all."

This is interesting to reread after the series of email articles discussing
how hard it is to pipeline the VAX architecture.  I've heard that the
real win on VAX implementations is to put in a heavy-duty microcode pipe.
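
For anyone who hasn't run into INDEX: as I remember the VAX architecture
manual, it takes subscript, low, high, size, indexin, and indexout operands,
computes indexout = (indexin + subscript) * size, and takes a subscript-range
trap if subscript falls outside low..high.  Below is a rough Python sketch of
that function done once as a single composite step and once as a sequence of
simple steps.  The function names and the SubscriptRangeTrap exception are my
own inventions, the mnemonics in the comments are only a plausible sequence
(not the one measured in the Patterson and Ditzel paper), and 32-bit overflow
and the exact point at which the trap is delivered are ignored.

```python
class SubscriptRangeTrap(Exception):
    """Hypothetical stand-in for the VAX subscript-range trap."""


def index_composite(subscript, low, high, size, indexin):
    """Model INDEX as one step: bounds check, then (indexin + subscript) * size."""
    if subscript < low or subscript > high:
        raise SubscriptRangeTrap(subscript)
    return (indexin + subscript) * size


def index_simple_sequence(subscript, low, high, size, indexin):
    """Same function, spelled out as separate simple operations."""
    if subscript < low:            # e.g. CMPL subscript,low ; BLSS trap
        raise SubscriptRangeTrap(subscript)
    if subscript > high:           # e.g. CMPL subscript,high ; BGTR trap
        raise SubscriptRangeTrap(subscript)
    t = indexin + subscript        # e.g. ADDL3
    t = t * size                   # e.g. MULL2
    return t


# Element number of a[3][5] in an array declared a[0..9][0..19]:
# the first call scales the row subscript by the row length (20),
# the second adds the column subscript to that running total.
row_part = index_composite(3, 0, 9, 20, 0)            # (0 + 3) * 20 = 60
element = index_simple_sequence(5, 0, 19, 1, row_part)  # (60 + 5) * 1 = 65
```

Both versions compute the same result, which is the point of the quote: the
architecture defines one function, and which form runs faster depends on how
a given machine implements the multiply and the bounds checks.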

Also, does anyone know if DEC is working on an HPS (a.k.a. micro-dataflow,
restricted dataflow, decoupled VLIW) version of the VAX?  Yale Patt reported
work on this at the 1986 Microprogramming conference.

   Yale Patt, *et al.*, "Run-Time Generation of HPS Microinstructions
   from a VAX Instruction Stream," in Proc. MICRO-19, New York, Oct. 1986,
   pp. 75-81.

   (and I think a paper in MICRO-20 also)

Has DEC followed up this work?
-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark