Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!vsi1!wyse!mips!earl@wright.mips.com
From: earl@wright.mips.com (Earl Killian)
Newsgroups: comp.arch
Subject: Re: [HS]W interlocks (was: Fujitsu SPARC Interlocks)
Message-ID: <13273@wright.mips.COM>
Date: 14 Feb 89 23:17:20 GMT
References: <28200269@mcdurb> <28200273@mcdurb> <3007@ardent.UUCP> <14619@cup.portal.com> <24435@amdcad.AMD.COM>
Sender: earl@mips.COM
Reply-To: earl@wright.mips.com (Earl Killian)
Organization: MIPS Computer Systems, Sunnyvale CA
Lines: 36
In-reply-to: tim@crackle.amd.com (Tim Olson)

In article <24435@amdcad.AMD.COM>, tim@crackle (Tim Olson) writes:
>If an on-chip I-cache can be built that will supply an instruction in a
>single-cycle (which it *has* to, in order to run at 1 inst/cycle), why
>can't a D-cache with the same characteristics exist?  If there is a load
>in the execute stage, then TLB translation can occur in parallel with
>D-cache lookup, resulting in a value that can be forwarded to the ALU
>for use in the very next instruction.
>
>A single delay slot, with good scheduling, still causes about a 5% to 6%
>pipeline stall (or equivalent nop execution) which could be reduced with
>a fast on-chip D-cache. 

You can easily build a data cache with the same latency as your
instruction cache.  But you need to provide an address to that data
cache, and it is the latency of the address formation + access that
creates the 1-cycle minimum delay that John Hennessy referred to.

Your statement is really only true in the context of the 29000 and
similar machines, which have no address add stage (addresses are simply
the contents of a register), and not for the MIPS instruction set, where
the address is formed from a base register plus a signed 16-bit
displacement.  This "feature" of the 29000 is unusual, and I think it is
a mistake.  You certainly can't use the fact that it is possible to
implement a delayless 29000 load to justify putting load interlocks into
the MIPS architecture!
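
To make the timing argument concrete, here is a minimal sketch in
Python.  It is a toy model, not a description of any real chip: it
assumes one instruction issued per cycle, full forwarding, an address
add that takes a whole cycle when present, and a single-cycle D-cache
with TLB translation in parallel.  The function name and parameters
are hypothetical, chosen only for illustration.

```python
# Toy timing model (an illustrative assumption, not any real pipeline).
# Cycle 0 is the load's execute stage; a dependent instruction issued
# right after the load wants its operand at the start of cycle 1.

def load_use_delay(address_add_cycles, cache_access_cycles=1):
    """Return the number of delay cycles between a load and its first use.

    address_add_cycles: cycles spent forming the address (0 if the address
        is simply the contents of a register, 1 for base + displacement).
    cache_access_cycles: cycles for the D-cache lookup, with TLB
        translation assumed to proceed in parallel.
    """
    # The loaded value is ready this many cycles after execute begins.
    ready_after = address_add_cycles + cache_access_cycles
    # With forwarding, the first of those cycles is hidden by the pipeline.
    return max(0, ready_after - 1)


register_only = load_use_delay(address_add_cycles=0)   # 29000-style address
base_plus_disp = load_use_delay(address_add_cycles=1)  # MIPS-style address
print(register_only, base_plus_disp)  # prints: 0 1
```

Under these assumptions the one-cycle delay comes from the address add
sitting in front of the cache access, not from the cache itself, so a
D-cache as fast as the I-cache does not remove it.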

I think Slater's question should have been "Will there ever be MIPS
instruction set implementations that have no delay slot?" instead of
"Will there be pipelined uPs that have no delay slot?" because the
higher-level question was "What does MIPS lose by having a load delay
slot instead of a load interlock?".  I agree with Hennessy that the load
delay slot will never cost MIPSco performance, except for a small
increase in the I-cache miss rate.
--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086