Path: utzoo!attcan!uunet!zaphod.mps.ohio-state.edu!usc!ucla-cs!fiji.cs.ucla.edu!marc
From: marc@fiji.cs.ucla.edu (Marc Tremblay)
Newsgroups: comp.arch
Subject: Re: load delays on SPARC
Message-ID: <1990Nov13.012526.3415@cs.ucla.edu>
Date: 13 Nov 90 01:25:26 GMT
References: <1990Nov12.232129.21399@Neon.Stanford.EDU>
Sender: news@cs.ucla.edu (Mr. News)
Organization: UCLA Computer Science Department
Lines: 32
Nntp-Posting-Host: fiji.cs.ucla.edu

In article <1990Nov12.232129.21399@Neon.Stanford.EDU> hoelzle@Neon.Stanford.EDU (Urs Hoelzle) writes:
>OK, here's something I've been wondering about for a long time: on the
>SPARC, loads take *always* at least 2 cycles, and stores take 3 (in
>all the Suns I know of).  Even if the load's result isn't used by the
>next instruction.  Why???

Remember that SPARC offers base register plus index register
memory addressing. This means that at some point in the pipeline
*two* registers must be read just to generate the address.
For a store, the register that contains the data must be read
as well, so a store requires *three* register reads.
The register file of early implementations of the SPARC architecture
had only two read ports, i.e. two register reads per cycle. The two
address registers are read and the address is generated during one
cycle, and the register containing the data is read in the
following cycle. The third cycle consists of actually storing the
data off-chip (there is no on-chip cache), hence the long latency.
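
The store arithmetic above can be sketched as a toy model. This is
only an illustration of the counting argument, not a description of
any real SPARC pipeline; the function name store_cycles and its
parameters are made up for this example, and the single off-chip
memory cycle is an assumption taken from the paragraph above.

```python
import math

def store_cycles(regs_to_read=3, read_ports=2, memory_cycles=1):
    """Toy cycle count for a store (hypothetical model).

    regs_to_read:  base + index + data register = 3
    read_ports:    register reads available per cycle
    memory_cycles: cycles spent writing the data off-chip (assumed 1)
    """
    # Cycles needed just to get all operands out of the register file.
    register_cycles = math.ceil(regs_to_read / read_ports)
    return register_cycles + memory_cycles

print(store_cycles())              # two read ports
print(store_cycles(read_ports=3))  # hypothetical third read port
```

In this model two read ports give 3 cycles, and a third read port
would bring the store down to 2.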

For loads, early implementations of the SPARC (once again)
shared one set of address and data busses between instruction
fetches and data accesses. This means that no instruction can be
fetched while the address of the load goes out and the data
comes back. This results in an extra cycle of latency.
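
The shared-bus effect can be sketched the same way. Again this is a
toy model under stated assumptions: one bus slot per instruction
fetch, one extra slot per load for its data transfer, and nothing
else stalling. The name bus_cycles and the "load"/"add" labels are
hypothetical.

```python
def bus_cycles(instructions, shared_bus=True):
    """Count cycles for a list of opcode names (hypothetical model)."""
    cycles = 0
    for op in instructions:
        cycles += 1  # every instruction needs one fetch slot
        if op == "load" and shared_bus:
            # The load's data transfer occupies the shared bus,
            # so no instruction can be fetched in that cycle.
            cycles += 1
    return cycles

program = ["add", "load", "add"]
print(bus_cycles(program))                    # shared bus
print(bus_cycles(program, shared_bus=False))  # separate I and D busses
```

With the shared bus this program takes 4 cycles in the model; with
separate instruction and data busses it takes 3.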

These implementation "features" can be overcome by dedicating
a bit more silicon and more pins to the processor chip.

_________________________________________________
Marc Tremblay
internet: marc@CS.UCLA.EDU
UUCP: ...!{uunet,ucbvax,rutgers}!cs.ucla.edu!marc