Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!auspex!guy
From: guy@auspex.auspex.com (Guy Harris)
Newsgroups: comp.arch
Subject: Re: Architecture questions
Message-ID: <4056@auspex.auspex.com>
Date: 12 Sep 90 23:17:44 GMT
References: <2531@l.cc.purdue.edu> <4043@auspex.auspex.com> <STEPHEN.90Sep10191125@estragon.uchicago.edu>
Organization: Auspex Systems, Santa Clara
Lines: 47

>   "Reading from a buffer" in what sense?  Is this just an in-memory buffer
>   being read by a transaction processing application - in which case I
>   don't see how an *interrupt* would help, as a dumb old conditional
>   branch testing whether the number of characters left in the buffer was
>   zero would probably be *faster* than an interrupt (no tons of context to
>   save, etc.), or is it something else?
>
>Er. You have an 8k buffer. Each branch takes 1 cycle to test, 1 cycle
>if not taken. There goes 16k cycles. You're telling me that servicing
>a trap is going to take 16k cycles?

OK, so what is the extra magic instruction here?  I assume it basically
amounts to a block move; if the basic "reading from the buffer"
operation is done with a loop, only terminated with a trap instead of a
conditional branch, the question is whether servicing the trap takes
longer than the conditional branch at loop termination (which may be an
untaken branch, depending on the code in the loop).
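
For concreteness, here is a rough C sketch of the ordinary loop form of
"reading from the buffer". The name read_from_buffer and its signature
are invented for illustration; nothing in this thread specifies them.
The only point is that termination is one conditional branch on the
count, which a compiler will typically arrange to fall through (untaken)
on the last trip.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy n bytes out of an in-memory buffer.
 * The loop condition is the "dumb old conditional branch" that a
 * trapping variant would replace with a trap at termination. */
size_t read_from_buffer(char *dst, const char *buf, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (dst != NULL && buf != NULL));

    while (i < n) {          /* conditional branch tests the count */
        dst[i] = buf[i];     /* the actual fetch from the buffer */
        i++;
    }
    return i;                /* number of bytes delivered */
}
```

A trap-terminated version would have the same loop body; only the exit
path differs, and that exit happens once per buffer.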

A quick look at e.g. the 68020 book indicates that a byte-offset
conditional branch with the branch not taken takes as much time as a
conditional trap (with no argument) if the trap isn't taken, but if the
trap *is* taken it takes a fair bit longer.

I suspect the trap is unlikely to win if it's just a conditional trap
instruction.  Therefore, I assume the instruction is a block move.

So, if it's a block move, what's the point of the trap? Just have the
instruction do a conditional branch, which, again, will probably take
fewer cycles than the trap.

(We don't even count any cycles spent in the trap service routine in
either case.)

Thus, the interrupt doesn't help in either case.

The question then is whether the *rest* of the instruction helps.  If
you can run the copy loop at memory speed (we assume here that the block
move can run at memory speed) by building it out of ordinary boring
instructions, it probably won't help.  If you can e.g. stuff the actual
fetch-from-the-buffer instruction into the delay slot on a machine with
branch delays, you might be able to come close enough to the block move
instruction (or do as well) - and we haven't even considered unrolling
the loop here.
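
As a sketch of what "unrolling the loop" would mean here, assuming a
plain byte copy: the function name copy_unrolled and the unroll factor
of four are arbitrary choices for illustration, not anything taken from
a real machine or library. The loop-closing branch is executed once per
four bytes instead of once per byte; whether that reaches memory speed
depends entirely on the machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy n bytes, four per loop trip. */
void copy_unrolled(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Main loop: one conditional branch per four bytes moved. */
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* Cleanup loop for the 0 to 3 bytes left over. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```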

Of course, the other possibility might be to have the application use
"locate mode", and just use the data as it is in the buffer rather than
copying it out, if that's possible....
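
A rough sketch of the difference, in C. The struct and the names
get_move and get_locate are hypothetical, made up for this example; the
idea is only that the "move" flavor copies the data into the caller's
area, while the "locate" flavor hands back a pointer into the buffer
and copies nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer descriptor: data, its length, current position. */
struct buffer {
    const char *data;
    size_t len;
    size_t pos;
};

/* "Move mode": copy up to want bytes into dst, return the count. */
size_t get_move(struct buffer *b, char *dst, size_t want)
{
    size_t avail = b->len - b->pos;
    size_t n = want < avail ? want : avail;
    size_t i;

    assert(n == 0 || dst != NULL);
    for (i = 0; i < n; i++)
        dst[i] = b->data[b->pos + i];
    b->pos += n;
    return n;
}

/* "Locate mode": return a pointer to the data where it already sits,
 * storing the usable length in *got; NULL when the buffer is empty. */
const char *get_locate(struct buffer *b, size_t want, size_t *got)
{
    size_t avail = b->len - b->pos;
    const char *p;

    *got = want < avail ? want : avail;
    if (*got == 0)
        return NULL;
    p = b->data + b->pos;
    b->pos += *got;
    return p;
}
```

With get_locate there is no per-byte loop at all, so the question of
how to terminate it goes away; the catch is that the caller has to be
done with the data before the buffer is refilled.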