Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!auspex!guy
From: guy@auspex.auspex.com (Guy Harris)
Newsgroups: comp.arch
Subject: Re: Architecture questions
Message-ID: <4056@auspex.auspex.com>
Date: 12 Sep 90 23:17:44 GMT
References: <2531@l.cc.purdue.edu> <4043@auspex.auspex.com>
Organization: Auspex Systems, Santa Clara
Lines: 47

> "Reading from a buffer" in what sense?  Is this just an in-memory buffer
> being read by a transaction processing application - in which case I
> don't see how an *interrupt* would help, as a dumb old conditional
> branch testing whether the number of characters left in the buffer was
> zero would probably be *faster* than an interrupt (no tons of context to
> save, etc.), or is it something else?
>
>Er. You have an 8k buffer. Each branch takes 1 cycle to test, 1 cycle
>if not taken. There goes 16k cycles. You're telling me that servicing
>a trap is going to take 16k cycles?

OK, so what is the extra magic instruction here?  I assume it basically
amounts to a block move.  If instead the basic "reading from the
buffer" operation is done with a loop, terminated only by a trap rather
than by a conditional branch, the question is whether servicing the
trap takes longer than the conditional branch at loop termination
(which may be an untaken branch, depending on the code in the loop).

A quick look at, e.g., the 68020 book indicates that a byte-offset
conditional branch that isn't taken takes as much time as a conditional
trap (with no operand) that isn't taken, but if the trap *is* taken it
takes a fair bit longer than a taken branch.  I suspect the trap is
unlikely to win if it's just a conditional trap instruction; therefore,
I assume the instruction is a block move.

So, if it's a block move, what's the point of the trap?  Just have the
instruction do a conditional branch, which, again, will probably take
fewer cycles than the trap.  (And that's without counting any cycles
spent in the trap service routine, in either case.)
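To make the block-move-plus-branch alternative concrete, here is a minimal C sketch, assuming a hypothetical buffered reader over an in-memory "file" (all names are illustrative, not any real system's API). The block move is `memcpy`, and the "buffer empty?" check is one ordinary conditional branch per block move, sitting exactly where a trapping block move would trap. It is taken only when the buffer is exhausted, not tested once per byte, so the per-byte branch cost counted in the quoted 8k-buffer example does not appear here.

```c
#include <stddef.h>
#include <string.h>

#define BUFSZ 8192  /* the 8k buffer from the quoted example */

/* Hypothetical buffered reader over an in-memory "file". */
struct reader {
    const unsigned char *src;    /* underlying data */
    size_t src_len, src_pos;     /* its length, and how much has been buffered */
    unsigned char buf[BUFSZ];
    size_t len, pos;             /* valid bytes in buf, next byte to hand out */
    size_t empty_taken;          /* times the "buffer empty" branch was taken */
};

void reader_init(struct reader *r, const unsigned char *src, size_t n)
{
    r->src = src;
    r->src_len = n;
    r->src_pos = 0;
    r->len = r->pos = 0;
    r->empty_taken = 0;
}

/* Refill buf from the underlying data; returns 0 at end of data. */
static int refill(struct reader *r)
{
    size_t n = r->src_len - r->src_pos;

    if (n > BUFSZ)
        n = BUFSZ;
    if (n == 0)
        return 0;
    memcpy(r->buf, r->src + r->src_pos, n);
    r->src_pos += n;
    r->len = n;
    r->pos = 0;
    return 1;
}

/* Copy up to want bytes into dst.  The "buffer empty?" test is one
   conditional branch per block move, taken only when buf is exhausted. */
size_t buffered_read(struct reader *r, unsigned char *dst, size_t want)
{
    size_t done = 0;

    while (done < want) {
        size_t avail = r->len - r->pos;
        size_t chunk;

        if (avail == 0) {        /* where a trapping block move would trap */
            r->empty_taken++;
            if (!refill(r))
                break;           /* end of data */
            continue;
        }
        chunk = avail < want - done ? avail : want - done;
        memcpy(dst + done, r->buf + r->pos, chunk);   /* the block move */
        r->pos += chunk;
        done += chunk;
    }
    return done;
}
```

Reading 20000 bytes through the 8192-byte buffer takes three refills, so the "empty" branch is taken three times; apart from whatever loop `memcpy` itself uses internally, there is no per-byte test.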

Thus, the interrupt doesn't help in either case.  The question then is
whether the *rest* of the instruction helps.  If you can run the copy
loop at memory speed (we assume here that the block move can run at
memory speed) by building it out of ordinary boring instructions, the
block move probably won't help.  If you can, e.g., stuff the actual
fetch-from-the-buffer instruction into the delay slot on a machine with
branch delays, you might be able to come close to the block move
instruction's speed (or even match it), and we haven't even considered
unrolling the loop here.

Of course, the other possibility might be to have the application use
"locate mode", and just use the data where it sits in the buffer rather
than copying it out, if that's possible....
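As a rough illustration of the unrolling point, here is a minimal C sketch that assumes nothing about the target machine: copying eight bytes per iteration means the loop-closing conditional branch runs once per eight bytes, so an 8k buffer needs about 1k loop-termination branches instead of 8k. The function name is hypothetical, and a modern compiler may unroll or vectorize a plain byte loop itself, or turn this one into a call to `memcpy`.

```c
#include <stddef.h>

/* Copy n bytes, eight per loop iteration, so the loop-termination test
   is paid once per 8 bytes rather than once per byte.  Returns the
   number of loop iterations executed (main loop plus 0..7 tail bytes). */
size_t copy_unrolled8(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t iterations = 0;

    while (n >= 8) {
        dst[0] = src[0]; dst[1] = src[1]; dst[2] = src[2]; dst[3] = src[3];
        dst[4] = src[4]; dst[5] = src[5]; dst[6] = src[6]; dst[7] = src[7];
        dst += 8;
        src += 8;
        n -= 8;
        iterations++;
    }
    /* Leftover 0..7 bytes, one at a time. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
        iterations++;
    }
    return iterations;
}
```

For example, `copy_unrolled8(dst, src, 8192)` runs 1024 iterations; with 8195 bytes, the three leftover bytes add three more.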