Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!samsung!think.com!snorkelwacker.mit.edu!mintaka!spdcc!iecc!Postmaster
From: johnl@iecc.cambridge.ma.us (John R. Levine)
Newsgroups: comp.arch
Subject: Re: Can old architectures run fast?
Message-ID: <9105070005.AA24446@iecc.cambridge.ma.us>
Date: 7 May 91 04:05:05 GMT
References: <8283@uceng.UC.EDU> <7628@auspex.auspex.com> <8324@uceng.UC.EDU> <1991May05.174756.9026@iecc.cambridge.ma.us>
Sender: Postmaster@iecc.cambridge.ma.us
Organization: I.E.C.C.
Lines: 37
In-Reply-To: <8346@uceng.UC.EDU>

In article <8346@uceng.UC.EDU> you write:
>How does a 3090 stack up against modern workstations on the usual
>measures of performance/price, such as SPECmarks/$? My guess would
>be that the large backwards compatibility comes at a price.

It's hard to compare, since the 3090 is a mainframe, not a workstation,
which means that it has I/O bandwidth orders of magnitude better than
anything you'd see on or next to a desk. High end 3090 installations
run on-line systems which handle 1000 transactions/second (that's per
second, not per hour) and there's nothing anywhere near comparable in
workstation-land. A bunch of micros sharing data over a network turns
out not to do the trick, because you end up with intolerable hot spots
in the data base.

>Also, how much slower and/or more expensive is the 3090 as a result
>of maintaining such backwards compatibility? (I realize that might be
>hard to get a handle on.)

No question, it's not cheap. Some of the stuff they have to do is
extremely gross. The worst example is an execute instruction which
points to a translate-and-test (TRT) instruction. The TRT has two
memory operands and looks up each byte of the first using the second
as the lookup table until it finds a table entry that's non-zero. This
means that the length of the first operand depends on its contents.
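
To make the TRT scan concrete, here is a rough sketch in Python of the
lookup loop described above. It is an illustration only, not how the
hardware or microcode does it: the function name and return convention
are made up for this example, and the real instruction reports its
result through registers and the condition code rather than a return
value.

```python
def translate_and_test(argument: bytes, table: bytes):
    """Illustrative model of the TRT scan (hypothetical helper, not IBM's code).

    Each byte of `argument` (the first operand) is used as an index into
    `table` (the second operand, 256 entries). The scan stops at the first
    argument byte whose table entry is non-zero.

    Returns (index, table_entry) for the stopping byte, or None if every
    argument byte maps to a zero entry.
    """
    if len(table) != 256:
        raise ValueError("the lookup table needs one entry per byte value")
    for i, byte in enumerate(argument):
        entry = table[byte]
        if entry != 0:
            # How far we got depends on the data, not just on a length field.
            return i, entry
    return None
```

For instance, with a table that is zero everywhere except at the entry
for ",", scanning b"abc,def" stops at index 3, so only the first four
bytes of the argument are ever examined.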

360 instructions are not continuable, so since the execute, the TRT,
and both operands can each potentially span a page boundary, the CPU
may need to touch as many as 8 pages. To tell whether it needs the 8th
page it does a "trial execution" of the instruction that doesn't store
any results before actually doing the instruction. There's even more
internal hair than that, since the 3090 has lots of fault-tolerance
hardware and takes microcode checkpoints at several places in a complex
instruction.

That's the worst; a more typical instruction is "add", which computes
an address by adding together one or two registers and a 12-bit offset
in the instruction, picks up the word at that address, and adds it to a
target register. Other than the three-input adder for address
generation, that's pretty straightforward.
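
The "add" case can be sketched the same way. This is a simplified model
in Python, not a description of the 3090's data path. The function
names, the list-of-ints register file, and the bytearray memory are all
assumptions for illustration. It assumes 24-bit addresses as on the
original 360 (the 3090 can also run with wider addresses), treats
register number 0 in the base or index field as "no register", and
ignores overflow interrupts and the condition code.

```python
def effective_address(regs, base, index, disp, addr_bits=24):
    """Three-input address add: base register + index register + 12-bit offset.

    A register number of 0 in either field means that field contributes
    nothing. `addr_bits` is an assumption of this sketch.
    """
    if not 0 <= disp < 4096:
        raise ValueError("displacement must fit in 12 bits")
    total = disp
    if base != 0:
        total += regs[base]
    if index != 0:
        total += regs[index]
    # Keep only the low-order address bits.
    return total & ((1 << addr_bits) - 1)


def add_rx(regs, memory, target, base, index, disp):
    """Fetch the big-endian 32-bit word at the computed address and add it
    to the target register, wrapping to a signed 32-bit result."""
    addr = effective_address(regs, base, index, disp)
    word = int.from_bytes(memory[addr:addr + 4], "big", signed=True)
    total = (regs[target] + word) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed value.
    regs[target] = total - (1 << 32) if total & 0x80000000 else total
```

With 5 in register 1, 0x100 in register 2, 0x10 in register 3, and the
word 37 stored at address 0x118, add_rx(regs, memory, 1, 2, 3, 8)
leaves 42 in register 1.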