Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site decwrl.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxl!houxm!vax135!cornell!uw-beaver!tektronix!hplabs!hpda!fortune!amd!decwrl!dec-rhea!dec-erlang!falcone
From: falcone@erlang.DEC (Joe Falcone, HLO2-3/N03, dtn 225-6059)
Newsgroups: net.arch
Subject: Re: 68020 Performance
Message-ID: <3560@decwrl.UUCP>
Date: Sat, 15-Sep-84 01:25:50 EDT
Article-I.D.: decwrl.3560
Posted: Sat Sep 15 01:25:50 1984
Date-Received: Fri, 14-Sep-84 06:02:15 EDT
Sender: daemon@decwrl.UUCP
Organization: DEC Engineering Network
Lines: 97
CC: When is 16Mhz, not 16Mhz?

It is difficult to discuss the performance of the 68020 chip "out of context", i.e., without information about the rest of the system components. Because of its potentially very great speed (16.67 MHz), the 68020 places significant demands on the memory and I/O portions of a system. One can assume that the 68020 instruction buffer, and perhaps a cache designed into the system, can reduce the demands on the memory system; however, this reduction is highly dependent on the workload and on the design of the cache and memory. No two system designs incorporating the 68020 are likely to be the same.

The memory demands have worried me ever since a processor called the HP 9000 claimed to be able to run at 18 MHz with a relatively memory-intensive stack architecture and no cache. The HP 9000 got around the problems by decoding and presenting addresses extremely early in instruction execution to a heavily pipelined memory controller which, sure enough, could deliver a word every 110ns (every other processor cycle). The memories were specially developed 128k nmos rams which ran fast and hot. Although the scheme worked, it was a technological "house of cards", since every piece was critically dependent on the performance of the other components.
Although it was claimed that one could run the HP 9000 processor chips at speeds over 30 MHz in the lab, operating the chips at that speed would have necessitated a faster bus, memory controller, and rams. So even though the cpu chip remains unchanged, the rest of the system has a fit.

Now the problem with the 68020 is that it simply does not present addresses soon enough to the memory. Therefore, to run with no wait states, one must be able to provide the requested data within the few cycles allotted. In many current 68000 systems, the cycles allotted are not sufficient to avoid wait states because of delays from the memory management unit or slow rams. Therein lies a dilemma, for all of us want memory management of some form, and the cheaper, denser rams tend to be a little slower until technology catches up.

My SR-50 tells me that a 16.67 MHz clock gives a 60ns processor cycle. Assuming 4 cycles per access, that gives you 240ns for the round trip (address out, data in). A 200ns multi-megabyte main memory and management unit would be prohibitively expensive to integrate into a moderately priced 68020 workstation (although it can be done at a decidedly high price). Hence the crying need for a cache to assist data accesses and perhaps supplement the instruction buffer.

Now, unless someone designs a cache which gives 100% data and instruction hit rates, that 16.67 MHz clock will be degraded by misses, and unless the memory subsystems are especially fast, there will be multi-cycle waits. Just off the top of my head, using a 90% hit rate cache, one might be looking at performance degradation anywhere between 10 and 25% due to miss penalties (it all depends on how big your cache is, how fast your main memory is, and how quickly your cache cycles the main memory to get what you need). So the 16.67 MHz clock has fallen as low as 12 MHz. Sometimes I think the FTC should get involved with this stuff.
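The back-of-the-envelope numbers above can be reproduced with a short Python sketch. It is a rough model under stated assumptions, not a measurement: it assumes the processor is completely bus-bound, that a cache hit completes in the 4-cycle access, and that each miss adds a fixed number of wait cycles. The function names and the 4- and 12-cycle miss penalties are illustrative choices, not figures from any 68020 data sheet.

```python
def cycle_ns(clock_mhz):
    """Length of one processor cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def access_window_ns(clock_mhz, cycles_per_access=4):
    """Time allowed for the round trip (address out, data in)."""
    return cycles_per_access * cycle_ns(clock_mhz)


def effective_clock_mhz(clock_mhz, hit_rate, miss_wait_cycles,
                        cycles_per_access=4):
    """Equivalent clock rate once cache misses add wait cycles.

    Assumption: every access costs cycles_per_access cycles, and a miss
    costs miss_wait_cycles extra. Real workloads are not purely
    bus-bound, so this is a pessimistic simplification.
    """
    miss_rate = 1.0 - hit_rate
    average_cycles = cycles_per_access + miss_rate * miss_wait_cycles
    slowdown = average_cycles / cycles_per_access
    return clock_mhz / slowdown


if __name__ == "__main__":
    clock = 16.67
    print("cycle time (ns):   ", round(cycle_ns(clock), 1))
    print("access window (ns):", round(access_window_ns(clock), 1))
    # Hypothetical miss penalties: a fast and a slow main memory.
    for wait in (4, 12):
        eff = effective_clock_mhz(clock, hit_rate=0.90, miss_wait_cycles=wait)
        loss = 100.0 * (1.0 - eff / clock)
        print("miss penalty", wait, "cycles ->",
              round(eff, 2), "MHz equivalent,", round(loss, 1), "% slower")
```

Under this model, a 90% hit rate with miss penalties of a few cycles up to about a dozen cycles lands in roughly the 10 to 25% range quoted above; a different cache or memory design would shift those figures.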
As a final note, the following is my own opinion as an educated individual. Isn't it kind of ridiculous to compare a cpu chip set to a very large computer system with caches and high-speed I/O buses? I'm sure there are a lot of cpus out there that can beat a VAX-11/780 one-on-one running some benchmark. On the other hand, how many of them can handle the cpu, virtual memory, and I/O demands of 20 to 40 users? The fact is that there is a lot of stuff in the 780 (special instructions, cache, SBI, massbus, unibus, etc.) just for handling lots of users for long periods of time. Unfortunately, this stuff does tend to get in the way of tests of raw, single-user processor performance (either directly, or indirectly because of compromises in the design process). The 780 is not trying to pretend to be a single-user workstation.

MORAL: If you want to compare the 68020 (or any microprocessor) to a VAX, wait for the figures on the forthcoming VAX chip sets (which were discussed at some of the chip conferences). At the system level, the microVAX line of small systems is designed and packaged more for one-on-one personal use. The microVAX and VAX chip sets offer some very interesting comparison opportunities for "fair fights" between Digital and the competition.

In the meantime, as an exercise, one might want to examine the performance of the grand old pdp-11/70 vs. the J-11 (11/73) chip set, and both of them relative to other 16-bit processors. With its memory management and floating point support, the 11/73 performs very well as a few-user Qbus system. But the 11/70 is clearly still the choice for large systems because of its cache, special memory architecture, and unibus/massbus I/O. Although the machines have similar performance, you just would not want to put 20 users on an 11/73 - it doesn't have all the extras to take care of all those people.

So the next time you want to compare microprocessors, go pick on someone your own size. Then you will have a more valid comparison.
Joe Falcone
Eastern Research Laboratory        decvax!
Digital Equipment Corporation      decwrl!deccra!jrf
Hudson, Mass                       tardis!