Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!rutgers!ames!pioneer!lamaster
From: lamaster@pioneer.arpa (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: What with these Vector's anyways?
Message-ID: <2425@ames.arpa>
Date: Fri, 31-Jul-87 13:30:49 EDT
Article-I.D.: ames.2425
Posted: Fri Jul 31 13:30:49 1987
Date-Received: Sun, 2-Aug-87 01:07:47 EDT
References: <218@astra.necisa.oz> <142700010@tiger.UUCP>
Sender: usenet@ames.arpa
Reply-To: lamaster@ames.UUCP (Hugh LaMaster)
Distribution: world
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 95
Keywords: scalar vs. vectors, benchmarks, Dhrystone, sorting

In article <279@diab.UUCP> pf@.UUCP (Per Fogelstrom) writes:
>In article <10956@amdahl.amdahl.com> littauer@amdahl.UUCP (Tom Littauer) writes:
>>In article <3636@well.UUCP> rchrd@well.UUCP (Richard Friedman) writes:
>>>The best supercomputers are fast scalar machines first, with vector

(omitted discussion about scalar perf. in supercomputers)

>
>Be serious! Do you really believe in Dhrystone? Okay I do admit that we don't
>have anything better for the moment, but soon I hope.

(omitted discussion about Dhrystone)

>
>[ There are three types of lie, lie, damned lie, and BENCHMARKS !! ]
>--
>Per Fogelstrom, Diab Data AB

I have to add something else to this discussion. Ten years ago, when Crays
first came out, IBM was still trying to peddle the 370/168 and Amdahl had
its first faster machines. Folks at the national labs started saying that
the Cray was not only the fastest, but also the most cost effective, for
"scalar" work. They were right, at the time.

A lot of water has gone under the bridge since then. There wasn't much of a
market for fast machines in the early and mid 70's, but the last four or
five years have changed all that. Even IBM is trying to keep up.

But, to get back to the question of scalar performance: Suppose you want to
buy the most cost effective machine for doing large sorts.
Ten years ago, that might have been a Cray. Parallel Computing (Vol 4 1987
pp 49-61) recently had a comparison of sorting performance using scalar and
vector algorithms on big iron. The scalar performance of some non-Cray big
machines has now caught up with Cray scalar performance (scalar Quicksort
being a good example of a scalar code). Vector processors are being
incorporated in more "mainstream" mainframes (the Amdahl 1200 examined in
the article, but also the IBM 3090 VF machines). And there are now
vectorized sorting algorithms which can provide significant benefits for
some cases. Overall, for sorting, the Amdahl 1200 appeared to have the
advantage over the Cray X-MP in both scalar and vector sorting.

There are several points here. The first is that as more companies are
building fast machines and vector architectures have become mainstream, the
members of the set "supercomputers" are a bit harder to define (again) than
they were ten years ago. The second is that even for traditional "business"
"scalar" computing like sorting, there are now vector algorithms which show
significant performance improvements over scalar algorithms.

Finally, the question of what makes a good benchmark: If you want to do a
lot of sorting, sorting makes a good benchmark. (Extrapolate to whatever
you want to do.)

The original purpose of Dhrystone was to produce a synthetic program that
used "recent statistics" for "real" programs. Weicker's PROGRAM has been
widely criticized, but the STATISTICS behind it are probably valid for
records-and-pointers type code. A new implementation of the code which
prints results which depend on the correct execution of all the code is
certainly needed - Dhrystone II? A problem with "small" benchmarks which
depend on multiple passes over the same data is that typically code and
data can run cache contained, which is also very artificial.
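
To make the cache-containment point concrete, here is a rough sketch (mine,
not from Weicker's code) of a harness that holds the total number of data
touches constant while varying the size of the data area. The name
touch_passes and the sizes used are made up for illustration. It also prints
a checksum that depends on every touch actually being executed. In an
interpreted language the interpreter overhead will probably hide most of the
cache effect, so treat this as the shape of the experiment rather than a
measurement; a compiled version should show the difference more clearly.

```python
import time
from array import array


def touch_passes(working_set, total_touches):
    """Read a buffer of `working_set` integers in whole passes until
    `total_touches` element reads have been made.

    Returns (checksum, passes, elapsed_seconds). The checksum depends on
    every read, so the loop body cannot be skipped without changing it.
    """
    data = array("l", range(working_set))  # hypothetical data area
    passes = total_touches // working_set
    checksum = 0
    start = time.perf_counter()
    for _ in range(passes):
        for value in data:
            checksum += value
    elapsed = time.perf_counter() - start
    return checksum, passes, elapsed


if __name__ == "__main__":
    total = 1 << 20  # same amount of work in both runs
    # Small data area (many passes) versus large data area (one pass).
    for size in (1 << 10, 1 << 20):
        checksum, passes, elapsed = touch_passes(size, total)
        print(f"{size:>8} elements x {passes:>5} passes: "
              f"{elapsed * 1e9 / total:.1f} ns per touch "
              f"(checksum {checksum})")
```

The small case is the "multiple passes over the same data" situation; the
large case is closer to what a big machine with lots of memory would see in
real use.
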
A new Dhrystone III benchmark which uses the same statistics but has a much
larger data area would be more appropriate for testing big machines with
lots of cache and memory.

It should be noted that one thing that Dhrystone does do "right" is make
lots of procedure calls. In my experience, on typical machines that are
similar in other respects, it is often the cost of procedure calls,
comparisons, and branches that determines the "apparent speed" of a scalar
machine used for scalar purposes. The reason Dhrystone looks SO slow on the
Cray is very likely the relatively much larger cost of procedure calls on
the Cray (and the CDC 6600, CDC 7600, Cyber 205, to name a few popular
supercomputers). This effect is real and is a valid result of Dhrystone, as
long as the compiler doesn't do true global optimization.

MIPS computers, and others, have been tending to use Un*x utilities to
measure the "general purpose" speed of machines. This makes sense. It
should be remembered, however, with respect to Dhrystone, that when it and
previous benchmarks were written there was no standard environment available
for most processors as there is today. Some (not on this net) still argue
about it today.

  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

"IBM will have it soon"

(Disclaimer: "All opinions solely the author's responsibility")