Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!mit-eddie!ll-xn!ames!necntc!linus!alliant!muller
From: muller@alliant.Alliant.COM (Jim Muller)
Newsgroups: comp.arch
Subject: Re: Macho flops versus Megaflops (was Re: ETA10-P performance)
Message-ID: <860@alliant.Alliant.COM>
Date: Tue, 17-Nov-87 14:14:39 EST
Article-I.D.: alliant.860
Posted: Tue Nov 17 14:14:39 1987
Date-Received: Sat, 21-Nov-87 03:36:24 EST
References: <676@zycad.UUCP> <3322@ames.arpa>
Reply-To: muller@alliant.UUCP (Jim Muller)
Organization: Alliant Computer Systems, Littleton, MA
Lines: 59

In <3322@ames.arpa> fouts@orville.nas.nasa.gov.UUCP (Marty Fouts) writes:
>...most machines have a different peak number and "average" number.
>The advertised peak performance number is frequently refered to around here
>as the "guarenteed not to exceed this speed" number...

He then gave a good explanation of how peak numbers are calculated, and why
they really are "peak" instead of "average sustainable" speed. His reasons
were quite accurate, and can be summarized into four areas:

>First of all, your application isn't entirely vector adds and multiplies...
>Secondly, there is probably some architectural gotcha that will keep the
> machine (from) getting peak performance...relating to vector length,
> number and type of functional units, and memory reference patterns.
>Thirdly, there is the quality of the compiler technology.
>Finally, I/O can do you in.

There is one more item to the story, though. All of these factors influence
the speed of any given code, but the business of peak vs. average speed is
presumably beyond the features of "typical" applications. In other words,
such test codes should be written so as *not* to trip over these things.
The extra item is something you cannot work around, i.e. the "ramp-up" time
of vector instructions. Typically, vector instructions take N cycles to
load up, followed by M cycles with output.
It is the rate of the M outputs that is used for "peak" speeds. The average
sustained speed, though, is only a fraction M / (N + M) of that peak, i.e.
it is reduced by N / (N + M). If the ramp-up requires half as many cycles as
the vector length, then the sustainable rate will be only 2/3 of the peak
rate, EVEN IF THE CODE IS PERFECTLY MATCHED TO THE OTHER ARCHITECTURAL
FEATURES OF THE MACHINE! It has nothing to do with the four "real world"
factors that Marty explained so well.

"So why not list sustainable rates, instead? Or give the ramp-up times
too?" you ask. Simply because it isn't that simple. The peak rates quoted
may be for the fairly busy triadic vector operations. Simpler vector
operations may require fewer ramp-up cycles, but still output one datum per
cycle. Yet the nominal flop-rate (both peak and sustainable) is lower
because that operation is doing less work (e.g. an add is only half as many
nominal operations as a multiply followed by an add). In other words, there
is no single answer.

One thing that machine designers (should) try to do is reduce the ramp-up
time for vector instructions, since this will result in a real-time speedup
of the vector portions of any code. However, while this improves both the
theoretical sustainable rate and the real throughput rate, it has no impact
on the peak rate. Thus, the true speed of a machine is obscured before you
ever get into the question of "real world" applications.

Highly tuned, avoid-all-the-architectural-pitfalls codes for the Alliant
FX/8 have managed to reach sustained output rates near the *sustainable*
rate as described here. I have no doubt that other super- and
mini-super-computer builders have done this too. However, no code will ever
go faster than the sustainable rate, let alone reach the peak rate, unless
you measure the output rate only during the body of a single vector
instruction.
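
To put numbers on the ramp-up argument, here is a minimal Python sketch. It
assumes one result per cycle once the pipeline is full, as described above.
The function names, the 10 MHz clock, the vector length of 32, and the
ramp-up counts are all hypothetical illustrations, not figures for the FX/8
or any other real machine.

```python
def sustained_fraction(ramp_cycles, output_cycles):
    """Fraction of the peak rate sustained across one vector instruction.

    Assumes N ramp-up cycles with no output, then M cycles that each
    deliver one result, so the fraction is M / (N + M).
    """
    if ramp_cycles < 0 or output_cycles <= 0:
        raise ValueError("need ramp_cycles >= 0 and output_cycles > 0")
    return output_cycles / (ramp_cycles + output_cycles)


def rates_mflops(clock_mhz, flops_per_result, ramp_cycles, output_cycles):
    """Return (peak, sustainable) nominal MFLOPS for one kind of vector op."""
    # Peak counts only the output phase: one result per cycle.
    peak = clock_mhz * flops_per_result
    return peak, peak * sustained_fraction(ramp_cycles, output_cycles)


if __name__ == "__main__":
    # Hypothetical machine: 10 MHz clock, vector length 32 (made-up values).
    cases = [
        ("triad (multiply + add)", 2, 16),  # 2 flops per result, N = 16
        ("simple add", 1, 8),               # 1 flop per result, N = 8
    ]
    for name, flops, ramp in cases:
        peak, sustained = rates_mflops(10.0, flops, ramp, 32)
        print(f"{name}: peak {peak:.1f}, sustainable {sustained:.1f} MFLOPS")
```

With these made-up numbers the triad sustains 2/3 of its peak and the add
4/5 of its peak, yet the add still delivers fewer nominal flops. That is the
sense in which no single figure tells the whole story.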

BTW, these highly tuned codes are usually worthless except as academic
studies, since real-life applications are often dominated by the other
architectural weaknesses, i.e. you start from the sustainable rate and work
down!

-----------------------------------------------------------------------------
My employer did not sanction this posting, nor did they require or request
me to make this disclaimer. Thanks for listening.
- Jim