Path: utzoo!attcan!uunet!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!swrinde!emory!hubcap!mccalpin
From: mccalpin@vax1.udel.edu (John D Mccalpin)
Newsgroups: comp.parallel
Subject: Re: Acceptable efficiency factors
Message-ID: <9545@hubcap.clemson.edu>
Date: 3 Jul 90 13:08:33 GMT
Sender: fpst@hubcap.clemson.edu
Lines: 29
Approved: parallel@hubcap.clemson.edu

In article <9508@hubcap.clemson.edu> xxremak@csduts1.lerc.nasa.gov
(David A. Remaklus) writes:

>In a recent conversation with some colleagues of mine at the Ames NAS
>facility concerning parallel processing, they mentioned their experiences
>porting a code to the Intel i860 hypercube located there (128 nodes,
>7.5 gigaFLOPS peak). On this particular code they were able to
>achieve about 300 MFLOPS for an efficiency factor of about 2.5%. This
>low efficiency factor didn't seem to bother them but it sure bothered
>me.

The question of efficiency is complicated in this case by the choice
of the i860 as the cpu. The quoted peak performance corresponds to
about 60 MFLOPS/cpu, which may not be attainable even with optimally
coded assembly language routines.

Preston Briggs at Rice University has spent some time working on this
processor, and on real hardware he was unable to obtain more than
about 33 MFLOPS for a hand-coded 64-bit matrix-multiply kernel. Code
compiled from FORTRAN using existing compiler technology typically
ran in the 2-5 MFLOPS range.

The observed 300 MFLOPS works out to about 2.3 MFLOPS/cpu, which may
indicate very good performance, all things considered.

So a more reasonable way to estimate efficiency in this case is to look
at the parallel speedup. I would be surprised if one cpu gave better
than 5 MFLOPS, so the "efficiency" in this case would be close to
50% = (300 MFLOPS)/(128 cpu * 5 MFLOPS/cpu).
-- 
John D. McCalpin                        mccalpin@vax1.udel.edu
Assistant Professor                     mccalpin@delocn.udel.edu
College of Marine Studies, U. Del.      mccalpin@scri1.scri.fsu.edu
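
The arithmetic in the post can be checked with a short Python sketch. All inputs are the figures quoted above. The 5 MFLOPS single-cpu rate is the author's own guess, not a measurement, and the variable and function names are made up for illustration. Taken at face value, the quoted numbers put the ratio to peak at about 4% rather than the 2.5% given in the quoted article.

```python
# Figures as quoted in the post; nothing here is measured.
NODES = 128                 # i860 nodes in the hypercube
PEAK_MFLOPS = 7.5 * 1000.0  # quoted machine peak, 7.5 gigaFLOPS
OBSERVED_MFLOPS = 300.0     # quoted observed rate for the ported code
SINGLE_CPU_MFLOPS = 5.0     # author's guess at a realistic one-cpu rate


def per_cpu(total_mflops, nodes):
    """Average rate per cpu for a given machine-wide rate."""
    return total_mflops / nodes


def efficiency(observed_mflops, nodes, per_cpu_mflops):
    """Observed rate divided by nodes times an assumed per-cpu rate."""
    return observed_mflops / (nodes * per_cpu_mflops)


peak_per_cpu = per_cpu(PEAK_MFLOPS, NODES)          # "about 60 MFLOPS/cpu"
observed_per_cpu = per_cpu(OBSERVED_MFLOPS, NODES)  # "about 2.3 MFLOPS/cpu"

# Efficiency relative to the quoted peak.
vs_peak = efficiency(OBSERVED_MFLOPS, NODES, peak_per_cpu)

# Efficiency relative to the assumed 5 MFLOPS single-cpu rate.
vs_realistic = efficiency(OBSERVED_MFLOPS, NODES, SINGLE_CPU_MFLOPS)

print(f"peak per cpu:      {peak_per_cpu:.1f} MFLOPS")
print(f"observed per cpu:  {observed_per_cpu:.2f} MFLOPS")
print(f"vs. quoted peak:   {vs_peak:.1%}")
print(f"vs. 5 MFLOPS/cpu:  {vs_realistic:.1%}")
```

Changing SINGLE_CPU_MFLOPS across the 2-5 MFLOPS range reported for compiled FORTRAN shows how sensitive the "efficiency" figure is to the choice of baseline.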