Path: utzoo!attcan!uunet!husc6!bbn!rochester!pt.cs.cmu.edu!k.gp.cs.cmu.edu!lindsay
From: lindsay@k.gp.cs.cmu.edu (Donald Lindsay)
Newsgroups: comp.arch
Subject: m88000 benchmarks
Keywords: FFT, m88000, benchmark, VLSI Systems Design
Message-ID: <1941@pt.cs.cmu.edu>
Date: 14 Jun 88 16:18:05 GMT
Sender: netnews@pt.cs.cmu.edu
Organization: Carnegie-Mellon University, CS/RI
Lines: 20

We don't have much benchmarking info yet about the Motorola 88000.
However, the May issue of "VLSI Systems Design" contains a pipeline timing
chart for an FFT inner loop.

The (compiler generated) code does 4 loads, 4 stores, 10 single precision
float calculations, and 4 other things, in 27 clocks. At 20 MHz, that's
7.4 MFLOPS.
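
For anyone who wants to check that arithmetic, here is a small sketch in
Python. The function name and its parameters are my own invention, not
anything from the article; only the three numbers come from the chart.

```python
def mflops(flops_per_iter, clocks_per_iter, clock_mhz):
    """Sustained MFLOPS for a loop that does flops_per_iter floating-point
    operations every clocks_per_iter clocks on a clock_mhz MHz part."""
    # (iterations per microsecond) * (flops per iteration)
    return flops_per_iter * clock_mhz / clocks_per_iter

# Figures quoted above: 10 float ops in 27 clocks at 20 MHz.
loop_rate = mflops(10, 27, 20.0)
print(round(loop_rate, 1))
```

This counts only the 10 float calculations as "flops"; the loads, stores
and the 4 other things contribute clocks but no flops.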

A 16KB CMMU can hold 4K floats, but they all have to be faulted in.  A
recent post suggested counting 10 clocks per 16 byte fault. That's 2.5
clocks per float, or 10 clocks for the 4 floats each inner loop loads.
But a large FFT visits each data point several times (say, 11), so the
startup cost amortizes to about 1 clock per inner loop.
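
The amortization can be sketched the same way. This is a rough model
under my own assumptions, flagged in the comments: each inner loop touches
4 distinct 4-byte floats (the 4 loads), and the stores hit lines that are
already resident. All variable names are made up for illustration.

```python
FLOAT_BYTES = 4       # single precision
LINE_BYTES = 16       # bytes brought in per fault (figure from the post cited)
FAULT_CLOCKS = 10     # clocks charged per fault (figure from the post cited)
VISITS = 11           # times a large FFT revisits each data point ("say, 11")
FLOATS_PER_ITER = 4   # assumption: the 4 loads touch 4 distinct floats

# One fault brings in LINE_BYTES / FLOAT_BYTES floats.
clocks_per_float = FAULT_CLOCKS / (LINE_BYTES / FLOAT_BYTES)

# Fault cost charged to one inner loop, spread over all the visits.
fault_clocks_per_iter = FLOATS_PER_ITER * clocks_per_float / VISITS

# 10 float ops per 27-clock loop at 20 MHz, plus the amortized fault cost.
adjusted_mflops = 10 * 20.0 / (27 + fault_clocks_per_iter)

print(clocks_per_float, round(fault_clocks_per_iter, 2),
      round(adjusted_mflops, 2))
```

Under those assumptions the fault overhead comes to a bit under 1 clock
per inner loop, so the loop costs roughly 28 clocks instead of 27.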

So, "about 7 MFLOPS on an FFT benchmark" seems fair.
-- 
Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science

"Imitation is not the sincerest form of flattery. Payments are."
- a British artist who died penniless before copyright law.