Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!mit-eddie!uw-beaver!tektronix!cae780!leadsv!eps2!jon
From: jon@eps2.UUCP
Newsgroups: comp.sys.m68k
Subject: Re: Recent Motorola ad seen in Byte
Message-ID: <73@eps2.UUCP>
Date: Fri, 3-Apr-87 17:11:57 EST
Article-I.D.: eps2.73
Posted: Fri Apr 3 17:11:57 1987
Date-Received: Sun, 5-Apr-87 08:51:35 EST
References: <362@sbcs.UUCP> <1466@ncr-sd.SanDiego.NCR.COM> <580@plx.UUCP> <251@winchester.mips.UUCP>
Distribution: comp
Organization: Scumtronics Inc.
Lines: 150
Summary: RISCy business... (real examples here)

In article <251@winchester.mips.UUCP>, mash@mips.UUCP (John Mashey) writes:
> In article <580@plx.UUCP> ed@plx.UUCP (Ed Chaban) writes:
> > Now the REAL screamer is CLIPPER. The nice thing about CLIPPER is
> > that you can really cut down on all those support chips (building an
> > 8k cache out of discrete components is EXPENSIVE.
>
> Sigh. If you can support that statement with live benchmarks of
> substantial, real programs, please post them. Even synthetic benchmarks

I was allowed to use an Intergraph 32/C at Fairchild to do some
benchmarking. I wanted to see how the Clipper would perform against a
68020 at "graphics" operations.

The first program I ran simulates an airbrush. You start out with a piece
of frame buffer and a "cell" of densities, which could be a
Gaussian-distribution airbrush with some stipple. Each pixel becomes
(density * color of airbrush) + ((1 - density) * original pixel). This
was done 5000 times on a 32 x 32 airbrush cell. The times were:

    Sun-3/160        cc -O   28.3
    Intergraph 32/C  cc      30.3   (Green Hills compiler)

A bug in my program, which did not affect the timing, prevented me from
using the optimizer (-O, -O2) on the Intergraph. I was surprised the
Clipper was slower. It would probably match the 68020 with the -O2
option.

Then I ran a program which simulates a blit. Basically, it moves a
megabyte of memory, as long words, 32 times.
The times (in seconds) were:

    Sun-3/160        cc -O         5.8
    Sun-3/160        asm version   5.3
    Intergraph 32/C  cc            7.3
    Intergraph 32/C  cc -O         7.2
    Intergraph 32/C  cc -O2        6.8
    Intergraph 32/C  asm version   6.3

I had thought the burst loading of the cache would make the blit run at
least as fast as on the 68020, but I was wrong. Incidentally, *p++ = *q++
becomes a single move.l (a0)+,(a1)+ with the Sun C compiler, but five
Clipper instructions with Green Hills.

Given 60ns machine cycles for the 68020 and 270ns memory cycles, I figure
(3 + 2 wait states) * 60ns = 300ns to read or write a long word from
memory. So the memory bandwidth is

    (1,000,000,000 ns/sec) * (1 long word / 300ns) * (4 bytes / long word)
        = 13,333,333 bytes/sec

The Sun was reading and writing 64M bytes in 5.3 seconds (about 12.7M
bytes/sec), so it was moving bytes at close to the bandwidth of memory,
which is kind of nice.

Actually, we have hardware to do blits and airbrushes, and it runs (I
would guess) at least 10x faster than the 68020 or Clipper. I should
qualify that: the hardware blits are 10x faster than CPU blits that
include multiple sources, with look-up and ALU functions.

My conclusion was that the Clipper wouldn't perform well in our system as
a graphics processor. Heck, it was actually slower than the 16.67 MHz
68020. When the AMD FAE (field applications engineer) was out here, he
said that when the 29000 runs the same algorithms that are implemented in
the QPDM (9560), it runs them twice as fast. That sounds pretty good: the
software flexibility to write a stippled, brick-pattern airbrush, running
at hardware speeds.

I read Fairchild's Performance White Paper. They claim 3x the performance
of a 16.67 MHz 68020. Maybe on Dhrystone, but not on my stuff. They claim
8064 Dhrystones, versus 2745 for the Sun-3/160. I don't know about you
people, but my March 15th Dhrystone list says the Intergraph is 5275 and
the Sun is 3246. Their explanation is that the 8064 figure is with a new
compiler. That's pretty impressive, a gain of over 50% from the compiler
alone. I guess the old one had some shortcomings, huh?
Can anyone with an Intergraph verify the 8064 number? I think it would be
fairer for Fairchild to compare the Intergraph to a Sun-3/260 rather than
a 3/160, anyway.

> 8K cache: expensive? we spend about $150 for 24K of cache. Maybe that's
> more expensive than a pair of 300K+ transistor Clipper CAMMUs, but I doubt it.

I wish I could add a cache to a 68020 right now with just one chip, the
way you can with Intel and AT&T parts. I'll have to wait for the 68030,
because puny little companies like us (don't let the DuPont name fool
you, we're a subsidiary) can't afford to design and build caches.

Jonathan Hue
DuPont Design Technologies/Via Visuals
leadsv!eps2!jon

*Disclaimer: You're right, I don't know what I'm talking about*

Here are the programs I used (two separate programs, each with its own
main()):

unsigned char fb[0x10000];
unsigned short cell[1024];

/*
 * airbrush with multiply tables
 *
 * bP: frame buffer (0x800 bytes per scan line), mP: multiply table
 * (the benchmark just passes fb), cP: airbrush cell (32 entries per row),
 * wx/wy: cell width/height, clr: airbrush color.
 * If cell entries are row offsets (density * 256) into a 256-column
 * multiply table, the two lookups give density * original pixel and
 * density * color.
 */
void wrt_airb(register unsigned char *bP, register unsigned char *mP,
              register unsigned short *cP, register short wx, short wy,
              register unsigned char clr)
{
    register unsigned char pixel;
    register short d0, d1;
    register int j;

    while (wy--) {
        j = wx;
        while (j--) {
            d1 = *cP++;                     /* cell entry for this pixel */
            d0 = d1 + *bP;
            pixel = (d0 - mP[d0]) & 0xff;   /* keep the low byte only */
            d1 += clr;
            pixel += mP[d1] & 0xff;
            *bP++ = pixel & 0xfe;           /* low bit cleared, as originally */
        }
        cP += (32 - wx);                    /* next row of the cell */
        bP += (0x800 - wx);                 /* next scan line */
    }
}

int main(void)
{
    register int i;

    for (i = 0; i < 5000; i++)
        wrt_airb(fb, fb, cell, (short) 32, (short) 32, (unsigned char) 0);
    return 0;
}

/* long is 32 bits on the 68020 and Clipper, so this is 1M bytes + 4 */
long buffer[0x40001];

/* copy 0x40000 long words, each from the word just above it */
void blit(void)
{
    register long *p, *q, i;

    p = buffer;
    q = p + 1;
    i = 16384;
    while (i--) {                 /* 16384 x 16 copies = 0x40000 words */
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
        *p++ = *q++;
    }
}

int main(void)
{
    register int i;

    for (i = 0; i < 32; i++)
        blit();
    return 0;
}