Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!mit-eddie!uw-beaver!tektronix!cae780!leadsv!eps2!jon
From: jon@eps2.UUCP
Newsgroups: comp.sys.m68k
Subject: Re: Recent Motorola ad seen in Byte
Message-ID: <73@eps2.UUCP>
Date: Fri, 3-Apr-87 17:11:57 EST
Article-I.D.: eps2.73
Posted: Fri Apr  3 17:11:57 1987
Date-Received: Sun, 5-Apr-87 08:51:35 EST
References: <362@sbcs.UUCP> <1466@ncr-sd.SanDiego.NCR.COM> <580@plx.UUCP> <251@winchester.mips.UUCP>
Distribution: comp
Organization: Scumtronics Inc.
Lines: 150
Summary: RISCy business...  (real examples here)

In article <251@winchester.mips.UUCP>, mash@mips.UUCP (John Mashey) writes:
> In article <580@plx.UUCP> ed@plx.UUCP (Ed Chaban) writes:
> > Now the REAL screamer is CLIPPER. The nice thing about CLIPPER is
> > that you can really cut down on all those support chips (building an
> > 8k cache out of discrete components is EXPENSIVE.  
> 
> Sigh.  If you can support that statement with live benchmarks of
> substantial, real programs, please post them.  Even synthetic benchmarks

I was allowed to use an Intergraph 32/C at Fairchild to do some benchmarking.  I
wanted to see how the Clipper would perform against a 68020 at "graphics"
operations.  The first program I ran simulates an airbrush.  You start out
with a piece of frame buffer and a "cell" of densities, which could be a Gaussian
distribution airbrush with some stipple.  Each pixel becomes
(density * airbrush color) + ((1 - density) * original pixel).  This was done
5000 times on a 32 x 32 airbrush cell.  The times, in seconds, were:

Machine          Compiler                  Time (sec)
Sun-3/160        cc -O                     28.3
Intergraph 32/C  cc (Green Hills, no -O)   30.3

A bug in my program, which did not affect the timing, prevented me from using
the optimizer (-O, -O2) on the Intergraph.  I was surprised the Clipper was
slower.  It would probably match the 68020 with the -O2 option.

Then I ran a program that simulates a blit.  Basically it copies a megabyte
of memory, one long word at a time, 32 times over.  The times, in seconds, were:

Machine          Version    Time (sec)
Sun-3/160        cc -O      5.8
Sun-3/160        assembly   5.3
Intergraph 32/C  cc         7.3
Intergraph 32/C  cc -O      7.2
Intergraph 32/C  cc -O2     6.8
Intergraph 32/C  assembly   6.3

I had thought the burst loading of the cache would make the blit run
at least as fast as the 68020, but I was wrong.  Incidentally, the *p++ = *q++
becomes move.l (a0)+, (a1)+ with the Sun C compiler, but becomes five Clipper
assembly instructions with Green Hills.  Given a 60ns clock cycle for the
16.67MHz 68020 and 270ns memory, the 3-clock bus cycle needs 2 wait states, so
I figure (3 + 2 wait states) * 60ns = 300ns to read or write one long word.
So the memory bandwidth is
(1,000,000,000 ns/sec) * (1 long word / 300ns) * (4 bytes / long word) =
13,333,333 bytes/sec.  The Sun was reading and writing 64M bytes (32 passes,
each reading 1M byte and writing 1M byte) in 5.3 seconds, or about 12.7M
bytes/sec, so it was moving data at roughly 95% of the memory bandwidth,
which is kind of nice.

Actually, we have hardware to do blits and airbrushes, and it runs (I would
guess) at least 10x faster than the 68020 or Clipper.  I should qualify that:
the hardware blits are 10x faster when the CPU blit has to handle multiple
sources, with look-up and ALU functions.

My conclusion was that the Clipper wouldn't perform well in our system as
a graphics processor.  Heck, it was actually slower than the 16.67MHz
68020.  When the AMD FAE (field application engineer) was out here, he said
that when the 29000 runs the same algorithms that are implemented in the
QPDM (9560), it runs them twice as fast.  That sounds pretty good: the software
flexibility to write a stippled, brick-pattern airbrush, running at hardware
speeds.

I read Fairchild's Performance White Paper.  They claim 3x the performance
of a 16.67MHz 68020.  Maybe on Dhrystone, but not on my stuff.  They claim
8064 Dhrystones for the Clipper, against 2745 for the Sun-3/160.  I don't know
about you people, but my March 15th Dhrystone results say the Intergraph does
5275 and the Sun 3246.  They say the 8064 figure is with a new compiler.  That's
pretty impressive, about a 53% gain from the compiler alone.  I guess the old
one had some shortcomings, huh?  Can anyone with an Intergraph verify the 8064
number?  I think it would be fairer if Fairchild compared the Intergraph to a
Sun-3/260, not a 3/160, anyway.

> 8K cache: expensive? we spend about $150 for 24K of cache.  Maybe that's
> more expensive than a pair of 300K+ transistor Clipper CAMMUs, but I doubt it.

I wish I could add a cache to a 68020 right now with just one chip, the
way Intel and AT&T can.  I'll have to wait for the 68030 because puny little
companies like us (don't let the DuPont name fool you, we're a subsidiary)
can't afford to design and build them.


Jonathan Hue	DuPont Design Technologies/Via Visuals		leadsv!eps2!jon
*Disclaimer: You're right, I don't know what I'm talking about*

Here are the programs I used:


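/*
 * Program 1: airbrush benchmark.
 * Blends a 32 x 32 density cell into the frame buffer 5000 times.
 */
unsigned char fb[0x10000];	/* 32 scan lines of 0x800 bytes */
short cell[1024];		/* 32 x 32 densities (all zero here) */

/*
 * airbrush with multiply tables
 * bP: frame buffer, mP: multiply table, cP: density cell,
 * wx, wy: cell width and height, clr: airbrush color
 */
void wrt_airb(register unsigned char *bP, register unsigned char *mP,
	      register short *cP, register short wx, short wy,
	      register unsigned char clr)
{
	register unsigned char pixel;
	register short d0, d1;
	register int j;

	while (wy--)  {
		j = wx;
		while (j--)  {
			d1 = *cP++;
			d0 = d1 + *bP;
			pixel = (d0 - mP[d0]) & 0xff;
			d1 += clr;
			pixel += mP[d1] & 0xff;
			*bP++ = pixel & 0xfe;
		}
		cP += (32 - wx);	/* next row of the 32-wide cell */
		bP += (0x800 - wx);	/* next scan line (0x800 bytes) */
	}
}


int main(void)
{
	register int i;

	/* fb is passed as the multiply table too; only the timing matters */
	for (i = 0; i < 5000; i++)
		wrt_airb(fb, fb, cell, 32, 32, 0);
	return 0;
}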


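/*
 * Program 2: blit benchmark (a separate program).
 * Copies 1M byte of memory as 32-bit long words, 32 times.
 */
#include <stdint.h>

/* long was 32 bits on these machines; int32_t keeps that size today */
int32_t buffer[0x40001];	/* 0x40000 long words, plus one since q runs ahead of p */

void blit(void)
{
	register int32_t *p, *q;
	register long i;

	p = buffer;
	q = p + 1;		/* copy each long word down by one position */
	i = 16384;		/* 16384 iterations * 16 copies = 0x40000 long words */
	while (i--)  {
		*p++ = *q++; *p++ = *q++; *p++ = *q++; *p++ = *q++;
		*p++ = *q++; *p++ = *q++; *p++ = *q++; *p++ = *q++;
		*p++ = *q++; *p++ = *q++; *p++ = *q++; *p++ = *q++;
		*p++ = *q++; *p++ = *q++; *p++ = *q++; *p++ = *q++;
	}
}

int main(void)
{
	register int i;

	for (i = 0; i < 32; i++)
		blit();
	return 0;
}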