Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!julius.cs.uiuc.edu!apple!amdcad!mozart.amd.com!cayman!richard
From: richard@cayman.amd.com (Richard Relph)
Newsgroups: comp.arch
Subject: Re: "Rumours" from BYTE - November 1990
Message-ID: <1990Nov15.165330.24387@mozart.amd.com>
Date: 15 Nov 90 16:53:30 GMT
References: <PCG.90Nov12181003@odin.cs.aber.ac.uk> <1990Nov13.160952.13856@mozart.amd.com> <6619@ethz.UUCP>
Sender: usenet@mozart.amd.com (Usenet News)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 47

In article <6619@ethz.UUCP> ruehl@ethz.UUCP (Roland Ruehl) writes:
>In article <1990Nov13.160952.13856@mozart.amd.com>, richard@cayman.amd.com (Richard Relph) writes:
>> In article <PCG.90Nov12181003@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>> >On page 28: "AMD accelerates RISC line with FPU"
>> >------------------------------------------------
>> >The AMD 29050 has an embedded FPU claimed to have a peak speed of 80
>> >MFLOPS, with frequencies from 20Mhz to 40Mhz (two flops per cycle?).
>> Yes, that's right, two flops per cycle. .........
>
>  Even more interesting: when can one get a solid 29050 C compiler
>  exploiting all these goodies ?
Both the MetaWare compiler and a GCC compiler can generate the
instructions that execute 2 flops per cycle. Both compilers are
"in testing" and are expected to be generally available this year.
Here's some sample code produced by one of the compilers, with the
C source first and the generated assembly after it:

float t1 (float *res1, float *v1, float *v2, float scale, float offset)
{
  int i;
  float accum = 0.0;
  float accum2 = 0.0;

  for (i = 0; i < 100; i++) {
    accum += v1[i] * v2[i];
    accum2 += v1[i] * (-v2[i]);
  }
  *res1 = accum2 * scale + offset;
  return accum;
}

_t1:	sll gr119,lr5,0
	const gr116,0
	mtacc gr116,1,3
	mtacc gr116,1,0
	const gr116,396
	add gr118,lr4,gr116
L5:	load 0,0,gr116,lr3
	load 0,0,gr117,lr4
	fmac 0,3,gr116,gr117
	fmac 1,0,gr116,gr117
	add lr4,lr4,4
	cple gr116,lr4,gr118
	jmpt gr116,L5
	add lr3,lr3,4
	fmsm gr116,gr119,lr6
	mfacc gr96,1,3
	jmpi lr0
	store 0,0,gr116,lr2
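
For anyone who wants to check the arithmetic by hand, here is a minimal
C sketch of what the loop computes. It is not compiler output and it
says nothing about how the 29050 schedules the work. The names mac,
t1_model and peak_mflops are hypothetical, and the length parameter n
is an assumption added so that short vectors can be tried. Each
multiply-accumulate step is counted as two flops (one multiply, one
add), which seems to be what the two fmac instructions in the loop
correspond to.

```c
#include <stddef.h>

/* Hypothetical helper: one multiply-accumulate step, acc + a * b.
   Counted here as two floating-point operations (multiply, add). */
static float mac(float acc, float a, float b)
{
    return acc + a * b;
}

/* Same computation as t1 in the post, except that the vector length
   is a parameter (an assumption; the original hard-codes 100). */
float t1_model(float *res1, const float *v1, const float *v2,
               size_t n, float scale, float offset)
{
    float accum = 0.0f;   /* running sum of v1[i] * v2[i]    */
    float accum2 = 0.0f;  /* running sum of v1[i] * (-v2[i]) */
    size_t i;

    for (i = 0; i < n; i++) {
        accum = mac(accum, v1[i], v2[i]);
        accum2 = mac(accum2, v1[i], -v2[i]);
    }
    *res1 = accum2 * scale + offset;
    return accum;
}

/* Peak rate arithmetic from the thread: clock rate in MHz times
   flops per cycle gives MFLOPS. */
double peak_mflops(double clock_mhz, double flops_per_cycle)
{
    return clock_mhz * flops_per_cycle;
}
```

Calling peak_mflops(40.0, 2.0) reproduces the quoted 80 MFLOPS peak
figure. That is a peak number only; the loop above also has to issue
loads and a branch.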