Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!yale!think.com!samsung!sdd.hp.com!usc!apple!amdcad!mozart.amd.com!cayman!richard
From: richard@cayman.amd.com (Richard Relph)
Newsgroups: comp.arch
Subject: Re: "Rumours" from BYTE - November 1990
Message-ID: <1990Nov13.160952.13856@mozart.amd.com>
Date: 13 Nov 90 16:09:52 GMT
References:
Sender: usenet@mozart.amd.com (Usenet News)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 27

In article pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>On page 28: "AMD accelerates RISC line with FPU"
>------------------------------------------------
>The AMD 29050 has an embedded FPU claimed to have a peak speed of 80
>MFLOPS, with frequencies from 20Mhz to 40Mhz (two flops per cycle?).

Yes, that's right, two flops per cycle. The 29K family defines "simple" floating point operations of the form register = register op register (the Am29000/Am29005 implement them in software). In addition to those, we added 4 new "two flop" instructions: FMAC, DMAC, FMSM, and DMSM. These instructions use 4 new floating point accumulators as an implied 4th operand for operations of the form X = A * B + C.

Using FMAC as an example, X is one of the 4 accumulators, A is a general purpose register, B is either a GP register or the constant 1.0, and C is either the accumulator or the constant 0.0. The signs of B and C are fully programmable as well. The FMAC instruction defines all of the operands (and the destination) and issues in 1 cycle. Since the adder is fully pipelined, and the multiplier is fully pipelined for single precision multiplies, one can issue a new FMAC every cycle. This is particularly useful for the matrix multiplication that commonly occurs in graphics and other applications. A 4x4 by 1x4 matrix multiplication takes just 22 cycles (including waiting for the last FMACs to complete).
Also, the accumulators may be either single or double precision without affecting the performance of the FMAC instruction. For the FMSM instruction, X, B, and C are all GP registers and A is always accumulator 0. Again, FMSM can be issued every cycle, resulting in 2 flops per cycle.