Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!uwm.edu!linac!att!ucbvax!agate!forney.berkeley.edu!jbuck
From: jbuck@forney.berkeley.edu (Joe Buck)
Newsgroups: comp.arch
Subject: Re: More on Linpack pivoting: isamax and instruction set design
Message-ID: <1991Jun14.010226.11981@agate.berkeley.edu>
Date: 14 Jun 91 01:02:26 GMT
References: <396@validgh.com>
Sender: usenet@agate.berkeley.edu (USENET Administrator)
Reply-To: jbuck@forney.berkeley.edu (Joe Buck)
Organization: University of California, Berkeley
Lines: 46

In article <396@validgh.com>, dgh@validgh.com (David G. Hough on validgh) writes:
|> [ what architectural features speed this up? ]
|>
|>       do 30 i = 2,n
|>          if(abs(dx(i)).le.dmax) go to 30
|>          isamax = i
|>          dmax = abs(dx(i))
|>    30 continue

DSP chips such as the TI TMS320C30 are good at loops like this.

The following routine takes about 10+4N cycles (60 nsec/cycle for the
original C30):

	ldi	dx,ar0			; ar0 points to the data
	ldi	@n,rc			; vector length
	subi	1,rc			; n-1 to get n loops
	ldf	-1.0,r1			; set max abs value to -1
	rptb	loop			; start zero overhead loop
;...........................
	absf	*ar0++,r0		; r0 = absval of dx[i]
	cmpf	r1,r0			; flags from r0 - r1: larger than max?
	ldfgt	r0,r1			; if so, it becomes the new max
loop:	ldigt	rc,r2			; and mark its position
;...........................
; rc is decremented once each time -- it's n-1 if the first term is
; the max, n-2 if the second, etc.  r2 holds the value rc had at the
; max, so n-r2 is the isamax output of a Fortran routine.
	ldi	@n,r0
	subi	r2,r0
; now r0 has isamax and r1 has dmax. (extra instructions needed
; to do a C call interface).
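
For readers who don't read C30 assembly, here is a rough C sketch of
what the counted-down loop is meant to compute.  The function name and
the dmax_out parameter are mine, purely for illustration; this is not
the C call interface mentioned above, and it says nothing about cycle
counts.

```c
#include <stddef.h>

/* Hypothetical C rendering of the counted-down search.
 * Returns the 1-based index of the first element with the largest
 * absolute value (as Fortran isamax would), or 0 if n <= 0.
 * If dmax_out is non-NULL it receives the largest absolute value. */
int isamax_countdown(const float *dx, int n, float *dmax_out)
{
    float dmax = -1.0f;   /* below any absolute value, like r1 */
    int saved = n;        /* counter value at the max, like r2 */

    for (int rc = n - 1; rc >= 0; --rc) {   /* rc counts down, like the repeat counter */
        float a = *dx++;                    /* autoincrement load */
        if (a < 0.0f)
            a = -a;                         /* absolute value */
        if (a > dmax) {                     /* strictly larger: first max wins */
            dmax = a;
            saved = rc;
        }
    }
    if (dmax_out != NULL)
        *dmax_out = dmax;
    return n - saved;     /* rc was n-1 at the first element, so this is 1-based */
}
```

Because the counter runs downward, the position has to be recovered as
n minus the saved counter, which is what the last two instructions of
the assembly routine do.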


Several elements contribute to speed: the zero-overhead loop, the
conditional load (ldigt), the absolute value instruction, and
(sorry, purists) the autoincrement addressing mode.

The RS/6000 already has, in many cases, zero-overhead loops.
I have found, though, that conditional loads are a big win
on heavily pipelined machines where a branch would cause a
large pipeline penalty.
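
To make the branch-versus-conditional-load distinction concrete, here
is a small C sketch of the same search written both ways.  The function
names are mine, and the indices are 0-based as usual in C.  Whether a
given compiler turns the second form into conditional moves or selects
depends on the compiler and the target, so treat it as an illustration
of the idea, not a guarantee.

```c
#include <stddef.h>

/* Branching form: mirrors the Fortran loop, one conditional branch
 * per element.  Returns the 0-based index of the first largest
 * absolute value, or 0 if n == 0. */
size_t argmax_abs_branch(const double *x, size_t n)
{
    size_t best = 0;
    double dmax = -1.0;
    for (size_t i = 0; i < n; ++i) {
        double a = x[i] < 0.0 ? -x[i] : x[i];
        if (a <= dmax)
            continue;          /* the "go to 30" */
        best = i;
        dmax = a;
    }
    return best;
}

/* Select form: both updates are written as conditional assignments,
 * which is the shape a conditional load instruction implements. */
size_t argmax_abs_select(const double *x, size_t n)
{
    size_t best = 0;
    double dmax = -1.0;
    for (size_t i = 0; i < n; ++i) {
        double a = x[i] < 0.0 ? -x[i] : x[i];
        int gt = a > dmax;     /* the compare */
        dmax = gt ? a : dmax;  /* conditional load of the value */
        best = gt ? i : best;  /* conditional load of the position */
    }
    return best;
}
```

Both functions return the same index for the same input; the only
difference is whether the update is guarded by a branch or expressed as
a select.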


--
Joe Buck
jbuck@galileo.berkeley.edu	 {uunet,ucbvax}!galileo.berkeley.edu!jbuck