Path: utzoo!utgpu!watmath!clyde!att!osu-cis!killer!ames!elroy!cit-vax!kahuna!newton
From: newton@kahuna.UUCP (Mike Newton)
Newsgroups: comp.arch
Subject: Re: Chaining on IBM 3090 VF
Keywords: chaining, vector processor
Message-ID: <168@kahuna.UUCP>
Date: 4 Jan 89 10:45:35 GMT
References: <3950@pt.cs.cmu.edu>
Reply-To: newton@kahuna.UUCP (Mike Newton)
Organization: CSO Observatory, Hilo, Hawaii
Lines: 40

In article <3950@pt.cs.cmu.edu> yk@a.nl.cs.cmu.edu (Yasusi Kanada) writes:
>I read an article of IBM 3090 in 88-12 issue of Transaction of Information
>Processing (written in Japanese) recently.  In this article, the author
>(in IBM Tokyo Research Center) writes that the following instruction sequence
>is executed in CHAINED manner, so the result is generated every cycle.
>
>	VL	VR0,A(R1)
>	VA	VR0,B(R2)
>	VST	VR0,C(R3)
>
>Is that true?  Thanks in advance.
>
>-Yasusi Kanada


Yes.  Once the pipeline is loaded, this sequence gives you one result
every cycle.  To overcome the bottleneck in fp multiply, I believe they
alternate among a bank of 2 or 3 multipliers.
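
To make the "result every cycle" point concrete, here is a rough sketch
in Python (not 370 code).  It shows what the VL/VA/VST sequence computes
and compares a toy cycle count with and without chaining.  The function
names, the `startup` latency and the three-stage model are my own
illustrative assumptions, not figures from IBM documentation.

```python
def vector_add(a, b):
    """Element-wise C = A + B: the net effect of VL, VA, VST together."""
    return [x + y for x, y in zip(a, b)]


def cycles_chained(n, startup):
    # Toy model: fill the pipeline once, then one result per cycle.
    # 'startup' is a hypothetical fill latency in cycles.
    return startup + n


def cycles_unchained(n, startup, stages=3):
    # Toy model: each of the three instructions processes the whole
    # vector before the next one starts, paying the fill latency each time.
    return stages * (startup + n)
```

With these made-up numbers, a 128-element vector and a 10-cycle startup
cost 138 cycles chained versus 414 unchained, which is why chaining
matters for short load/operate/store sequences like this one.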

I strongly urge you to find an old copy of the IBM Journal of R & D from
last year (Feb?) if you have access to a 3090 (w/ or w/o VF).  I'd
give you the exact date and more precise info on the above, but my copy
is about 3000 miles from here.

Some useful facts for high-speed code generation (my speciality): 'LA'
instructions are effectively executed by the instruction fetcher and
so usually cost 0 clock cycles.  Also: avoid overlapping args for SS
instructions, and avoid self-modifying code like the plague.  To a good
first-order approximation, program execution time = clock cycle time *
number of instructions executed (i.e. it's basically a risc :-) !! )
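
The rule of thumb above can be written down as a small Python sketch.
The function names and the 20 ns cycle time used in the example are
hypothetical, chosen only to make the arithmetic easy; they are not
measured 3090 figures.

```python
def estimated_time_ns(instructions, cycle_ns):
    # First-order model: every executed instruction costs one cycle.
    return instructions * cycle_ns


def estimated_time_free_la_ns(instructions, la_count, cycle_ns):
    # Same model, but treating the 'LA' instructions as costing 0 cycles.
    # Assumes la_count <= instructions.
    return (instructions - la_count) * cycle_ns
```

For example, 1,000,000 executed instructions at an assumed 20 ns cycle
come to 20,000,000 ns (20 ms); if 100,000 of them are LAs that cost
nothing, the estimate drops to 18 ms.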

- mike

newton@csvax.caltech.edu		Caltech Submillimeter Observatory
(which is forwarded to)			POB 4339 / Hilo HI 96720
 cit-vax!kahuna!newton			808 935 1909