Path: utzoo!utgpu!watmath!clyde!att!osu-cis!killer!ames!elroy!cit-vax!kahuna!newton
From: newton@kahuna.UUCP (Mike Newton)
Newsgroups: comp.arch
Subject: Re: Chaining on IBM 3090 VF
Keywords: chaining, vector processor
Message-ID: <168@kahuna.UUCP>
Date: 4 Jan 89 10:45:35 GMT
References: <3950@pt.cs.cmu.edu>
Reply-To: newton@kahuna.UUCP (Mike Newton)
Organization: CSO Observatory, Hilo, Hawaii
Lines: 40

In article <3950@pt.cs.cmu.edu> yk@a.nl.cs.cmu.edu (Yasusi Kanada) writes:
>I read an article of IBM 3090 in 88-12 issue of Transaction of Information
>Processing (written in Japanese) recently. In this article, the author
>(in IBM Tokyo Research Center) writes that the following instruction sequence
>is executed in CHAINED manner, so the result is generated every cycle.
>
>	VL	VR0,A(R1)
>	VA	VR0,B(R2)
>	VST	VR0,C(R3)
>
>Is that true? Thanks in advance.
>
>-Yasusi Kanada

Once the pipeline is loaded this will give you a result every cycle.
To overcome bottlenecks in fp multiply, I believe they alternate among
a bank of 2 or 3 multipliers.

I strongly urge you to find an old copy of IBM Journal of R & D from last
year (Feb?) if you have access to a 3090 (w/ or w/o VF). I'd give you the
exact date and more precise info above, but my copy is about 3000 miles
from here.

Some useful facts for high-speed code generation (my speciality):
'LA' instructions are effectively executed by the instruction fetcher
and so usually cost 0 clock cycles. Also: avoid overlapping args for SS
instructions, and avoid self-modifying code like the plague. To a good
first-order approximation,
	program execution time = clock cycle time * number of instructions executed
(ie: it's basically a RISC :-) !!
)

- mike

newton@csvax.caltech.edu        Caltech Submillimeter Observatory
 (which is forwarded to)        POB 4339 / Hilo HI 96720
cit-vax!kahuna!newton           808 935 1909
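
A rough sketch of the two timing claims above, for readers who want
numbers to play with. It is a toy model only: the function names, the
startup figure and the "unchained" comparison are assumptions made for
illustration, not measured 3090 VF behaviour. It models the quoted
VL / VA / VST sequence as an element-wise add, C(i) = A(i) + B(i), and
compares "one result per cycle once the pipeline is loaded" with a
version where each instruction finishes before the next begins.

```python
# Toy timing model. All names and numbers are illustrative assumptions,
# not IBM 3090 VF specifications.

def vector_add(a, b):
    """What the VL / VA / VST sequence computes: c[i] = a[i] + b[i]."""
    return [x + y for x, y in zip(a, b)]

def chained_cycles(n, startup):
    """Load, add and store overlap: after a one-time startup
    (pipeline fill), one result completes per cycle."""
    return startup + n

def unchained_cycles(n, startup, instructions=3):
    """Each instruction processes all n elements before the next starts."""
    return instructions * (startup + n)

def first_order_time_ns(instructions_executed, cycle_time_ns):
    """Rule of thumb from the post:
    execution time = clock cycle time * instructions executed."""
    return instructions_executed * cycle_time_ns
```

With a hypothetical 10-cycle startup and 128 elements, the chained
estimate is 138 cycles against 414 unchained. The gap comes entirely
from overlapping the three instructions.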