Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!caen!news.cs.indiana.edu!news.nd.edu!mentor.cc.purdue.edu!pop.stat.purdue.edu!hrubin
From: hrubin@pop.stat.purdue.edu (Herman Rubin)
Newsgroups: comp.arch
Subject: Re: Vector vs Cache/Superscalar
Message-ID: <11875@mentor.cc.purdue.edu>
Date: 4 May 91 12:47:52 GMT
References: <1991May4.031835.7979@midway.uchicago.edu>
Sender: news@mentor.cc.purdue.edu
Lines: 42

In article <1991May4.031835.7979@midway.uchicago.edu>, rtp1@quads.uchicago.edu (raymond thomas pierrehumbert) writes:
> McAlpin comments that he finds vectorization (even on the Cyber 205)
> simpler, more intuitive and more transportable than the optimization
> techniques used on cached machines like the RS/6000.
> I think this is partly because the vector model of parallelism is
> so rigid; optimization for the superscalars involves a bigger bag
> of tricks. Still, I have found that there are fewer things they
> choke on, and that it is easier to localize optimization in a few
> reusable routines. Two case-studies:

...................

There are differences in the various types of vector machines; most
have highly rigid vectors, but the 205 has quite a bit of flexibility.

> (b) Tridiagonal solving. Comes up in lots of codes, and it is
> a real vector-breaker. In fact, vector machines choke on all
> sorts of recursion, whereas the superscalars just love them.
> On the RS/6000, the tridiag code basically vanished, whereas on
> the vector Stardent, it was a bottleneck.

There are tricky ways of doing this efficiently on vector machines,
especially flexible ones. This uses partitioning.

> A third example that occurs to me is evaluation of transcendental
> functions. Lots of recursion, and pretty efficient on the RISCS.
> On a vector machine, you have to keep iterating the vector until
> the slowest converging argument is done converging (unless you
> do a lot of reshuffling in memory)

But this reshuffling in memory is dirt cheap on the 205! It could be
improved by adding more instructions compatible with a CISC vector
machine. In fact, this type of management is quite useful in
acceptance-rejection, the most efficient way of generating non-uniform
random numbers. The vector methods are quite competitive with the older
scalar versions even on scalar machines; they only use more memory,
which is no longer a bottleneck here.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)
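
To make the acceptance-rejection remark concrete, here is a rough sketch
of the batch-and-compress pattern, written in plain Python with lists
standing in for vector registers. It is only an illustration under
assumptions of my own choosing, not code from the 205 or from any
library: the name vector_accept_reject, the batch size, and the target
density f(x) = 2x on [0, 1] (uniform proposal, bound M = 2, so a
candidate x is accepted when u <= x) are all hypothetical.

```python
import random


def vector_accept_reject(n, rng, batch=1024):
    """Return n samples with density f(x) = 2x on [0, 1].

    Hypothetical illustration: each loop pass mimics three vector
    operations (generate, compare, compress) on a whole batch.
    """
    out = []
    while len(out) < n:
        # Vector step 1: generate a full batch of candidates and test values.
        xs = [rng.random() for _ in range(batch)]
        us = [rng.random() for _ in range(batch)]
        # Vector step 2: build the acceptance mask.
        # Accept when u <= f(x) / M, which is u <= x for f(x) = 2x, M = 2.
        mask = [u <= x for x, u in zip(xs, us)]
        # Vector step 3: compress, keeping only the accepted candidates.
        out.extend(x for x, keep in zip(xs, mask) if keep)
    # Keep the first n accepted values; any surplus is discarded.
    return out[:n]


if __name__ == "__main__":
    samples = vector_accept_reject(10000, random.Random(12345))
    print(len(samples), sum(samples) / len(samples))
```

The rejected candidates are simply squeezed out of the batch rather than
retried one at a time, which is the "reshuffling in memory" in question.
The cost is the extra storage for the batch and the mask.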