Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!ncar!ncar.ucar.EDU!ftower
From: ftower@ncar.ucar.EDU (Francis Tower)
Newsgroups: comp.lang.fortran
Subject: Re: vectorization question reposed
Message-ID: <10851@ncar.ucar.edu>
Date: 31 Mar 91 14:49:44 GMT
References: <1991Mar30.142903.5225@ariel.unm.edu>
Sender: news@ncar.ucar.edu
Reply-To: ftower@ncar.ucar.EDU (Francis Tower)
Organization: Climate and Global Dynamics Division, NCAR
Lines: 58

Dear John,

Most of the following I've taken from the Cray manual TR-OPT, CF77 &
Standard C Features and Optimization.  The rest, I just made up.

1.  Each CPU on the Y-MP has 4 instruction buffers.  Each can hold 128
    instruction 'parcels'.  However, there is no data or instruction
    cache of the kind other systems might use to speed up serial
    processing.

2.  I don't believe 'fpp' will attempt to reorder your indices.  Even if
    it did try, it wouldn't be able to do much because of the use of
    indirect indices.  For irregular grids (many Finite Element problems)
    the indirect way is frequently used.  The indices are fixed, and we
    know that, but the pre-compiler doesn't.  'fpp' or 'cft77' must
    assume the worst case.  If there are bank conflicts, you eat them.

3.  Each Y-MP CPU has 4 ports to memory, so it's possible for each CPU to
    be accessing each of the Y-MP's 4 memory sections.  Each memory
    section has 8 subsections.  Depending on the amount of memory, each
    section is broken into banks.  The minimum configuration (Y-MP 2/116)
    has 64 banks.  The 4/132 has 128 banks, and the 8/432 upwards have
    256 banks.  Each bank has a 5 clock period cycle time, and all the
    banks are interleaved.  The worst case is a stride of 0 or a stride
    equal to the number of banks you have, since every reference then
    lands in the same bank.

    On a 256 bank machine:

        STRIDE              Relative Performance
        0 (or 256)                 1 / 5
        192                        4 / 5
        128                        2 / 5
        64                         4 / 5
        everything else            1 / 1

Sorry, I gotta run and play B-Ball.  I'll be back.

Francis G Tower
Software QA
NCAR/CGD/ICS        << Middle-aged Mutant Ninja Modelers >>

"Don't be misled by truth.  Science is fact!"
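
P.S.  The stride table in point 3 can be reproduced with a very simple
model.  This is a sketch under assumptions of my own, not something taken
from the Cray manual: a constant stride s on an n-bank interleaved memory
cycles through n / gcd(s, n) distinct banks, and each bank stays busy for
5 clock periods, so the sustained rate is at most (banks touched) / 5 of
one word per clock, capped at 1.  The function name and its parameters
are made up for illustration, and the model ignores section and
subsection conflicts and traffic from other CPUs.

```python
from fractions import Fraction
from math import gcd


def relative_performance(stride, n_banks=256, bank_busy=5):
    """Estimate the fraction of peak memory rate for a constant stride.

    Assumed model (hypothetical, for illustration only): address i*stride
    maps to bank (i*stride) mod n_banks, and a bank cannot accept another
    reference until bank_busy clock periods have passed.
    """
    # gcd(0, n) == n, so a stride of 0 correctly touches a single bank.
    banks_touched = n_banks // gcd(stride, n_banks)
    # With fewer distinct banks than the busy time, references stall.
    return min(Fraction(1), Fraction(banks_touched, bank_busy))


if __name__ == "__main__":
    for stride in (0, 256, 192, 128, 64, 32, 2, 1):
        print(f"{stride:>6}  {relative_performance(stride)}")
```

Under this model strides of 0 and 256 hit one bank, 128 hits two, and 64
and 192 each hit four, which matches the table; any stride that touches
at least 5 banks runs at full rate.  Passing n_banks=64 or n_banks=128
gives the corresponding estimate for the smaller configurations.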