Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!rice!ariel.rice.edu!preston
From: preston@ariel.rice.edu (Preston Briggs)
Newsgroups: comp.arch
Subject: Re: cache pre-load/no-load instructions
Message-ID: <1991Mar22.160421.27462@rice.edu>
Date: 22 Mar 91 16:04:21 GMT
References: <765@ajpo.sei.cmu.edu> <1991Mar21.161044.2898@rice.edu> <27671@neptune.inf.ethz.ch>
Sender: news@rice.edu (News)
Organization: Rice University, Houston
Lines: 43

I wrote:
>>The RS/6000 includes 2 interesting possibilities.
>>An instruction that zeroes a line in the data cache (without
>>fetching it). May be used like (2 above); additionally handy for zeroing
>>big chunks of memory. They also include an "invalidate line"
>>instruction which says: "don't bother writing this one back to memory."

and brandis@inf.ethz.ch (Marc Brandis) writes:
>Unfortunately, IBM made these instructions privileged. They had some good
>reasons to do it, as the instructions ignore lock and protection bits. I do
>not know the reasons why they could not make them check the bits, however.
>
>I am not sure whether having these instructions in user mode would be a great
>advantage. DCLSZ (data cache line set zero) can be used to initialize large
>chunks of memory, of course. The other obvious target for the DCLSZ and CLI
>(cache line invalidate) instructions is to control the allocation and
>deallocation of procedure frames on the stack so that no memory references
>are generated for newly allocated stack space and that no deallocated stack
>space will be written back to memory.

Implementing Fortran, I would have used them on large arrays.
When you're doing one of the BLAS routines and the destination
is merely overwritten, you can save a lot of cache misses
by not fetching it. Similarly, when you're done with a temporary
workspace, you may simply invalidate it.

The difficulty is alignment. With long cache lines, it seems hard
to ensure that nothing extraneous is accidentally zeroed.

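To make the alignment problem concrete, here is a minimal C sketch
(not anything the RS/6000 compilers do). It splits a destination
buffer into a partial head, a run of whole cache lines, and a partial
tail. Only the whole lines could safely be given to a line-zeroing
instruction; the head and tail share their lines with neighbouring
data and must be stored to normally. The names zero_line,
split_by_line and zero_buffer are hypothetical, zero_line is a plain
memset standing in for the privileged instruction, and the line size
is passed as a parameter because it belongs to the implementation,
not the architecture. The sketch also assumes that converting a
pointer to uintptr_t yields the machine address, which holds on
common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a cache-line-zero instruction.
   Here it is only a memset, so the sketch runs on any machine. */
static void zero_line(unsigned char *line, size_t line_size)
{
    memset(line, 0, line_size);
}

/* How a buffer divides against cache-line boundaries. */
typedef struct {
    size_t head;   /* bytes before the first line boundary        */
    size_t lines;  /* whole cache lines lying fully in the buffer */
    size_t tail;   /* bytes after the last whole line             */
} line_split;

/* Split the n bytes at p for a given (power-of-two) line size. */
static line_split split_by_line(const void *p, size_t n, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    size_t misalign = (size_t)((uintptr_t)p & (line_size - 1));
    size_t head = misalign ? line_size - misalign : 0;
    if (head > n)
        head = n;              /* buffer ends before the first boundary */

    line_split s;
    s.head  = head;
    s.lines = (n - head) / line_size;
    s.tail  = (n - head) - s.lines * line_size;
    return s;
}

/* Zero n bytes at p, using zero_line only on lines owned entirely
   by the buffer, so nothing outside [p, p+n) is ever touched. */
static void zero_buffer(void *p, size_t n, size_t line_size)
{
    unsigned char *b = p;
    line_split s = split_by_line(p, n, line_size);

    memset(b, 0, s.head);                  /* partial first line */
    b += s.head;
    for (size_t i = 0; i < s.lines; i++, b += line_size)
        zero_line(b, line_size);           /* whole lines only   */
    memset(b, 0, s.tail);                  /* partial last line  */
}
```

For example, with an assumed 128-byte line, a 300-byte destination
starting 5 bytes past a line boundary splits into a 123-byte head,
one whole line, and a 49-byte tail, so fewer than half of its bytes
could take the fast path. Large, line-aligned arrays fare much better.
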
Brandis also makes the point that the compiler would have to be
parameterized to account properly for cache line length.
True! Generally, compilers are written to the architecture,
not the implementation; cache is usually part of the implementation.
However, instruction schedulers are bending this idea already.
Various cache blocking techniques (often used at the source level)
bend it further still. You have to work hard for performance.

Summarizing, I can't argue that the RS/6000's instructions are practical
as they stand, and I don't have compiler techniques to use them yet.
However, they (along with HP's cache instructions) are interesting ideas
and probably worth some study.

Preston Briggs