Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!wuarchive!psuvax1!rutgers!rochester!pt.cs.cmu.edu!gandalf.cs.cmu.edu!lindsay
From: lindsay@gandalf.cs.cmu.edu (Donald Lindsay)
Newsgroups: comp.arch
Subject: Re: Register Count
Summary: Counterexamples to "The Truth about "optimizers", for the millionth time" and "a treatise on value dataflow patterns".
Message-ID: <11566@pt.cs.cmu.edu>
Date: 12 Jan 91 20:59:26 GMT
References:
Organization: Carnegie Mellon Robotics Institute
Lines: 46

In article pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>Loop unrolling only wins if you
>have some degree of multiple functional units, otherwise you only buy
>reduced overhead, which is not much, especially if the implementation
>has some degree of pipelining.

Unrolling (to reduce overhead) is the basic trick of hand-coded memcpy
routines, on such "multiple functional unit" classics as the 8080 and
68010. Pipelining makes unrolling more useful, not less.

>It is an old controversy, but let me repeat here: 90% of what passes for
>"optimizing" compilers are compilers that optimize for *space*, not
>time, ...

The optimizing compilers that I worked on were trying for speed. We
tuned the things by endlessly recompiling and running our
(ever-expanding) code-quality suite. We didn't bother to measure how
small the generated code was: we just timed it.

>Based on my armchair evidence I
>believe that most general purpose applications data flows can be
>modelled best with four multiple stacks (quite shallow, by the way, say
>four deep), because such dataflows have the shape of a tree with a
>relatively low branching factor, and relatively small number of levels
>in each independent branch.

I have before me a dataflow diagram that isn't tree-like. Yes, it's a
real-world Fortran subroutine. Further, you are referring to floating
point calculation. In many programs, i := i + 1 is the most common
calculation: addressing and control are dominant.
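
To put a rough shape on the overhead point, here is a minimal sketch in C
(not the 8080 or 68010 assembler mentioned above). The names op_counts,
dot_rolled and dot_unrolled4 are hypothetical, and the counting scheme is
an assumption made for illustration: it tallies source-level operations,
not machine instructions, and it says nothing about timing on any real
machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally of source-level operations (not machine instructions). */
struct op_counts {
    unsigned long fp_ops;      /* floating-point multiplies and adds */
    unsigned long control_ops; /* index increments plus loop tests taken */
};

/* Plain loop: one index increment and one loop test per element. */
static double dot_rolled(const double *a, const double *b, size_t n,
                         struct op_counts *c)
{
    assert(c != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
        c->fp_ops += 2;      /* one multiply, one add */
        c->control_ops += 2; /* one increment, one test */
    }
    return sum;
}

/* Unrolled by four: one increment and one test per four elements,
 * followed by a cleanup loop for the 0..3 leftover elements.
 * The additions happen in the same order as in dot_rolled. */
static double dot_unrolled4(const double *a, const double *b, size_t n,
                            struct op_counts *c)
{
    assert(c != NULL);
    double sum = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i]     * b[i];
        sum += a[i + 1] * b[i + 1];
        sum += a[i + 2] * b[i + 2];
        sum += a[i + 3] * b[i + 3];
        c->fp_ops += 8;      /* four multiplies, four adds */
        c->control_ops += 2; /* one increment, one test */
    }
    for (; i < n; i++) {
        sum += a[i] * b[i];
        c->fp_ops += 2;
        c->control_ops += 2;
    }
    return sum;
}
```

Under this counting, the rolled loop spends as many operations on the
index as on the arithmetic, which is the "i := i + 1" effect; the
unrolled version does the same floating-point work with fewer increments
and tests. The final failing loop test and the address arithmetic inside
each a[i] are left out of the tally to keep the sketch short.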
If you want to convince me about your "most" and "best", you are going
to need numbers. It is just as easy to believe that the modal stack
depth would be 1 (one); in that case, the multiple stacks would be just
that many (expensive) registers.

I could go deeper into this: I could demand to know how your design
does (say) context switches. But, on the whole, stack architectures are
dead, and dealing with a proposed one doesn't seem like a good use of
my time.
-- 
Don		D.C.Lindsay .. temporarily at Carnegie Mellon Robotics