Path: utzoo!utgpu!water!watmath!clyde!bellcore!faline!thumper!ulysses!andante!mit-eddie!bu-cs!purdue!decwrl!labrea!sri-unix!garth!smryan
From: smryan@garth.UUCP
Newsgroups: comp.arch
Subject: Re: Superoptimiser.
Message-ID: <841@garth.UUCP>
Date: 30 Jun 88 21:43:25 GMT
References: <1941@pt.cs.cmu.edu> <3208@ubc-cs.UUCP> <1986@pt.cs.cmu.edu> <754@garth.UUCP> <12171@mimsy.UUCP> <7570@boring.cwi.nl> <834@garth.UUCP> <91odrecXKL1010YEOek@amdahl.uts.amdahl.com>
Reply-To: smryan@garth.UUCP (Steven Ryan)
Organization: INTERGRAPH (APD) -- Palo Alto, CA
Lines: 39

> >Branches tend to be deadly to fast machines. Delay slots or no, it can still
> >give the cache/instruction stack indigestion.
>
>Gosh! You guys aren't thinking big enough. How about multiple
>parallel pipelines to compute all the various instruction threads
>in parallel and just keep the results of the one that is actually
>taken?

Actually, it is a 170 (aka 6600) and 205 coding technique (read: trick).

Given a conditional expression like

        if p then a else b

if a and b are side-effect and fault-free, define a mask

        mask(p) = all ones  if p true
                  all zeros if p false

then the above conditional is encoded as

        mask(p) and a  or  not mask(p) and b

For example, the following 170 sequence folds Ascii 8/12 control characters
into blanks:

        MX0   0           +0.
        SX7   40B         8/12 space character code.
        IX6   X5-X7       X5=original character.
        IX6   X6+X0       get rid of -0.
                          If control, X5