Path: utzoo!attcan!uunet!husc6!mailrus!iuvax!pur-ee!a.cs.uiuc.edu!uxc.cso.uiuc.edu!urbsdc!aglew
From: aglew@urbsdc.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: getting rid of branches
Message-ID: <28200173@urbsdc>
Date: 1 Jul 88 13:59:00 GMT
References: <12258@mimsy.UUCP>
Lines: 29
Nf-ID: #R:mimsy.UUCP:12258:urbsdc:28200173:000:1397
Nf-From: urbsdc.Urbana.Gould.COM!aglew Jul 1 08:59:00 1988

>Actually, this sort of idea is contained in some research and thesis
>work that is (was?) going on here at Maryland. How do you move old
>code (`dusty decks') onto a parallel processor? One way is to slice up
>the program into independent pieces that can be combined again later.

This comes perilously close to something I'm trying.

Davidson at the UIll, now UMich, proposed a partitioned access/execute
architecture a while back, where you basically have two processors, one
for address and memory computations, one for (mainly floating point?)
calculation. They run independently, with FIFOs between them for passing
variables, condition codes, etc. back and forth. They basically run two
versions of exactly the same program, one with all the FP taken out, and
the other with all the memory references replaced by "get the next value
from the memory unit".

Sounds a bit like Wulf's WM, with the extension that, instead of
executing the same program and treating instructions for the other unit
as NOPs, you partition the program.

I'm trying to take this a bit farther - instead of partitioning by
_type_ of data, I want to see whether there is enough parallelism within
the same data type, when there are many registers available. I.e., have
different functional units for all operations on registers R0-R15 and
R16-R31, with FIFOs for operations like R0 = R16 + R32.

Is anyone else doing something similar?
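
For anyone who wants to play with the access/execute split described above, here is a rough Python sketch of my reading of it. It is not Davidson's design or anybody's actual hardware: the names (access_unit, execute_unit, load_q, store_q, run) are all made up for illustration, the example loop (y[i] = a*x[i] + y[i]) is my own choice, and the round-robin stepping is just a stand-in for two processors running independently. One stream does only the memory references, the other does only the arithmetic, and the two talk through FIFOs.

```python
from collections import deque

def access_unit(x, y, load_q, store_q):
    """Address/memory stream (hypothetical): loads and stores only, no FP."""
    n = len(x)
    # Issue every load first and push the operands to the execute unit.
    for i in range(n):
        load_q.append(x[i])
        load_q.append(y[i])
        yield                      # let the other unit take a step
    # Then drain results coming back and store them.
    for i in range(n):
        while not store_q:
            yield                  # stall: result FIFO is empty
        y[i] = store_q.popleft()

def execute_unit(a, n, load_q, store_q):
    """Compute stream (hypothetical): memory references become FIFO pops."""
    for _ in range(n):
        while len(load_q) < 2:
            yield                  # stall: operands have not arrived yet
        xi = load_q.popleft()      # "get the next value from the memory unit"
        yi = load_q.popleft()
        store_q.append(a * xi + yi)
        yield

def run(a, x, y):
    """Step both units round-robin until each has finished its program."""
    load_q, store_q = deque(), deque()
    units = [access_unit(x, y, load_q, store_q),
             execute_unit(a, len(x), load_q, store_q)]
    while units:
        for unit in list(units):
            try:
                next(unit)
            except StopIteration:
                units.remove(unit)
    return y

if __name__ == "__main__":
    print(run(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Note that the two functions are the same loop with different halves deleted, which is the point of the partitioning. The sketch sidesteps memory hazards by issuing all loads before any store; a real machine would need some way to order a store against a later load of the same address.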