Path: utzoo!attcan!uunet!husc6!mailrus!iuvax!pur-ee!a.cs.uiuc.edu!uxc.cso.uiuc.edu!urbsdc!aglew
From: aglew@urbsdc.Urbana.Gould.COM
Newsgroups: comp.arch
Subject: Re: getting rid of branches
Message-ID: <28200173@urbsdc>
Date: 1 Jul 88 13:59:00 GMT
References: <12258@mimsy.UUCP>
Lines: 29
Nf-ID: #R:mimsy.UUCP:12258:urbsdc:28200173:000:1397
Nf-From: urbsdc.Urbana.Gould.COM!aglew    Jul  1 08:59:00 1988


>Actually, this sort of idea is contained in some research and thesis
>work that is (was?) going on here at Maryland.  How do you move old
>code (`dusty decks') onto a parallel processor?  One way is to slice up
>the program into independent pieces that can be combined again later.

This comes perilously close to something I'm trying.
Davidson at UIll (now at UMich) proposed a partitioned
access/execute architecture a while back, in which you basically
have two processors: one for address and memory computations,
and one for (mainly floating-point?) calculation.
They run independently, with FIFOs between them for passing
variables, condition codes, etc. back and forth. In effect they
run two versions of exactly the same program: one with all the
FP taken out, and the other with all the memory references
replaced by "get the next value from the memory unit".
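A toy sketch of the split described above, for a dot product: the access unit computes addresses and moves values through a load FIFO, and the execute unit replaces every memory reference with a FIFO pop. This is not Davidson's actual design. The function names, the memory layout, and the use of Python generators stepped round-robin (standing in for two independently running processors) are all assumptions for illustration.

```python
from collections import deque

def access_unit(memory, n, x_base, w_base, out_addr, load_q, store_q):
    """Address/memory side: issues loads into the FIFO and performs the final store."""
    for i in range(n):
        load_q.append(memory[x_base + i])  # "load" x[i] and hand it to the execute unit
        load_q.append(memory[w_base + i])  # "load" w[i] and hand it to the execute unit
        yield
    while not store_q:                      # wait until the execute unit sends a result back
        yield
    memory[out_addr] = store_q.popleft()    # store the result on its behalf

def execute_unit(n, load_q, store_q):
    """Compute side: every memory reference becomes 'get the next value from the memory unit'."""
    acc = 0.0
    for _ in range(n):
        while len(load_q) < 2:              # stall until both operands have arrived
            yield
        x = load_q.popleft()
        w = load_q.popleft()
        acc += x * w
        yield
    store_q.append(acc)                     # pass the result back for storing

def run(units):
    """Step each unit once per round until all of them have finished."""
    active = list(units)
    while active:
        for u in list(active):
            try:
                next(u)
            except StopIteration:
                active.remove(u)

# Hypothetical layout: x at addresses 0-2, w at 3-5, result at 6.
memory = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.0]
load_q, store_q = deque(), deque()
run([access_unit(memory, 3, 0, 3, 6, load_q, store_q), execute_unit(3, load_q, store_q)])
print(memory[6])  # 32.0
```

Neither unit ever sees the other's instructions; the only coupling is through the two FIFOs, so the access unit can run ahead and fill `load_q` while the execute unit is busy.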

Sounds a bit like Wulf's WM, except that, instead of both units
executing the same program and treating instructions for the
other unit as NOPs, you partition the program.
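A minimal sketch of the distinction drawn here, following only the description in this post (not WM's actual encoding): the same tagged instruction stream is either fetched whole by each unit, with foreign instructions treated as NOPs, or split so that each unit fetches only its own instructions. The instruction list and unit tags `"A"`/`"E"` are hypothetical.

```python
# Hypothetical instruction stream: each entry is (unit, instruction text).
PROGRAM = [
    ("A", "load  x[i]"),
    ("A", "load  w[i]"),
    ("E", "fmul  t, x, w"),
    ("E", "fadd  acc, acc, t"),
    ("A", "addi  i, i, 1"),
]

def shared_stream(program, unit):
    """Same program for every unit; instructions for the other unit become NOPs."""
    return [op if u == unit else "nop" for u, op in program]

def partitioned_stream(program, unit):
    """Partitioned program: each unit receives only its own instructions."""
    return [op for u, op in program if u == unit]

e_shared = shared_stream(PROGRAM, "E")
e_part = partitioned_stream(PROGRAM, "E")
print(len(e_shared), e_shared.count("nop"))  # 5 3
print(len(e_part))                           # 2
```

In the shared form the execute unit spends fetch slots on three NOPs; in the partitioned form it never sees them.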

I'm trying to take this a bit further: instead of partitioning
by _type_ of data, I want to see whether there is enough parallelism
within the same data type when many registers are available.
I.e., have different functional units for all operations on
registers R0-R15 and R16-R31, with FIFOs for operations like
R0 = R16 + R32.
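A hypothetical sketch of how instructions might be assigned under register partitioning. It assumes 16 registers per functional unit (so R32 would fall in a third unit, R32-R47) and assumes each instruction executes on the unit that owns its destination register; both are illustrative choices, not something stated in the post. Every source operand owned by a different unit then needs a FIFO transfer.

```python
REGS_PER_UNIT = 16  # assumption: R0-R15 -> unit 0, R16-R31 -> unit 1, R32-R47 -> unit 2, ...

def unit_of(reg):
    """Map a register name such as 'R16' to the functional unit that owns it."""
    return int(reg[1:]) // REGS_PER_UNIT

def schedule(instrs):
    """For each (dst, src1, src2) instruction, pick the unit owning dst and list the
    FIFO transfers (from_unit, to_unit, register) needed for operands held elsewhere."""
    plan = []
    for dst, *srcs in instrs:
        home = unit_of(dst)
        transfers = [(unit_of(s), home, s) for s in srcs if unit_of(s) != home]
        plan.append((home, transfers))
    return plan

plan = schedule([("R0", "R16", "R32"), ("R1", "R2", "R3")])
print(plan)  # [(0, [(1, 0, 'R16'), (2, 0, 'R32')]), (0, [])]
```

Counting the transfers over a real instruction trace would give a rough measure of how much cross-unit FIFO traffic a given register partitioning generates.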

Is anyone else doing something similar?