Path: utzoo!utgpu!watmath!att!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!ginosko!husc6!think!barmar
From: barmar@think.COM (Barry Margolin)
Newsgroups: comp.arch
Subject: MIMD on Connection Machine
Message-ID: <26356@news.Think.COM>
Date: 9 Aug 89 19:40:58 GMT
References: <107900005@iuvax> <8953@june.cs.washington.edu> <2238@jhunix.HCF.JHU.EDU> <5796@pt.cs.cmu.edu>
Sender: news@Think.COM
Reply-To: barmar@kulla.UUCP (Barry Margolin)
Organization: Thinking Machines Corporation, Cambridge MA, USA
Lines: 49

In article <5796@pt.cs.cmu.edu> jps@cat.cmu.edu (James Salsman) writes:
>In article <2238@jhunix.HCF.JHU.EDU> ins_atge@jhunix.UUCP (Thomas G Edwards) writes:
>>And just because every processor gets the same instruction feed, one must
>>not think that every processor is "doing the same thing."  Each CM
>>processor can hold an index to an array located in that processor,
>>so with the right software, the CM could become a MIMD machine.
>
>Please do not attempt this.  I have tried for months to
>optimize a MIMD-emulator on the CM-2 with very disappointing
>results.  It might be a fun class project for a bunch of
>novice hackers learning parallelism.

Right.  I assume Thomas was thinking of keeping a macro-instruction stream
in the CM processor memory and stepping through it using the indirect
addressing feature.  You could then implement a parallel simulator to
interpret the macro-instruction stream.

However, only a small part of that interpreter can actually run fully in
parallel.  Once a macro-instruction is decoded and it's time to actually
execute it, you essentially have a big CASE (or, in *Lisp syntax, *COND)
statement.  Each branch will be executed in parallel by all the processors
that pass its associated test, but you still have to run through all the
tests sequentially.  The code might look something like:

(*cond ((load-op-p opcode) (do-load operands))
       ((store-op-p opcode) (do-store operands))
       ((add-op-p opcode) (do-add operands))
       ...)

This will first do all the simultaneous LOAD operations in parallel, then
all the simultaneous STOREs, then all the simultaneous ADDs, etc.
Therefore, the speed of this MIMD simulator will be O(n), where n is the
number of different opcodes.  I can think of some ways to adjust the above
tests dynamically to change it to O(m), where m is the average number of
different opcodes being executed simultaneously, but that's still not very
good.

It'll never be a very good MIMD machine; the individual processors aren't
very powerful.  It might be reasonable for research on programming
techniques for kiloprocessor MIMD machines until the real things come
along (it's probably better than trying to simulate them on uniprocessors,
since the performance there is O(p), where p is the number of processors
being simulated).

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
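
P.S.  To make the O(n)-versus-O(m) point concrete, here is a toy
sequential sketch.  It is not *Lisp and not CM code; the opcode set,
handler, and per-"processor" records are all invented for illustration.
It only shows the dispatch cost: a fixed sweep over every opcode the
simulator knows about, versus a sweep over only the opcodes actually
present this macro-cycle.

# Toy illustration of MIMD-on-SIMD dispatch cost (Python; names invented).
ALL_OPCODES = ["LOAD", "STORE", "ADD", "SUB", "JUMP"]   # n opcodes known to the simulator

def handle(op, selected):
    # Stand-in for the real work; in the *Lisp version this is the body of
    # one *COND arm, run by every processor whose macro-instruction matched.
    print(f"{op}: {len(selected)} processors")

def step_fixed(procs):
    """O(n) per macro-cycle: test every known opcode, used or not."""
    for op in ALL_OPCODES:
        selected = [p for p in procs if p["opcode"] == op]
        if selected:
            handle(op, selected)

def step_dynamic(procs):
    """O(m) per macro-cycle: sweep only the opcodes present this cycle."""
    for op in {p["opcode"] for p in procs}:      # m distinct opcodes right now
        handle(op, [p for p in procs if p["opcode"] == op])

if __name__ == "__main__":
    procs = [{"opcode": "ADD"}] * 100 + [{"opcode": "LOAD"}] * 28
    step_fixed(procs)     # sweeps 5 opcodes, does useful work for 2
    step_dynamic(procs)   # sweeps only the 2 opcodes in use

Either way, each sweep iteration is sequential with respect to the others;
only the work inside one iteration is parallel across processors, which is
why the simulator is limited by the number of distinct opcodes in flight.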