Path: utzoo!attcan!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!ukma!gatech!hubcap!Marshall From: cline@sunshine.ece.clarkson.edu (Marshall Cline) Newsgroups: comp.parallel Subject: "Molecule: Language Construct for Development Parallel Programs" Message-ID: <5593@hubcap.clemson.edu> Date: 24 May 89 20:41:38 GMT Sender: fpst@hubcap.clemson.edu Lines: 182 Approved: parallel@hubcap.clemson.edu In article <5583@hubcap.clemson.edu> budd@mist.cs.orst.edu (Tim Budd) writes: >...This allows us to generate >fine grain SIMD (connection-machine style) code by introducing parallelism >one way, and coarse grain, MIMD (sequent-style) code by introducing >parallelism in another way.... not >something the programmer needs to deal with.... >--tim budd, budd@cs.orst.edu Did ya'all see the article in May's IEEE Transactions Software Engineering? "Molecule: A Language Construct for Development of Parallel Programs", by Z. Xu & K. Hwang, pages 587-599. Personally, I'm interested in the concept of allowing _one_ parallel program to compile on _many_ hardware platforms regardless of the hardware's underlying mechanisms. I'd like to discuss this issue if others are also interested. __________MOTIVATION:_HARDWARE_INDEPENDENCE__________________________ Up to now, we've had to re-write code to port it from one vendor's parallelization mechanisms to another's. Issues such as: * SIMD vs MIMD * fine- vs coarse-grained * loosely- vs tightly-coupled * mechanisms for mutual exclusion * mechanisms for IPC etc, etc, etc are directly visible in most parallel programs. Although we typically write in a "high level" language like C, we usually have to directly control the hardware [I hate to say it, but it's analogous to writing in assembly language]. "Wouldn't it be nice" if we could write code in a way that vendor specific issues could be hidden from us? The "molecule" technique is a step in this direction. __________WHAT_IS_A_"MOLECULE"?_______________________________________ OVERVIEW OF THE PAPER: A molecule is roughly like a procedure or function. However, along with encapsulating "algorithms", "data types", and "parameters", it also encapsulates "parallel constructs" such as the above distinctions. Apparently the authors felt that the (non-strict) dataflow paradigm is the most flexible (able to be translated into any of the other paradigms), so the highest level "moleculized" program is specified using this form. By "non-strict dataflow", they mean the single assignment rule for _atomic_ objects; ie: if "i" is an integer, it can be assigned a value only once, but if "a" is an array of integers, each element can _individually_ be assigned a value, but only once. No lazy evaluation specified. __________THE_"MODE"_LAYER__________________________________________ They specify a number of other "molecule" paradigms, such as "SIMD", "parallel", "task", and a bunch of others, all of which are roughly like _pseudo_code_ for specifying parallel algorithms (ie: the language doesn't have any mechanisms for allocating processors, that being taken care of by the next pre-processing step). The "dataflow" programs are translated into this "mode" layer by a first pre-processing step. Programs at this level _do_ have individual processes specified (ie: decomposing the program into processes is accomplished by the "dataflow --> mode" preprocessor). A generic form of IPC is also visible at this "mode" layer, which was hidden in the top-most layer. __________THE_"VENDOR_SPECIFIC"_LAYER__________________________________ The next preprocessor ("mode --> vendor-specific") allocates processors, provides mechanisms for mutual exclusion, and generally looks messy. They give an example of what code looks like at this layer for inverting a matrix (coded in the "C" extensions provided by Intel for use on their iPSC hypercube). The iPSC "extended-C" version took 250 lines, the "mode" layer required 44 lines, the top layer required 26 lines. __________MODIFIABILITY_AND_CODE_MAINTENANCE___________________________ Software Engineering concerns such as modifiability were addressed next. Changing the granularity of the parallelism required a _complete_rewrite_ at the iPSC hypercube level, resulting in over 1000 lines of code being changed (progam went from 250 --> 1000 lines). The "mode" layer required changing two subprograms. The topmost (dataflow) layer required changing only two lines. __________FEASIBILTY?__________________________________________________ QUESTION #1: The author's didn't say whether or not these pre-processors have any hope of a feasible implementation. That appears to be an open question which might require _lots_ and _lots_ of research before an answer can be found. Clearly this is a _very_ hard problem, given that reasoning about a Turing Machine's Language (ie: reasoning about what a piece of code _does_) is undecidable. I'm interested in _informed_ opinions (ie: _READ_THE_ARTICLE_FIRST_). What do we think out there? We've all heard the "know-it-alls" who say "Can't be done" before the problem is even well studied. I hope we're all mature enough to realize that this article is a _first_ step. Let's not bash them for what they _didn't_ do; let's discuss what they _did_ do. __________SOME_OTHER_DISCUSSION_QUESTIONS______________________________ They show a diagram (Fig 2, page 594) demonstrating how the "dataflow" specification can be translated into {Multicomputers, Multiprocessors, and Array processors}. These include: * Multicomputers: {iPSC hypercube, NCUBE, FPS T-series, Transputer/Occam} * Multiprocessors: {Cray X-MP/FORTRAN, Alliant FX/8, Cyber/FORTRAN} * Array processors: {MPP/Pascal, Connection-Machine/C*} QUESTION #2: Is this _broad_ range feasible? Please discuss only after understanding the "mode" layer pre-compilation phase -- ie: if it's not feasible, why not? Is the "dataflow --> mode" preprocessing step unfeasible? Is the "task-type --> Cray-X-MP" preprocessing step unfeasible? QUESTION #3: Another discussion question involves the relative _efficiency_ of the generated code. This is going to open up Pandora's Box, I'm sure, but it needs to be (politely) discussed. Code that's generated by a machine is (probably) never going to be as "efficient" as code generated by a (skillful) human. This is the old high-level vs. assembly language question, but applied to parallelization. QUESTION: In 20 years, are "they" going to look back at "us" and say what we say about the "machine language hackers" of yesteryear? Ie: Are they going to say "Why in the world did they worry so much about wasting a few clock cycles? Why did they worry about squeezing the last possible ounce of parallelism out of a piece of hardware?". Or are they going to say the opposite: "Why did they ever even _want_ a language specification which is independent of the parallelization paradigm?" QUESTION #4: Are parallel languages headed towards levels that are higher than what we now call "high level" languages? Ie: Given that we are/will-be running our stuff on supercomputers, should we _use_ the resorces of the "super"-computer to generate code from "super"-high-level-language specs? Granted: this is a far sighted question, but what do you think? QUESTION #5: Anyone know of any _other_ work being done in this area (ie: any other methods for "parallelization-schema-independence"?). QUESTION #6: Is "modified-dataflow" the best choice for the highest-level? Is it flexible enough to be translated into the other forms? Is "your favorite form" better in terms of translation into Multicomputers, Multiprocessors and Array processor architectures? OTHER QUESTIONS ANYONE? ___________OPINIONS_OPINIONS_OPINIONS_________________________________ OPINION #1: I think it's going to take a _LOT_ of work before the question of feasibility is answerable. I'd say "no it's not feasible _today_, but _may_be_ in the future..." OPINION #2: Both "preprocessing" steps seem _horribly_ difficult. For example, the "mode --> vendor-specific" preprocessing step involves allocation of processors. The authors justify their push toward higher level language specifications based on the notion that "it is extremely difficult to figure out a good allocation" (page 595, left column, 2nd to bottom para). Again, they say that process allocation is a "very difficult task" which is "handled by a precompiler." (page 595, right column, 2nd para). Thus the discussion is inherently self-contradictory: those who say that "the problem is too hard for a machine to do" are admitting that humans have to forever be saddled with this problem -- and they're not willing to look into relieving us of the burden. On the other hand, those who say "it's easy to do by machine" must then answer the question: "if it's so easy, why bother? why not just let humans do it?" OPINION #3: I think that in 20 years "they" will have machines do _LOTS_ more to help generate the code. Assembly language will be _unheard_of_ in practice. Writing code which will only run on a single parallel machine will be considered anathema. As always, this will present efficiency problems. Perhaps these will be solved in the usual "find and streamline the bottlenecks" method. Perhaps "streamlining" will involve hand-coding, or perhaps it will involve "super-parallelizing" the code. Perhaps we'll ("they'll") need profiling tools to "find" the bottlenecks. Or perhaps the machine will generate the code, run it, automatically look for bottlenecks, automatically re-parallelize the offending section, run it again, etc. OPINION #4: I don't know. A "natural language" form (ie: accept English commands and "execute" them) would certainly qualify as a "super"-high- level-language. Where would that leave programmers? Would we all be out of a job? I don't think so. _Someone_ is going to have to write the code which translates English (or whatever) into something the machine can handle! Can you imagine the Software Maintenance problem that would create? Shivver! Let the discussion begin! Marshall -- ________________________________________________________________ Marshall P. Cline ARPA: cline@sun.soe.clarkson.edu ECE Department UseNet: uunet!sun.soe.clarkson.edu!cline Clarkson University BitNet: BH0W@CLUTX Potsdam, NY 13676 AT&T: (315) 268-6591