Path: utzoo!attcan!uunet!cs.utexas.edu!tut.cis.ohio-state.edu!ukma!gatech!hubcap!Marshall
From: cline@sunshine.ece.clarkson.edu (Marshall Cline)
Newsgroups: comp.parallel
Subject: "Molecule: Language Construct for Development Parallel Programs"
Message-ID: <5593@hubcap.clemson.edu>
Date: 24 May 89 20:41:38 GMT
Sender: fpst@hubcap.clemson.edu
Lines: 182
Approved: parallel@hubcap.clemson.edu

In article <5583@hubcap.clemson.edu> budd@mist.cs.orst.edu (Tim Budd) writes:
>...This allows us to generate
>fine grain SIMD (connection-machine style) code by introducing parallelism
>one way, and coarse grain, MIMD (sequent-style) code by introducing
>parallelism in another way.... not
>something the programmer needs to deal with....
>--tim budd, budd@cs.orst.edu

Did ya'all see the article in May's IEEE Transactions Software Engineering?
     "Molecule: A Language Construct for Development of Parallel Programs",
      by Z. Xu & K. Hwang, pages 587-599.

Personally, I'm interested in the concept of allowing _one_ parallel
program to compile on _many_ hardware platforms regardless of the
hardware's underlying mechanisms.  I'd like to discuss this issue if
others are also interested.

__________MOTIVATION:_HARDWARE_INDEPENDENCE__________________________
Up to now, we've had to re-write code to port it from one vendor's
parallelization mechanisms to another's.  Issues such as:
        * SIMD vs MIMD
        * fine- vs coarse-grained
        * loosely- vs tightly-coupled
        * mechanisms for mutual exclusion
        * mechanisms for IPC
        etc, etc, etc
are directly visible in most parallel programs.  Although we typically
write in a "high level" language like C, we usually have to directly
control the hardware [I hate to say it, but it's analogous to writing
in assembly language].

"Wouldn't it be nice" if we could write code in a way that vendor
specific issues could be hidden from us?  The "molecule" technique is
a step in this direction.

__________WHAT_IS_A_"MOLECULE"?_______________________________________
OVERVIEW OF THE PAPER: A molecule is roughly like a procedure or function.
However, along with encapsulating "algorithms", "data types", and
"parameters", it also encapsulates "parallel constructs" such as the
above distinctions.  Apparently the authors felt that the (non-strict)
dataflow paradigm is the most flexible (able to be translated into any
of the other paradigms), so the highest level "moleculized" program is
specified using this form.  By "non-strict dataflow", they mean the
single assignment rule for _atomic_ objects; ie: if "i" is an integer,
it can be assigned a value only once, but if "a" is an array of integers,
each element can _individually_ be assigned a value, but only once.  No
lazy evaluation specified.

__________THE_"MODE"_LAYER__________________________________________
They specify a number of other "molecule" paradigms, such as "SIMD",
"parallel", "task", and a bunch of others, all of which are roughly like
_pseudo_code_ for specifying parallel algorithms (ie: the language
doesn't have any mechanisms for allocating processors, that being
taken care of by the next pre-processing step).  The "dataflow" programs
are translated into this "mode" layer by a first pre-processing step.
Programs at this level _do_ have individual processes specified (ie:
decomposing the program into processes is accomplished by the "dataflow
--> mode" preprocessor).  A generic form of IPC is also visible at
this "mode" layer, which was hidden in the top-most layer.

__________THE_"VENDOR_SPECIFIC"_LAYER__________________________________
The next preprocessor ("mode --> vendor-specific") allocates processors,
provides mechanisms for mutual exclusion, and generally looks messy.
They give an example of what code looks like at this layer for inverting
a matrix (coded in the "C" extensions provided by Intel for use on their
iPSC hypercube).  The iPSC "extended-C" version took 250 lines, the
"mode" layer required 44 lines, the top layer required 26 lines.

__________MODIFIABILITY_AND_CODE_MAINTENANCE___________________________
Software Engineering concerns such as modifiability were addressed next.
Changing the granularity of the parallelism required a _complete_rewrite_
at the iPSC hypercube level, resulting in over 1000 lines of code being
changed (progam went from 250 --> 1000 lines).  The "mode" layer required
changing two subprograms.  The topmost (dataflow) layer required changing
only two lines.

__________FEASIBILTY?__________________________________________________
QUESTION #1: The author's didn't say whether or not these pre-processors
have any hope of a feasible implementation.  That appears to be an open
question which might require _lots_ and _lots_ of research before an
answer can be found.  Clearly this is a _very_ hard problem, given that
reasoning about a Turing Machine's Language (ie: reasoning about what a
piece of code _does_) is undecidable.

I'm interested in _informed_ opinions (ie: _READ_THE_ARTICLE_FIRST_).
What do we think out there?  We've all heard the "know-it-alls" who say
"Can't be done" before the problem is even well studied.  I hope we're
all mature enough to realize that this article is a _first_ step.  Let's
not bash them for what they _didn't_ do; let's discuss what they _did_ do.

__________SOME_OTHER_DISCUSSION_QUESTIONS______________________________
They show a diagram (Fig 2, page 594) demonstrating how the "dataflow"
specification can be translated into {Multicomputers, Multiprocessors,
and Array processors}.  These include:
  * Multicomputers:   {iPSC hypercube, NCUBE, FPS T-series, Transputer/Occam}
  * Multiprocessors:  {Cray X-MP/FORTRAN, Alliant FX/8, Cyber/FORTRAN}
  * Array processors: {MPP/Pascal, Connection-Machine/C*}

QUESTION #2: Is this _broad_ range feasible?  Please discuss only after
understanding the "mode" layer pre-compilation phase -- ie: if it's not
feasible, why not?  Is the "dataflow --> mode" preprocessing step
unfeasible?  Is the "task-type --> Cray-X-MP" preprocessing step unfeasible?

QUESTION #3: Another discussion question involves the relative _efficiency_
of the generated code.  This is going to open up Pandora's Box, I'm sure,
but it needs to be (politely) discussed.  Code that's generated by a
machine is (probably) never going to be as "efficient" as code generated
by a (skillful) human.  This is the old high-level vs. assembly language
question, but applied to parallelization.  QUESTION: In 20 years, are
"they" going to look back at "us" and say what we say about the "machine
language hackers" of yesteryear?  Ie: Are they going to say "Why in the
world did they worry so much about wasting a few clock cycles?  Why did
they worry about squeezing the last possible ounce of parallelism out of
a piece of hardware?".  Or are they going to say the opposite: "Why did
they ever even _want_ a language specification which is independent of
the parallelization paradigm?"

QUESTION #4: Are parallel languages headed towards levels that are higher
than what we now call "high level" languages?  Ie: Given that we are/will-be
running our stuff on supercomputers, should we _use_ the resorces of the
"super"-computer to generate code from "super"-high-level-language specs?
Granted: this is a far sighted question, but what do you think?

QUESTION #5: Anyone know of any _other_ work being done in this area (ie:
any other methods for "parallelization-schema-independence"?).

QUESTION #6: Is "modified-dataflow" the best choice for the highest-level?
Is it flexible enough to be translated into the other forms?  Is "your 
favorite form" better in terms of translation into Multicomputers,
Multiprocessors and Array processor architectures?

OTHER QUESTIONS ANYONE?

___________OPINIONS_OPINIONS_OPINIONS_________________________________
OPINION #1: I think it's going to take a _LOT_ of work before the question
of feasibility is answerable.  I'd say "no it's not feasible _today_,
but _may_be_ in the future..."

OPINION #2: Both "preprocessing" steps seem _horribly_ difficult.  For
example, the "mode --> vendor-specific" preprocessing step involves
allocation of processors.  The authors justify their push toward higher
level language specifications based on the notion that "it is extremely
difficult to figure out a good allocation" (page 595, left column, 2nd
to bottom para).  Again, they say that process allocation is a "very
difficult task" which is "handled by a precompiler." (page 595, right
column, 2nd para).

Thus the discussion is inherently self-contradictory: those who say that
"the problem is too hard for a machine to do" are admitting that humans
have to forever be saddled with this problem -- and they're not willing
to look into relieving us of the burden.  On the other hand, those who
say "it's easy to do by machine" must then answer the question: "if it's
so easy, why bother? why not just let humans do it?"

OPINION #3: I think that in 20 years "they" will have machines do _LOTS_
more to help generate the code.  Assembly language will be _unheard_of_
in practice.  Writing code which will only run on a single parallel machine
will be considered anathema.  As always, this will present efficiency
problems.  Perhaps these will be solved in the usual "find and streamline
the bottlenecks" method.  Perhaps "streamlining" will involve hand-coding,
or perhaps it will involve "super-parallelizing" the code.  Perhaps we'll
("they'll") need profiling tools to "find" the bottlenecks.  Or perhaps the
machine will generate the code, run it, automatically look for bottlenecks,
automatically re-parallelize the offending section, run it again, etc.

OPINION #4: I don't know.  A "natural language" form (ie: accept English
commands and "execute" them) would certainly qualify as a "super"-high-
level-language.  Where would that leave programmers?  Would we all be out
of a job?  I don't think so.  _Someone_ is going to have to write the code
which translates English (or whatever) into something the machine can
handle!  Can you imagine the Software Maintenance problem that would
create?  Shivver!

Let the discussion begin!
Marshall

--
	________________________________________________________________
	Marshall P. Cline	ARPA:	cline@sun.soe.clarkson.edu
	ECE Department		UseNet:	uunet!sun.soe.clarkson.edu!cline
	Clarkson University	BitNet:	BH0W@CLUTX
	Potsdam, NY  13676	AT&T:	(315) 268-6591