Path: utzoo!mnetor!uunet!lll-winken!lll-tis!mordor!lll-lcc!well!shf
From: shf@well.UUCP (Stuart H. Ferguson)
Newsgroups: comp.software-eng
Subject: Re: Cynic's Guide to Software Engineering, part 3
Message-ID: <5752@well.UUCP>
Date: 20 Apr 88 22:41:06 GMT
References: <1950@rtech.UUCP> <945@nuchat.UUCP>
Reply-To: shf@well.UUCP (Stuart H. Ferguson)
Distribution: na
Organization: The Blue Planet
Lines: 316
Keywords: FORTRAN, Languages, Scientific programming


Article 474 of comp.software-eng:
>From: steve@nuchat.UUCP (Steve Nuchia)

(Sorry about the length.  I guess I had a lot to say ...)

In a previous article, Steve Nuchia talks about Fortran:
>From article <1950@rtech.UUCP>, by daveb@llama.rtech.UUCP ...:
>> ...  It is hard for me to believe that, for sake of
>> argument, FORTRAN is incapable or even very seriously flawed in its
>> ability to model physics problems compared to any of the likely
>> alternatives.
>FORTRAN is a very nice language for arranging to have arithmetic
>FORmulas evaluated, and it is even ok (in its present form) at
>organizing a moderately complex sequence of calculations.

First of all, I like Fortran.  No really!  I've been working for about 
four years as a programmer with a group of solar physicists writing image
processing and data analysis applications.  I have a B.A. in computer
science from UC Berkeley, and am *fluent* in C and Pascal and a couple
of varieties of assembly language, as well as being fairly well versed
in Lisp and other more abstruse languages. 

I wrote a couple of image processing programs in Pascal once.  I was
using a recursive algorithm, so I thought it would make sense to use a
language that supported recursion (everybody knows you can't do
recursion in Fortran, right?).  It didn't work out.  Pascal turns out
just not to be a good language for this kind of thing -- I couldn't say
exactly why.  I found it easier to implement the algorithm in Fortran
using an array as an explicit stack so that the recursive algorithm
became iterative.  I know Pascal very well too, having written quite a
few things in Pascal including a recursive descent parser and other
goodies.  The Fortran version of the image processing algorithm has gone
through several phases of refinement while the Pascal version has sat
around not doing much of anything. 
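
The recursion-to-stack conversion mentioned above can be sketched as
follows.  This is only an illustration: the post does not say which
recursive algorithm was involved, so a 4-connected flood fill is
assumed here as a stand-in, and it is written in Python rather than
Fortran so that it runs as-is.  The name flood_fill and its arguments
are hypothetical.

```python
def flood_fill(image, start, new_value):
    """Fill the 4-connected region containing `start`, without recursion.

    `image` is a list of rows (lists of pixel values) modified in place.
    Returns the number of pixels changed.
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = start
    old_value = image[r0][c0]
    if old_value == new_value:
        return 0  # nothing to do, and avoids looping forever

    # The explicit stack takes the place of the recursive call stack:
    # each entry is a pixel that a recursive version would have
    # "called itself" on.
    stack = [(r0, c0)]
    filled = 0
    while stack:
        r, c = stack.pop()
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # off the edge of the image
        if image[r][c] != old_value:
            continue  # different region, or already filled
        image[r][c] = new_value
        filled += 1
        # Push the four neighbours instead of recursing into them.
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return filled
```

In Fortran the same shape would presumably use a fixed-size integer
array plus a stack-pointer variable in place of the Python list, with
"push" and "pop" written as index arithmetic.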

I have, over the course of the past two years, written a tremendous 
volume of Fortran code, including a 'lexx'-like, a 'yacc'-like and a 
'make'-like program (VMS didn't have these utilities, so I wrote my own 
in Fortran).  Besides just writing image processing algorithms, I have 
also been concerned with the types of user interface issues that Nuchia
addresses. 

>... In my experience
>the text of most "scientific" programs begin life as some small
>amount of FORTRAN embodying some algorithm - maybe as much as a
>few thousand lines.  ...

An image processing algorithm can often be expressed as ONE line ...

>Now, this little program starts to be _used_ for something.  It becomes
>"supported", and starts to grow.  The algorithm doesn't grow.  What
>grows is the data management and user interface cruft around the
>central algorithm.  Since the whole wad is still in FORTRAN the
>maintenance programmers have littered the algorithmic code with
>flags and such to support all the new features, but by this time
>the data management and user interface code outweighs the "scientific"
>code by ten to one or better.

This is a very good observation.  User interface isn't much of an issue
for the scientists I work with, however, so that is not where the code
is.  Most scientists are happy with something as simple as: 

	ENTER FILENAME: 

A scientist's idea of a fancy user interface is being able to press
return at the prompt to exit the program.  (This is a stereotype, to be
sure.)  Actually, I have a lot of respect for the scientists here, and
many of them are very good programmers as well, but they'll usually only
work on user interface as a last resort.  The place I find the most 
'cruft' is in what I would call "I/O."

I think Nuchia's observation may be a bit of a simplification, and I
have some further refinement to suggest.  What happens to me is this:  I
write a little program to try some new data analysis technique.  When I
show the results to the scientists they get all excited and tell me to
try it on other data that we have.  So, now I need to go back and modify
the program to handle different types of data than I originally designed
it for, but this is no big deal, because I'm just making the program
better and more general so I go and do it gladly. 

Here's where the trouble starts.  Let's say I run the program on the
other data and the results are tantalizing but not as good as the 
scientists had hoped.  We then enter the "What If" Loop, where the 
scientists think up all sorts of various complicated and often contrived
ways to approach the problem to try and make the results better. 
Someone will come up with a good "what if we try it this way ..." type
suggestion and I'll go off and muck with my code, try it, and show them
the results.  Then they say, "Hmmm, that's not quite it.  What if
instead of _this_ we do _that_?" and off I go again into the code to try
that idea. 

The result is a real nightmare.  What I end up with is an algorithm that
is the result of many incremental and often temporary (or kludgy)
changes.  Ideally the program would eventually work right and then I
could go back and recode it correctly, but it rarely seems to happen
that way.  Usually what happens is the idea gets abandoned until six
months later when we get a new data set and someone suggests that the
idea we were working on six months ago might work well on this data, and
I dig the code up and bang on it again. 

>Sure, the program spends most of _its_ time in the scientific code,
>but where are your programmers spending _their_ time?   ...

In a type of *interactive* software development cycle for which few
languages are well adapted.

>Nuchia's law:  90% of any production program is doing data
>        managment and user interface for the 10% that is
>        doing the real work.

This is a good start, but what exactly is meant by a "production"
program?  I also don't like the implication that user interface cannot
be considered "real" work.  What about something like a CAD program
which is nothing BUT user interface? 

>... how good of a language is FORTRAN for
>data management?  Pretty terrible.  It just doesn't have any
>of the semantics you need.  Sure, you can kluge around it,
>but why?
... and in the same vein ...
>You wouldn't want to write the UI library in FORTRAN, but
>thankfully I don't think there are very many people left
>who would insist that you do so.

It's easy for us software types to blame the programming language; I
mean, what could be worse than Fortran, right?  But I remind you that I
wrote 'lexx,' 'yacc' and 'make' in Fortran with very little trouble. 
This is Vax Fortran, an extension of FORTRAN-77 with 31 character
identifiers, structured control flow and structured data types, among
other things.  I've done recursion, linked lists, string manipulation
and even dynamic memory allocation in Fortran.  I'm not pushing Vax
Fortran as some kind of True and Great language, but I AM suggesting
that the source of some of the problems Steve addresses may be a result
of more than JUST the choice of programming language. 

The "scientific" programs which start small and grow, such as the ones 
that Steve refers to, are rarely planned.  Rather they evolve out of the
kind of interactive development (i.e. hacking) that I described.  For
cases where the program was actually designed in advance, such as my
'lexx,' 'yacc,' and 'make' utilities, Fortran was a workable choice. 
Some might argue that bad C code is better than good Fortran code, but I
don't find this to be the case, and I strongly suspect that those who
say this don't really know how to recognize a good Fortran program. 


Steve Nuchia also talks about user interfaces:

>User interfaces, if implemented properly, just consist of
>a large state machine which calls a giant UI library.  The
                                     -----
>state machine can be written in FORTRAN or BASIC or SWAHILI
>or whatever you want, as long as some kind of conventional
>structure is used - the nodes of the state graph should
>all follow a stereotyped format to a greater or lesser
>degree.
...
>What's the solution?  Let your scientific programmers code in
>FORTRAN.  They're getting their work done, and you've got enough
>work to do without going on a crusade to "save" them.  And of
>course you get to take advantage of the available optimization
>technology for the long-running core of the programs.  But
>make your maintenance programmers learn and use a modern
>language; pick one with a good interface to your FORTRAN
>environment.

This has proven to be a good general model to work from.  We have done
this in our lab to a certain extent at various levels with varying 
degrees of success.  Three of our more successful approaches validate 
Nuchia's model:

1) User interface libraries

In designing libraries, there's a parallel issue to user interface --
that of programmer interface.  There are really a lot of wonderful
libraries out there which do a lot of wonderful things, but many have no
concern for what the programmer sees.  Good examples are some of the
really big graphics libraries.  They are really powerful and really
general and really can do lots of terrific things, but it really takes a
LOT of calls to do even the simplest thing.  You sometimes have to learn
every weird formalization that the program uses internally in order to
do something as simple as draw a line.  Experience with these tends to
make me shudder when I see someone refer to building a GIANT user
interface library. 

The best library interface and the most useful user interface library
for Fortran programs I've ever seen happen to be the same library.  It
is called GIRL (Generalized Input Routine Library) and the
basic programmer interface consists of ONE subroutine call.  The 
programmer can do something like:

	ALPHA = 1.0
	COUNT = 10
	CALL GIRL('Enter Alpha, Count',,,'RI',ALPHA,COUNT)

and the result would be a friendly prompt:

	Enter Alpha, Count [1.0,10]: 

The preset values for the variables get displayed in []'s, and the user 
can press return to get those as defaults or can type ",2" to leave 
ALPHA alone and set COUNT to 2.  The GIRL subroutine can take any 
number of arguments and determines their type from a string (the 'RI'
argument in the example above). 

This very simple interface makes it *easier* to use GIRL than the
equivalent 2-4 lines of Fortran, and you get defaults and a uniform
input method throughout your program.  GIRL also has a whole lot of
other features, such as on-line help, super-defaults and backtracking,
but all of that is hidden behind a deceptively simple programmer
interface.  The other features get accessed by adding more parameters to
the GIRL call, or by calling other routines.  It's organized, however,
so that the programmer only needs to learn about those features he's
interested in when he's interested in them. 
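
The prompt-with-defaults behaviour described above can be sketched
roughly as follows.  This is not GIRL's implementation or its real
calling convention; it is a small Python illustration of the idea.  The
function names are hypothetical, and reading the type string as
'R' = real, 'I' = integer is an assumption based on the example call.

```python
# Assumed meaning of the one-letter type codes in the type string.
CONVERTERS = {"R": float, "I": int}


def format_prompt(label, defaults):
    """Build a prompt that shows the current values in brackets."""
    shown = ",".join(str(value) for value in defaults)
    return f"{label} [{shown}]: "


def parse_reply(reply, types, defaults):
    """Apply a comma-separated reply on top of the default values.

    An empty field keeps the default for that position, so "" keeps
    everything and ",2" changes only the second value.
    """
    fields = reply.strip().split(",") if reply.strip() else []
    if len(fields) > len(types):
        raise ValueError("more values typed than variables requested")
    values = list(defaults)
    for position, text in enumerate(fields):
        text = text.strip()
        if text:
            values[position] = CONVERTERS[types[position]](text)
    return values


def ask(label, types, defaults, read=input):
    """One-call interface: prompt, read a line, return the new values."""
    return parse_reply(read(format_prompt(label, defaults)), types, defaults)
```

With this sketch, ask('Enter Alpha, Count', 'RI', [1.0, 10]) shows the
prompt from the example, and a reply of ",2" leaves the first value
alone and sets the second to 2.  The read argument exists only so the
sketch can be exercised without a terminal.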

2) Fortran pre-processor

We have been successful at separating the functions of user interface and
data I/O from the computational part of the program by using a Fortran
pre-processor.  The input to the pre-processor is a language which is a
super-set of Vax Fortran with the basic Fortran data types extended to
include data types specific to image processing.  Normally, in order to
read an image into memory to work on it, the Fortran programmer would
have to get the filename, open the file, get the filesize and read the
data into a static array in the program. Using the pre-processor, the
programmer can now get an image into memory by just declaring a variable
which is the file and a variable for the in-memory array and just assign
one to the other, as in: 

	IMEXT IMIN	!External Image IMIN, the input image
	IMWIN A		!In-memory Image A, the data array
	...
	A = IMIN	!Read file into A

The assignment statement would be expanded by the pre-processor into 
calls to the appropriate library routines, and the array for the image 
would be dynamically allocated and loaded.  Even getting the filename
from the user is part of the library.  As a result, what would have been a
lot of 'cruft' for reading an image is now a simple expression which 
much more eloquently expresses what the programmer is doing.  The 
pre-processor provides other capabilities to do simple image
manipulations easily, but since the output is Fortran code, the main
body of the algorithm can pass through unchanged. 
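
The expansion step can be sketched as a small source-to-source filter.
This is a toy illustration in Python, not the actual pre-processor: it
handles only the IMEXT/IMWIN declarations and the file-to-array
assignment shown above, and the generated routine names IMOPEN and
IMLOAD, like the INTEGER handle declarations, are hypothetical
placeholders for whatever the real library provides.

```python
import re

# "IMEXT name" or "IMWIN name", optionally followed by a comment.
DECL = re.compile(r"^\s*(IMEXT|IMWIN)\s+(\w+)", re.IGNORECASE)
# A bare "dst = src" assignment, optionally followed by a "!" comment.
ASSIGN = re.compile(r"^(\s*)(\w+)\s*=\s*(\w+)\s*(!.*)?$")


def preprocess(lines):
    """Expand image declarations and assignments; pass the rest through."""
    kinds = {}   # variable name (upper case) -> "IMEXT" or "IMWIN"
    output = []
    for line in lines:
        decl = DECL.match(line)
        if decl:
            kind, name = decl.group(1).upper(), decl.group(2)
            kinds[name.upper()] = kind
            # Hypothetical expansion: each image becomes an integer handle.
            output.append(f"\tINTEGER {name}")
            continue

        assign = ASSIGN.match(line)
        if assign:
            indent, dst, src = assign.group(1), assign.group(2), assign.group(3)
            if kinds.get(dst.upper()) == "IMWIN" and kinds.get(src.upper()) == "IMEXT":
                # In-memory image = external image: read the file.
                output.append(f"{indent}CALL IMOPEN({src})")
                output.append(f"{indent}CALL IMLOAD({src}, {dst})")
                continue

        # Ordinary Fortran is copied to the output unchanged.
        output.append(line)
    return output
```

The point of the design is the last branch: anything the filter does
not recognize is emitted verbatim, which is why the computational part
of a program needs no changes.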

There have been a number of useful side effects.  Before using this
pre-processor, the "standard" image file format was very simple and
limited.  All images contained the same data type and had a fixed
maximum size of 256 by 256 pixels because it was easy to write code that
would read them, and the files had to be read into fixed sized arrays in
the program meaning that they had to have a maximum size.  Since the
pre-processor makes file I/O transparent and makes it possible to have
arbitrarily sized arrays, we have started using a more sophisticated
image file format allowing very large arrays and arbitrary data types.
For the same reasons, users are also starting to see a uniform style of
interface. 

By the way -- I wrote the pre-processor in Fortran ;-).

3) Interactive shell

Another approach we've used is to build an interactive shell around a 
core of basic operations which does all the "dirty work" of managing 
data and much of the user I/O.  The core operations can be coded in 
Fortran or assembly or whatever you like, and special core algorithms
can be spliced in with a bit of software "glue."  Since they become just
functions called from within the shell, they don't need to be concerned
with data I/O or user interface -- they just get data from the shell,
process it, and return the results to the shell for the user to display
or continue working on.  In the shell we use (Ana, written by Dr.
Richard Shine and associates) you can do many simple operations right in
the shell.  For example, to subtract the mean from an image stored in
the variable X and display the result on a TV monitor, you could do: 

	ANA> X=X-mean(X)
	ANA> TV,X

Although the shell could be relatively simple, the one we use is a
complete programming language similar to Fortran (but without GOTO's,
thank goodness :).  One suitable shell of this kind is IDL by Research Systems
Inc., although I don't know if you can add your own code to it.  We use
a home grown design which allows us to modify it any way we wish,
although it does tend to be less stable than a commercial product. 

One real advantage is that this approach facilitates the interactive
development cycle I mentioned earlier.  Since the shell is interactive
and results are calculated and displayed immediately on pressing return,
it is much easier to "fool around," and try different approaches in
order to get a certain result.  Once you get the algorithm working as a 
shell script, it is a fairly trivial process to code it up (using the 
pre-processor) into a self-contained Fortran program, or even as a new 
built-in function in the shell language.
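
The shell-around-a-core arrangement can be sketched in a few lines.
This is a toy Python illustration, not Ana: the class and method names
are hypothetical, only "NAME=expression" and "PROC,argument" lines are
understood, expressions are handed to Python's own eval for brevity,
and the TV display is replaced by a list that records what would have
been shown.

```python
class Image:
    """Minimal stand-in for an image: a flat list of pixel values."""

    def __init__(self, pixels):
        self.pixels = list(pixels)

    def __sub__(self, other):
        if isinstance(other, Image):
            return Image(a - b for a, b in zip(self.pixels, other.pixels))
        return Image(p - other for p in self.pixels)  # image minus scalar


def mean(image):
    """A 'core' operation: it sees only data, never files or prompts."""
    return sum(image.pixels) / len(image.pixels)


class Shell:
    def __init__(self):
        self.variables = {}              # data owned by the shell
        self.functions = {"mean": mean}  # core operations spliced in
        self.procedures = {"TV": self._tv}
        self.displayed = []              # stand-in for the TV monitor

    def register(self, name, function):
        """The software 'glue': make a new core function callable."""
        self.functions[name] = function

    def _tv(self, image):
        self.displayed.append(list(image.pixels))

    def _evaluate(self, expression):
        scope = dict(self.functions)
        scope.update(self.variables)
        # eval keeps the sketch short; a real shell would have a parser.
        return eval(expression.strip(), {"__builtins__": {}}, scope)

    def execute(self, line):
        line = line.strip()
        if "=" in line:                   # assignment: NAME=expression
            name, expression = line.split("=", 1)
            self.variables[name.strip()] = self._evaluate(expression)
        else:                             # procedure call: PROC,argument
            procedure, _, argument = line.partition(",")
            self.procedures[procedure.strip()](self._evaluate(argument))
```

With an image already stored in X, executing "X=X-mean(X)" followed by
"TV,X" reproduces the two-line session shown earlier.  The core
function mean never touches a file or a prompt; the shell hands it
data and stores the result.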

>I have to admit that I haven't seen this tried on a significant
>scale.  The ideas presented above have been forming in my mind
>for a couple of years, and lately I've had an opportunity to
>watch scientists produce programs, confirming much of what I
>had thought.  Take it for what its worth, and I would appreciate
>any evidence supporting or contradicting my positions.
>-- 
>Steve Nuchia        | [...] but the machine would probably be allowed no mercy.
>uunet!nuchat!steve  | In other words then, if a machine is expected to be
>(713) 334 6720      | infallible, it cannot be intelligent.  - Alan Turing, 1947

Our little lab has done some of the best and most impressive image 
processing in the field of solar physics in recent years.  We've been
doing, in large part, just what Steve Nuchia is suggesting, and I
think that this general approach has been a strong force in our success.
I don't know if what we have done would work for everyone, but it has
certainly worked very well for us. 

	Stuart Ferguson
	Lockheed Palo Alto Research Lab
	Research and Development Division
	Solar and Optical Physics
-- 
		Stuart Ferguson		(shf@well.UUCP)
		Action by HAVOC		(shf@Solar.Stanford.EDU)