Path: utzoo!mnetor!uunet!lll-winken!lll-tis!mordor!lll-lcc!well!shf
From: shf@well.UUCP (Stuart H. Ferguson)
Newsgroups: comp.software-eng
Subject: Re: Cynic's Guide to Software Engineering, part 3
Message-ID: <5752@well.UUCP>
Date: 20 Apr 88 22:41:06 GMT
References: <1950@rtech.UUCP> <945@nuchat.UUCP>
Reply-To: shf@well.UUCP (Stuart H. Ferguson)
Distribution: na
Organization: The Blue Planet
Lines: 316
Keywords: FORTRAN, Languages, Scientific programming

Article 474 of comp.software-eng:
>From: steve@nuchat.UUCP (Steve Nuchia)

(Sorry about the length. I guess I had a lot to say ...)

In a previous article, Steve Nuchia talks about Fortran:

>From article <1950@rtech.UUCP>, by daveb@llama.rtech.UUCP ...:
>> ... It is hard for me to believe that, for sake of
>> argument, FORTRAN is incapable or even very seriously flawed in its
>> ability to model physics problems compared to any of the likely
>> alternatives.

>FORTRAN is a very nice language for arranging to have arithmetic
>FORmulas evaluated, and it is even ok (in its present form) at
>organizing a moderately complex sequence of calculations.

First of all, I like Fortran. No really! I've been working for about four years as a programmer with a group of solar physicists writing image processing and data analysis applications. I have a B.A. in computer science from UC Berkeley, and am *fluent* in C and Pascal and a couple of varieties of assembly language, as well as being fairly well versed in Lisp and other more abstruse languages.

I wrote a couple of image processing programs in Pascal once. I was using a recursive algorithm, so I thought it would make sense to use a language that supported recursion (everybody knows you can't do recursion in Fortran, right?). It didn't work out. Pascal turns out just not to be a good language for this kind of thing -- I couldn't say exactly why.
I found it easier to implement the algorithm in Fortran using an array as an explicit stack so that the recursive algorithm became iterative. I know Pascal very well too, having written quite a few things in Pascal including a recursive descent parser and other goodies. The Fortran version of the image processing algorithm has gone through several phases of refinement while the Pascal version has sat around not doing much of anything.

I have, over the course of the past two years, written a tremendous volume of Fortran code, including a 'lexx'-like, a 'yacc'-like and a 'make'-like program (VMS didn't have these utilities, so I wrote my own in Fortran). Besides just writing image processing algorithms, I have also been concerned with the types of user interface issues that Nuchia addresses.

>... In my experience
>the text of most "scientific" programs begin life as some small
>amount of FORTRAN embodying some algorithm - maybe as much as a
>few thousand lines. ...

An image processing algorithm can often be expressed as ONE line ...

>Now, this little program starts to be _used_ for something. It becomes
>"supported", and starts to grow. The algorithm doesn't grow. What
>grows is the data management and user interface cruft around the
>central algorithm. Since the whole wad is still in FORTRAN the
>maintenance programmers have littered the algorithmic code with
>flags and such to support all the new features, but by this time
>the data management and user interface code outweighs the "scientific"
>code by ten to one or better.

This is a very good observation. User interface isn't much of an issue for the scientists I work with, however, so that is not where the code is. Most scientists are happy with something as simple as:

        ENTER FILENAME:

A scientist's idea of a fancy user interface is being able to press return at the prompt to exit the program. (This is a stereotype, to be sure.)
Actually, I have a lot of respect for the scientists here, and many of them are very good programmers as well, but they'll usually only work on user interface as a last resort. The place I find the most 'cruft' is in what I would call "I/O."

I think Nuchia's observation may be a bit of a simplification, and I have some further refinement to suggest. What happens to me is this: I write a little program to try some new data analysis technique. When I show the results to the scientists they get all excited and tell me to try it on other data that we have. So, now I need to go back and modify the program to handle different types of data than I originally designed it for, but this is no big deal, because I'm just making the program better and more general so I go and do it gladly.

Here's where the trouble starts. Let's say I run the program on the other data and the results are tantalizing but not as good as the scientists had hoped. We then enter the "What If" Loop, where the scientists think up all sorts of various complicated and often contrived ways to approach the problem to try and make the results better. Someone will come up with a good "what if we try it this way ..." type suggestion and I'll go off and muck with my code, try it, and show them the results. Then they say, "Hmmm, that's not quite it. What if instead of _this_ we do _that_?" and off I go again into the code to try that idea.

The result is a real nightmare. What I end up with is an algorithm that is the result of many incremental and often temporary (or kludgy) changes. Ideally the program would eventually work right and then I could go back and recode it correctly, but it rarely seems to happen that way. Usually what happens is the idea gets abandoned until six months later when we get a new data set and someone suggests that the idea we were working on six months ago might work well on this data, and I dig the code up and bang on it again.
>Sure, the program spends most of _its_ time in the scientific code,
>but where are your programmers spending _their_ time? ...

In a type of *interactive* software development cycle for which few languages are well adapted.

>Nuchia's law: 90% of any production program is doing data
>              management and user interface for the 10% that is
>              doing the real work.

This is a good start, but what exactly is meant by a "production" program? I also don't like the implication that user interface cannot be considered "real" work. What about something like a CAD program which is nothing BUT user interface?

>... how good of a language is FORTRAN for
>data management? Pretty terrible. It just doesn't have any
>of the semantics you need. Sure, you can kluge around it,
>but why? ...

and in the same vein ...

>You wouldn't want to write the UI library in FORTRAN, but
>thankfully I don't think there are very many people left
>who would insist that you do so.

It's easy for us software types to blame the programming language; I mean, what could be worse than Fortran, right? But I remind you that I wrote 'lexx,' 'yacc' and 'make' in Fortran with very little trouble. This is Vax Fortran, an extension of FORTRAN-77 with 31 character identifiers, structured control flow and structured data types, among other things. I've done recursion, linked lists, string manipulation and even dynamic memory allocation in Fortran.

I'm not pushing Vax Fortran as some kind of True and Great language, but I AM suggesting that some of the problems Steve addresses may result from more than JUST the choice of programming language.

The "scientific" programs which start small and grow, such as the ones that Steve refers to, are rarely planned. Rather they evolve out of the kind of interactive development (i.e. hacking) that I described. For cases where the program was actually designed in advance, such as my 'lexx,' 'yacc,' and 'make' utilities, Fortran was a workable choice.
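
As an illustration of the recursion-by-hand technique mentioned in this post (an array used as an explicit stack so that a recursive algorithm becomes iterative), here is a minimal sketch, written in Python rather than Fortran. The flood-fill task, the name flood_fill and its parameters are assumptions chosen for the example; the post does not say which recursive image processing algorithm was actually converted.

```python
def flood_fill(image, start_row, start_col, new_value):
    """Fill the 4-connected region containing the start pixel.

    Iterative version of the usual recursive flood fill: pending pixels
    are kept on an explicit stack instead of the call stack.
    Returns the number of pixels changed. (Illustrative example only.)
    """
    rows, cols = len(image), len(image[0])
    old_value = image[start_row][start_col]
    if old_value == new_value:
        return 0  # nothing to do, and avoids looping forever

    # The list plays the role of a Fortran array plus a stack pointer.
    stack = [(start_row, start_col)]
    filled = 0
    while stack:
        r, c = stack.pop()
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # off the edge of the image
        if image[r][c] != old_value:
            continue  # already filled, or not part of the region
        image[r][c] = new_value
        filled += 1
        # "Recursive calls" become pushes of the four neighbours.
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return filled


if __name__ == "__main__":
    img = [[1, 1, 0],
           [0, 1, 0],
           [1, 0, 0]]
    print(flood_fill(img, 0, 0, 7), img)
```

In FORTRAN-77 the Python list would presumably be a fixed-size integer array with a stack-pointer variable, plus an overflow check that Python's growable list does not need.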
Some might argue that bad C code is better than good Fortran code, but I don't find this to be the case, and I strongly suspect that those who say this don't really know how to recognize a good Fortran program.

Steve Nuchia also talks about user interfaces:

>User interfaces, if implemented properly, just consist of
>a large state machine which calls a giant UI library. The
                                     -----
>state machine can be written in FORTRAN or BASIC or SWAHILI
>or whatever you want, as long as some kind of conventional
>structure is used - the nodes of the state graph should
>all follow a stereotyped format to a greater or lesser
>degree. ...

>What's the solution? Let your scientific programmers code in
>FORTRAN. They're getting their work done, and you've got enough
>work to do without going on a crusade to "save" them. And of
>course you get to take advantage of the available optimization
>technology for the long-running core of the programs. But
>make your maintenance programmers learn and use a modern
>language; pick one with a good interface to your FORTRAN
>environment.

This has proven to be a good general model to work from. We have done this in our lab to a certain extent at various levels with varying degrees of success. Three of our more successful approaches validate Nuchia's model:

1) User interface libraries

In designing libraries, there's a parallel issue to user interface -- that of programmer interface. There are really a lot of wonderful libraries out there which do a lot of wonderful things, but many have no concern for what the programmer sees. Good examples are some of the really big graphics libraries. They are really powerful and really general and really can do lots of terrific things, but it takes really a LOT of calls to do even the simplest thing. You sometimes have to learn every weird formalization that the program uses internally in order to do something as simple as draw a line.
Experience with these tends to make me shudder when I see someone refer to building a GIANT user interface library.

The best library interface, as well as the most useful user interface library for Fortran programs I've ever seen, happens to be the same library. It is called GIRL (Generalized Input Routine Library) and the basic programmer interface consists of ONE subroutine call. The programmer can do something like:

        ALPHA = 1.0
        COUNT = 10
        CALL GIRL('Enter Alpha, Count',,,'RI',ALPHA,COUNT)

and the result would be a friendly prompt:

        Enter Alpha, Count [1.0,10]:

The preset values for the variables get displayed in []'s, and the user can press return to get those as defaults or can type ",2" to leave ALPHA alone and set COUNT to 2. The GIRL subroutine can take any number of arguments and determines their type from a string (the 'RI' argument in the example above).

This very simple interface makes it *easier* to use GIRL than the equivalent 2-4 lines of Fortran, and you get defaults and a uniform input method throughout your program. GIRL also has a whole lot of other features, such as on-line help, super-defaults and backtracking, but all of that is hidden behind a deceptively simple programmer interface. The other features get accessed by adding more parameters to the GIRL call, or by calling other routines. It's organized, however, so that the programmer only needs to learn about those features he's interested in when he's interested in them.

2) Fortran pre-processor

We have been successful at separating the functions of user interface and data I/O from the computational part of the program by using a Fortran pre-processor. The input to the pre-processor is a language which is a super-set of Vax Fortran with the basic Fortran data types extended to include data types specific to image processing.
Normally, in order to read an image into memory to work on it, the Fortran programmer would have to get the filename, open the file, get the filesize and read the data into a static array in the program. Using the pre-processor, the programmer can now get an image into memory by just declaring a variable which is the file and a variable for the in-memory array and just assign one to the other, as in:

        IMEXT IMIN      !External Image IMIN, the input image
        IMWIN A         !In-memory Image A, the data array
        ...
        A = IMIN        !Read file into A

The assignment statement would be expanded by the pre-processor into calls to the appropriate library routines, and the array for the image would be dynamically allocated and loaded. Even getting the filename from the user is part of the library. As a result, what would have been a lot of 'cruft' for reading an image is now a simple expression which much more eloquently expresses what the programmer is doing. The pre-processor provides other capabilities to do simple image manipulations easily, but since the output is Fortran code, the main body of the algorithm can pass through unchanged.

There have been a number of useful side effects. Before using this pre-processor, the "standard" image file format was very simple and limited. All images contained the same data type and had a fixed maximum size of 256 by 256 pixels, because it was easy to write code that would read them, and because the files had to be read into fixed-size arrays in the program, which meant that they had to have a maximum size. Since the pre-processor makes file I/O transparent and makes it possible to have arbitrarily sized arrays, we have started using a more sophisticated image file format allowing very large arrays and arbitrary data types. For the same reasons, users are also starting to see a uniform style of interface.

By the way -- I wrote the pre-processor in Fortran ;-).
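
To make the expansion step concrete, here is a toy sketch of what one pass of such a pre-processor might look like, written in Python for brevity. It is not the lab's tool: the emitted routine names IMOPEN and IMLOAD, the INTEGER handle declarations and the function name preprocess are all hypothetical, since the post does not name the library routines or describe the generated code.

```python
import re

# Declarations in the extended language: "IMEXT name" or "IMWIN name",
# optionally followed by a "!" comment.
DECL = re.compile(r"^\s*(IMEXT|IMWIN)\s+([A-Z][A-Z0-9_]*)\s*(!.*)?$", re.I)
# Simple "name = name" assignments, optionally followed by a "!" comment.
ASSIGN = re.compile(
    r"^\s*([A-Z][A-Z0-9_]*)\s*=\s*([A-Z][A-Z0-9_]*)\s*(!.*)?$", re.I
)


def preprocess(lines):
    """Translate image declarations and assignments into plain Fortran.

    Every line that is not recognised is passed through unchanged, which
    is why the body of the algorithm can stay ordinary Fortran.
    """
    kinds = {}  # variable name -> "IMEXT" (file) or "IMWIN" (in memory)
    out = []
    for line in lines:
        decl = DECL.match(line)
        if decl:
            kind, name = decl.group(1).upper(), decl.group(2).upper()
            kinds[name] = kind
            # Assumption: each image variable becomes an integer handle.
            out.append(f"      INTEGER {name}")
            continue
        assign = ASSIGN.match(line)
        if assign:
            dst, src = assign.group(1).upper(), assign.group(2).upper()
            if kinds.get(dst) == "IMWIN" and kinds.get(src) == "IMEXT":
                # "A = IMIN" means: open the file, then allocate and load.
                # IMOPEN and IMLOAD are hypothetical library routines.
                out.append(f"      CALL IMOPEN({src})")
                out.append(f"      CALL IMLOAD({src}, {dst})")
                continue
        out.append(line)  # ordinary Fortran: leave it alone
    return out


if __name__ == "__main__":
    source = [
        "      IMEXT IMIN      !External image",
        "      IMWIN A         !In-memory image",
        "      X = Y",
        "      A = IMIN        !Read file into A",
    ]
    print("\n".join(preprocess(source)))
```

The point of the sketch is the division of labour: the pre-processor only needs to remember which names are image variables, and everything else is an ordinary library call in the generated Fortran.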
3) Interactive shell

Another approach we've used is to build an interactive shell around a core of basic operations which does all the "dirty work" of managing data and much of the user I/O. The core operations can be coded in Fortran or assembly or whatever you like, and specialized core algorithms can be spliced in with a bit of software "glue." Since they become just functions called from within the shell, they don't need to be concerned with data I/O or user interface -- they just get data from the shell, process it, and return the results to the shell for the user to display or continue working on.

In the shell we use (Ana, written by Dr. Richard Shine and associates) you can do many simple operations right in the shell. For example, to subtract the mean from an image stored in the variable X and display the result on a TV monitor, you could do:

        ANA> X=X-mean(X)
        ANA> TV,X

Although the shell could be relatively simple, the one we use is a complete programming language similar to Fortran (but without GOTO's, thank goodness :). One suitable shell of this kind is IDL by Research Systems Inc., although I don't know if you can add your own code to it. We use a home grown design which allows us to modify it any way we wish, although it does tend to be less stable than a commercial product.

One real advantage is that this approach facilitates the interactive development cycle I mentioned earlier. Since the shell is interactive and results are calculated and displayed immediately on pressing return, it is much easier to "fool around," and try different approaches in order to get a certain result. Once you get the algorithm working as a shell script, it is a fairly trivial process to code it up (using the pre-processor) into a self-contained Fortran program, or even as a new built-in function in the shell language.

>I have to admit that I haven't seen this tried on a significant
>scale.
>The ideas presented above have been forming in my mind
>for a couple of years, and lately I've had an opportunity to
>watch scientists produce programs, confirming much of what I
>had thought. Take it for what it's worth, and I would appreciate
>any evidence supporting or contradicting my positions.
>--
>Steve Nuchia       | [...] but the machine would probably be allowed no mercy.
>uunet!nuchat!steve | In other words then, if a machine is expected to be
>(713) 334 6720     | infallible, it cannot be intelligent. - Alan Turing, 1947

Our little lab has done some of the best and most impressive image processing in the field of solar physics in recent years. We've been doing, in large part, just what Steve Nuchia is suggesting, and I think that this general approach has been a strong force in our success. I don't know if what we have done would work for everyone, but it has certainly worked very well for us.

        Stuart Ferguson
        Lockheed Palo Alto Research Lab
        Research and Development Division
        Solar and Optical Physics
--
Stuart Ferguson (shf@well.UUCP)
Action by HAVOC (shf@Solar.Stanford.EDU)