Xref: utzoo alt.folklore.computers:7689 comp.unix.internals:1228 comp.misc:10737
Path: utzoo!utgpu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!julius.cs.uiuc.edu!apple!rutgers!rochester!kodak!uupsi!sunic!news.funet.fi!funic!fuug!demos!avg
From: avg@hq.demos.su (Vadim G. Antonov)
Newsgroups: alt.folklore.computers,comp.unix.internals,comp.misc
Subject: Re: Software Obesity (was Re: Jargon file v2.1.5)
Message-ID: <1990Dec3.173607.9022@hq.demos.su>
Date: 3 Dec 90 17:36:07 GMT
References: <1990Nov30.172512.5282@sctc.com> <O7Y77DB@xds13.ferranti.com> <PST.90Dec1131440@ack.Stanford.EDU> <1990Dec2.202402.21977@decuac.dec.com>
Organization: DEMOS, Moscow, USSR
Lines: 174


I couldn't agree more with the expressive article by Marcus J. Ranum
<mjr@hussar.dco.dec.com>. I'd like to state my own attitude to software
dumps: in "modern" commercial systems the cost of finding and learning
the appropriate "feature" for a particular problem is much higher than
the cost of writing the program from scratch. Once a system reaches this
point in its complexity growth, it will collapse under the load of
zillions of new misfeatures.

The theory I used to discuss in my lectures is:


HOW TO CREATE A DEAD SYSTEM


There are *three* ways in which "traditional" systems have grown:

1) Packages

The old, dusty approach - if you have a problem you write a program
to solve *this* problem. If the problem is a bit more complex than
multiplying two 10x10 matrices you'd probably write a "package" equipped
with screen-oriented input, a form generator, some bells and some whistles.
OK, you've made a cuspy package, let's call it "A". Some other guys have
done another package, "B", for a different task. OK, some time passes and
now you have to pass data from A to B in order to solve a "joint" problem.

You can write a converter from data in format A into data in format B:

	A -> Conv -> B

In this case Conv tends to be a separate package, and writing it is often
a harder task than solving the whole problem anew. If so, you have to
design a completely new package "A+B". Either way you get *a new* package
for a minor increase in functionality. As you can see, the curve


complexity     *
	|      *
	|     *
	|    *
	|  **
	***
	+------------- functionality

is exponential - and the lifetime of a usable system is really short.
Examples of package-oriented systems are: IBM OS/370, Microsoft MS-DOS, etc.
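
A rough way to put numbers on this curve, as a minimal Python sketch. It
rests on two simplifying assumptions made only for illustration: every
ordered pair of packages needs its own converter, and every group of two
or more packages eventually gets its own joint package ("A+B", "A+B+C",
...). The function names are hypothetical.

```python
from math import comb


def pairwise_converters(n):
    """Converters A -> B needed if each of n packages feeds every other one."""
    return n * (n - 1)


def joint_packages(n):
    """Joint packages needed if every group of 2..n packages gets its own."""
    return sum(comb(n, k) for k in range(2, n + 1))


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, pairwise_converters(n), joint_packages(n))
```

Under these assumptions the converter count grows quadratically, while
the count of joint packages roughly doubles with each package added.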

2) Integrated Systems

The second way is to incorporate various kinds of functionality into
a single super-package (a so-called "integrated system"). This method
lets a designer avoid duplicating functions but tends to produce
huge, unmanageable (and undebuggable) programs. Moreover, such systems
practically do not allow users to upgrade and adapt their environment
to their needs. As a result, designers of such systems make users
follow pre-defined paths, which makes such systems *useless*
for solving *new* problems. Needless to say, such systems can satisfy
only suits. The other source of limitations is the physical resources
of computers - try to imagine one which could keep the whole of Unix,
including all utilities, in RAM :-) Unfortunately the
I-don't-want-to-think-but-I-want-to-use-a-computer user population is a
very attractive target for integrated systems, and I think we'll never
see the death of this approach. The complexity of a single integrated
system is limited by human stupidity.
Examples: Lotus 1-2-3, EMACS, Framework, dBASE.

3) Nested Environments

The way of Unix. The idea is to split functionality into small, complete
units and to use these units to build the next layer of the system
environment:

     /-----------------------------\     The advantage of this approach
    /         user utilities        \    is obvious - you need not reimplement
   /   /-------------------------\   \   functions already existing in lower
      / extended set of utilities \      layers. So the complexity grows at
	    /----------------\           (approx.) the same rate as
	   /    basic system  \          functionality, at least in theory.
	  /   /------------\   \
	 |   /  C-compiler  \   |        complexity
	 |  |    /------\    |  |           |        **
	 |  |   | kernel |   |  |           |      **
	 |  |    \------/    |  |           |    **
	 |   \   C library  /   |           |  **
	  \   \------------/   /            |**
	   \     utilities    /             +-------------- functionality
	    \----------------/

In practice, nobody can design "perfect" internal layers, so the
development of Unix involves constant changes to the inner layers, such
as changes to the C library, new system calls, etc. The catch here
is that once something has been injected into an inner circle you'll
never get rid of it - or you'll risk losing compatibility. Moreover,
if the system grows in several different places you'll get several
different configurations of layers. Joining environments is a more or
less mechanical procedure, and it's quite tempting to fall into it under
the pressure of suits. The result of constant mindless joining efforts is
the huge dinosaurs that still carry the old name Unix. If you want to
write a reasonably portable program you should keep as far away as
possible from all innovations; unfortunately the creators of System V
made it impossible to use Unix v7 as a common denominator.


WHAT SHALL WE DO?

Unix is surely dead; it collapsed under the load of endless improvements.
It's just another victim of feature-making, and we have to begin from
scratch. I'll try to survey some possible ways to avoid the complexity
traps. I don't think any of the system-rejuvenation :-) remedies listed
here is a panacea for all the troubles of life, but still:

1) Objects & Reusability

I don't want to discuss the advantages of object-oriented programming here;
they're well known and have been discussed beyond all recognition.

2) Environment Control

We should have a means to track *all* consequences of introducing,
changing or removing things in an environment. Do you know, for example,
what will happen if you remove awk from /bin in Unix v7? I know only a few
people who know that tar will fail on option "u"! Such means of
environment control should be transparent and uniform for all parts of
the system, including libraries, directories, include files, system calls,
etc. Merging and synchronizing changing environments, keeping track of
changes, etc. are also matters of environment control.
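
A minimal Python sketch of the kind of question such a tool would answer:
"what else stops working if I remove this unit?" It assumes the
dependencies are already recorded in a table. Only the tar -> awk entry
comes from the example above; every other entry, and the function name
itself, is hypothetical.

```python
def dependents(uses, removed):
    """Return every unit that directly or indirectly uses `removed`."""
    affected = set()
    frontier = [removed]
    while frontier:
        current = frontier.pop()
        for unit, needs in uses.items():
            if current in needs and unit not in affected:
                affected.add(unit)
                frontier.append(unit)  # its own users are affected too
    return affected


# unit -> set of units it relies on
uses = {
    "tar": {"awk", "libc"},   # tar relying on awk is the example above
    "backup.sh": {"tar"},     # hypothetical script built on tar
    "awk": {"libc"},          # hypothetical
    "ls": {"libc"},           # hypothetical
}

if __name__ == "__main__":
    print(sorted(dependents(uses, "awk")))
```

The hard part in a real system is filling in the table uniformly for
libraries, include files and system calls; the walk itself is trivial.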

3) Parallel Environments

A new-generation system should support the co-existence of several
different environments (or different versions of an environment) - it
would make introducing changes significantly simpler, because old programs
could run in the *old* environment. Of course, a virtual machine is not an
appropriate way to do this; there should be a regular and simple
way to communicate between different environments. The object approach
could provide such a feature, and environment control could help to
choose the minimal environment required to run an old program.
It is now quite obvious that a really complicated mindset cannot be
limited to one language, so I think multiple environments could give much
more flexibility to creative programming.
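
Choosing the minimal environment for an old program could look roughly
like the Python sketch below. It assumes each environment is described
simply as the set of units it provides; the environment names, unit names
and the function are all hypothetical.

```python
def minimal_environment(environments, required):
    """Pick the smallest environment that provides every required unit."""
    fitting = [(len(units), name)
               for name, units in environments.items()
               if required <= units]          # subset test
    return min(fitting)[1] if fitting else None


# environment name -> units it provides (all names are made up)
environments = {
    "v7-base": {"libc-v7", "sh", "awk"},
    "v7-full": {"libc-v7", "sh", "awk", "tar", "troff"},
    "new": {"libc-new", "sh", "awk", "tar"},
}

if __name__ == "__main__":
    old_program_needs = {"libc-v7", "awk", "tar"}
    print(minimal_environment(environments, old_program_needs))
```

Here "smallest" just means fewest units; a real system would need a
better measure, and the set of required units would have to come from
environment control.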

4) Functional Completeness

IMHO the ultimate weapon against complexity is the design of functionally
complete interfaces to the units of a system. If you have one, you can be
sure nobody will ever change it - simply because there is no need to.
If such a unit is *really* functionally complete, you simply cannot
think of a way to improve it (aside from subjective preferences or
efficiency reasons). Of course, designing complete systems is a much more
complicated task than conventional programming and requires a significant
increase in the thinking/coding ratio; hm, it's not evident to suits that
a better programmer writes *fewer* (and smaller) programs than a bad one
(you should compare results, not source code lines, megabytes, features,
hours of work, or whatever else). Why do we say "he's a hero, he wrote a
100000-line program!" instead of "100000 lines? It could be done in a
dozen thousand!"?

Several years ago I decided I ought to put my theory into practice
and tried to design a functionally complete set of system calls. The
result was *really* amazing: the full-scale prototype of the kernel
is about 4000 lines of C source code (including comments, huh). As a side
result I found a way to control non-digital machines in an
object-oriented style (I really dunno how it can be used ;-). A description
of this system was published in ACM SIGOPS Op. Sys. Review, vol 24, N 3
(July 1990), pp 22-39. Unfortunately I have to provide for my family by
doing many boring things, and I have had no chance to touch this project
for a year. I'd be very glad if someone would use it. (Don't ask me for
the sources: they're a pretty raw mess and I don't like the idea of
showing them to the public; my goal was not to write a program but to
check my idea, and it surely has to be redesigned.)

Vadim Antonov
DEMOS, Moscow, USSR


PS. Excuse me for my poor English, huh, I'm a Russian after all.

