Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uwm.edu!cs.utexas.edu!sdd.hp.com!decwrl!nsc!pyramid!athertn!paul
From: paul@athertn.Atherton.COM (Paul Sander)
Newsgroups: comp.sw.components
Subject: Re: What is a reusable software component?
Keywords: software component definition
Message-ID: <27908@athertn.Atherton.COM>
Date: 27 Jul 90 20:39:26 GMT
References: <27705@athertn.Atherton.COM> <82454@tut.cis.ohio-state.edu>
Reply-To: paul@Atherton.COM (Paul Sander)
Organization: Atherton Technology, Sunnyvale, CA
Lines: 185

In article <82454@tut.cis.ohio-state.edu> William F Ogden <ogden@cis.ohio-state.edu> writes:
>Paul Sander [that's me] writes:
[Introductory comments and questions omitted]
>At the risk of getting back into the discussion of why software development
>isn't like the construction of other engineering artifacts, let me suggest
>that, from a reusability perspective, a B-tree facility is rather like
>a circuit board in a computer. You certainly can't argue that it's not an
>electronic component, but it's not a very likely candidate for reuse in
>the next computer you design. 

I prefer to think of code of this sort as being something like integrated
circuits, rather than circuit boards.  They are more abstract in function
than the individual lines of code (as a gate or VLSI circuit is more abstract
in function than the active and passive components that are etched into it),
but less abstract in function than a subsystem or product (as an IC is less
abstract in function than a circuit board or product).  Simple data structures
can be built into more complicated ones in the same way that standard cells
are built into ASICs (Application Specific ICs) with the addition of a certain
amount of "glue" logic.  [Just one man's opinion here]

In any case, when designed well, chips, boards, and (hopefully) software
can be reused in new designs with little or no modification.  The flexibility
of the design and quality of implementation will govern its suitability for
new applications.  It is very common for ICs to be reused in new designs;
it is less common, but certainly not unusual to find circuit boards used in
new product designs.  A data structure that sorts things will likely be used
the next time someone needs to sort things, if they don't have make
substantial changes.

[Regarding my B-tree implementation:]
>>- What are its characteristics that ...               don't make it one?
>
>First, the conceptual interface that it presents to its clients is not
>simple. Clients don't really want to fool around restructuring trees or
>reclaiming storage; they just want a fast storage and retrieval facility.
>Here they are forced to deal with details that shouldn't concern them
>[poor information hiding].

But the client is spared the details of restructuring trees; the tree
collects its own garbage.  The client is forced to manage its own storage,
because the tree can't do an effective job of it anyway; the tree has no
clue as to what kind of stuff the client is storing there.  If it's a bunch
of linked structures, the only way the tree can manage it is to include some
sort of compiler with which a client describes its data, and how to manage
it.  This seems too complicated and error prone.  If the client needs to
store such complex data structures in a tree, then another software component
that maintains those data structures can be used, and objects it creates can
then be stored in the tree.  If both components do a good job managing their
storage, and the client is good about avoiding aliasing problems, then
the storage manages itself in the eyes of the client.

As for the information hiding issue, this was the best I could do with C,
since C doesn't have the data abstraction tools that, say, Modula-2 and Ada
have.  How can it be improved within C's constraints?

>Second, the client interface is inadequately specified. 
>Descriptions of the various B-tree operations are supplied only in English, 
>which is inherently either verbose or imprecise. 

Okay, there is a documentation problem.  How can it be improved in a way
that a large number of potential clients can understand?  I know of no good
way to describe a data structure mathematically, and even if I could do it,
few would understand it.

>Global constraints about keeping the tree height uniform, the branchyness from
>dropping too low, the indices in left-to-right order, etc. are not spelled out.

Why does the client need to know these?  All it sees are the interfaces
documented in the man page.  It doesn't even need to know that this data
structure that provides an indexed retrieval system that sorts data is
really a B-tree.  That information is provided to the client as a motivation
for providing a particular parameter when the data structure is created,
and provides a hook where the client might make performance trade-offs.

>Performance characteristics are not indicated. 

The performance characteristics are listed below, and have been added to
the man page.  Here, "m" is the order of the tree, "n" is the number of
items stored in it, and "M" is either O(log m) if n > 7 or O(n) if n <= 7
(this is the point where a binary search begins to outperform a linear
search).

Retrieval:    O(M * log n)
Creation:     O(1)
Destruction:  O(log m) + O(n)
Insertion:    O(m * log n)
Deletion:     O(m * log n)
Traversal:    O(n)

I couldn't measure the performance in any meaningful way.  Performing some
sort of benchmark on this implementation is of only limited value, because
it specifies only how the tree performed on a particular type of machine
with a particular load with a particular set of data having particular
characteristics.

>The type of the items stored in the B-tree is kept commendably general,
>but no mention is made of the fact that correct operation depends on the
>comparison procedure testing a <= relation which must be a TOTAL ordering
>of the items. 

All that the comparison procedure need guarantee is that, given a domain
of input, the output be the same every time the same two items are
compared.  I had thought that the semantics of "strcmp" were defined that
way when I made reference to it in the B-tree man page.  In any case, this
is another simple documentation problem.

>Etc.

What else?

>>- How can it be made better?
>>- Anything else relevant to producing a good reusable component that is reused
>
>Just as with a circuit board, you may try to improve it, but it shouldn't
>be done with an eye to creating a reusable component per se. The circuit
>board contains reusable components, and it is part of a larger reusable
>component. That is also the situation with the B-tree facility.

I don't understand this.  The intent was to build a piece of software that
could be used in a variety of ways to build bigger pieces of software.  It
was in fact created with an eye to creating a reusable component.  Is there
anything to prevent one from building a nested scope symbol table, using
this as the index?  Is there anything to prevent one from using this to
build a temporary index for an external file?

I guess there are several interpretations of what it means to say "improve
it."  It could mean "how can it be made to have more of the qualities that
make it a reusable software component?" or "how can it be made more bug
free?" or "how can it be made faster, without introducing more bugs?" or
"how can its restrictions be reduced?".  My question about "how can it
be improved" in my original posting was really intended to be the first
of these interpretations.  I expect the second and third are ignored in
practice because "it works, why fix it?".  The fourth is usually addressed
in a rewrite that is in turn reusable and may have a different set of
applications that it addresses better than the first.  That doesn't
mean that the earlier implementation is more or less useful (or reusable)
than the second, just that the second is more appropriate for some things.

>The reusable component upon which the B-tree facility should have been based
>is the exploration-tree facility. It is a general purpose facility which
>provides clients with trees and operations for navigating around in them.
>In contrast to the B-tree, it can be used as a component in a wide variety
>of situations.

This is the first I've heard about exploration-tree facilities.  Without
knowing more about it, I can't ask specific questions.  Could you email to
me some additional information and references, please?

>The reusable component which the B-tree facility supports is the
>almost-constant-function facility. Conceptually, it provides a
>function F from an arbitrary totally ordered domain set to an arbitrary
>range set. For all but a few values, F(x) = C, hence the name. A typical
>application would be sparse matrices, where the domain would be pairs
>of integers (lexicogphically ordered), the range would be floating point
>numbers, and C = 0.0. Another might be symbol tables where the domain
>would be character strings, the range would be say type names, and
>C = Undefined_Type. Data bases, polynomials, etc. provide additional
>applications. The <x,y> pairs where F(x) = y # C can be stored in a
>B-tree. Note, however, that a wide variety of other implementations
>are posible (e.g. AVL tree, linked list, hashing, etc.). Analogously,
>a computer architecture can be realized using a variety of chips and
>board layouts.
>
>The exploration-tree and the almost-constant-function present clients
>with simple interfaces which have clean specifications. That's
>surely a necessary feature of a reusable component. 
>Another important property is that, once you discover them, you find
>that they could be used in a wide variety of current applications.
>It's not so clear that you can design reusable components which are
>only useful in future software.

This coincides with my vision of what a reusable component is.  The
B-tree implementation presented is an attempt to provide a data structure
with a simple interface with clean specifications.  Granted, the specification
may not be as precise as some would like, it does present enough information
to make a decision as to whether or not it is adequate for a specific
task, and how to use the interfaces.

Besides the documentation issues mentioned above, how can this be made to
exhibit more of the qualities one associates with reusable software
components?
-- 
Paul Sander        (408) 734-9822  | "Passwords are like underwear," she said,
paul@Atherton.COM                  | "Both should be changed often."
{decwrl,pyramid,sun}!athertn!paul  | -- Bennett Falk in "Mom Meets Unix"