Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uwm.edu!cs.utexas.edu!sdd.hp.com!decwrl!nsc!pyramid!athertn!paul From: paul@athertn.Atherton.COM (Paul Sander) Newsgroups: comp.sw.components Subject: Re: What is a reusable software component? Keywords: software component definition Message-ID: <27908@athertn.Atherton.COM> Date: 27 Jul 90 20:39:26 GMT References: <27705@athertn.Atherton.COM> <82454@tut.cis.ohio-state.edu> Reply-To: paul@Atherton.COM (Paul Sander) Organization: Atherton Technology, Sunnyvale, CA Lines: 185 In article <82454@tut.cis.ohio-state.edu> William F Ogden writes: >Paul Sander [that's me] writes: [Introductory comments and questions omitted] >At the risk of getting back into the discussion of why software development >isn't like the construction of other engineering artifacts, let me suggest >that, from a reusability perspective, a B-tree facility is rather like >a circuit board in a computer. You certainly can't argue that it's not an >electronic component, but it's not a very likely candidate for reuse in >the next computer you design. I prefer to think of code of this sort as being something like integrated circuits, rather than circuit boards. They are more abstract in function than the individual lines of code (as a gate or VLSI circuit is more abstract in function than the active and passive components that are etched into it), but less abstract in function than a subsystem or product (as an IC is less abstract in function than a circuit board or product). Simple data structures can be built into more complicated ones in the same way that standard cells are built into ASICs (Application Specific ICs) with the addition of a certain amount of "glue" logic. [Just one man's opinion here] In any case, when designed well, chips, boards, and (hopefully) software can be reused in new designs with little or no modification. The flexibility of the design and quality of implementation will govern its suitability for new applications. It is very common for ICs to be reused in new designs; it is less common, but certainly not unusual to find circuit boards used in new product designs. A data structure that sorts things will likely be used the next time someone needs to sort things, if they don't have make substantial changes. [Regarding my B-tree implementation:] >>- What are its characteristics that ... don't make it one? > >First, the conceptual interface that it presents to its clients is not >simple. Clients don't really want to fool around restructuring trees or >reclaiming storage; they just want a fast storage and retrieval facility. >Here they are forced to deal with details that shouldn't concern them >[poor information hiding]. But the client is spared the details of restructuring trees; the tree collects its own garbage. The client is forced to manage its own storage, because the tree can't do an effective job of it anyway; the tree has no clue as to what kind of stuff the client is storing there. If it's a bunch of linked structures, the only way the tree can manage it is to include some sort of compiler with which a client describes its data, and how to manage it. This seems too complicated and error prone. If the client needs to store such complex data structures in a tree, then another software component that maintains those data structures can be used, and objects it creates can then be stored in the tree. If both components do a good job managing their storage, and the client is good about avoiding aliasing problems, then the storage manages itself in the eyes of the client. As for the information hiding issue, this was the best I could do with C, since C doesn't have the data abstraction tools that, say, Modula-2 and Ada have. How can it be improved within C's constraints? >Second, the client interface is inadequately specified. >Descriptions of the various B-tree operations are supplied only in English, >which is inherently either verbose or imprecise. Okay, there is a documentation problem. How can it be improved in a way that a large number of potential clients can understand? I know of no good way to describe a data structure mathematically, and even if I could do it, few would understand it. >Global constraints about keeping the tree height uniform, the branchyness from >dropping too low, the indices in left-to-right order, etc. are not spelled out. Why does the client need to know these? All it sees are the interfaces documented in the man page. It doesn't even need to know that this data structure that provides an indexed retrieval system that sorts data is really a B-tree. That information is provided to the client as a motivation for providing a particular parameter when the data structure is created, and provides a hook where the client might make performance trade-offs. >Performance characteristics are not indicated. The performance characteristics are listed below, and have been added to the man page. Here, "m" is the order of the tree, "n" is the number of items stored in it, and "M" is either O(log m) if n > 7 or O(n) if n <= 7 (this is the point where a binary search begins to outperform a linear search). Retrieval: O(M * log n) Creation: O(1) Destruction: O(log m) + O(n) Insertion: O(m * log n) Deletion: O(m * log n) Traversal: O(n) I couldn't measure the performance in any meaningful way. Performing some sort of benchmark on this implementation is of only limited value, because it specifies only how the tree performed on a particular type of machine with a particular load with a particular set of data having particular characteristics. >The type of the items stored in the B-tree is kept commendably general, >but no mention is made of the fact that correct operation depends on the >comparison procedure testing a <= relation which must be a TOTAL ordering >of the items. All that the comparison procedure need guarantee is that, given a domain of input, the output be the same every time the same two items are compared. I had thought that the semantics of "strcmp" were defined that way when I made reference to it in the B-tree man page. In any case, this is another simple documentation problem. >Etc. What else? >>- How can it be made better? >>- Anything else relevant to producing a good reusable component that is reused > >Just as with a circuit board, you may try to improve it, but it shouldn't >be done with an eye to creating a reusable component per se. The circuit >board contains reusable components, and it is part of a larger reusable >component. That is also the situation with the B-tree facility. I don't understand this. The intent was to build a piece of software that could be used in a variety of ways to build bigger pieces of software. It was in fact created with an eye to creating a reusable component. Is there anything to prevent one from building a nested scope symbol table, using this as the index? Is there anything to prevent one from using this to build a temporary index for an external file? I guess there are several interpretations of what it means to say "improve it." It could mean "how can it be made to have more of the qualities that make it a reusable software component?" or "how can it be made more bug free?" or "how can it be made faster, without introducing more bugs?" or "how can its restrictions be reduced?". My question about "how can it be improved" in my original posting was really intended to be the first of these interpretations. I expect the second and third are ignored in practice because "it works, why fix it?". The fourth is usually addressed in a rewrite that is in turn reusable and may have a different set of applications that it addresses better than the first. That doesn't mean that the earlier implementation is more or less useful (or reusable) than the second, just that the second is more appropriate for some things. >The reusable component upon which the B-tree facility should have been based >is the exploration-tree facility. It is a general purpose facility which >provides clients with trees and operations for navigating around in them. >In contrast to the B-tree, it can be used as a component in a wide variety >of situations. This is the first I've heard about exploration-tree facilities. Without knowing more about it, I can't ask specific questions. Could you email to me some additional information and references, please? >The reusable component which the B-tree facility supports is the >almost-constant-function facility. Conceptually, it provides a >function F from an arbitrary totally ordered domain set to an arbitrary >range set. For all but a few values, F(x) = C, hence the name. A typical >application would be sparse matrices, where the domain would be pairs >of integers (lexicogphically ordered), the range would be floating point >numbers, and C = 0.0. Another might be symbol tables where the domain >would be character strings, the range would be say type names, and >C = Undefined_Type. Data bases, polynomials, etc. provide additional >applications. The pairs where F(x) = y # C can be stored in a >B-tree. Note, however, that a wide variety of other implementations >are posible (e.g. AVL tree, linked list, hashing, etc.). Analogously, >a computer architecture can be realized using a variety of chips and >board layouts. > >The exploration-tree and the almost-constant-function present clients >with simple interfaces which have clean specifications. That's >surely a necessary feature of a reusable component. >Another important property is that, once you discover them, you find >that they could be used in a wide variety of current applications. >It's not so clear that you can design reusable components which are >only useful in future software. This coincides with my vision of what a reusable component is. The B-tree implementation presented is an attempt to provide a data structure with a simple interface with clean specifications. Granted, the specification may not be as precise as some would like, it does present enough information to make a decision as to whether or not it is adequate for a specific task, and how to use the interfaces. Besides the documentation issues mentioned above, how can this be made to exhibit more of the qualities one associates with reusable software components? -- Paul Sander (408) 734-9822 | "Passwords are like underwear," she said, paul@Atherton.COM | "Both should be changed often." {decwrl,pyramid,sun}!athertn!paul | -- Bennett Falk in "Mom Meets Unix"