Path: utzoo!utgpu!water!watmath!clyde!ima!think!barmar From: barmar@think.COM (Barry Margolin) Newsgroups: comp.lang.misc Subject: Re: Poor Algorithms Message-ID: <17658@think.UUCP> Date: 8 Mar 88 17:57:29 GMT References: <3821@ihlpf.ATT.COM> <2791@enea.se> <404@tub.UUCP> Sender: usenet@think.UUCP Reply-To: barmar@fafnir.think.com.UUCP (Barry Margolin) Organization: Thinking Machines Corporation, Cambridge, MA Lines: 73 In article <404@tub.UUCP> net@tub.UUCP (Oliver Laumann) writes: >All the anecdotes about poor algorithms remind me of an amusing tale >I have read in the (highly recommendable, by the way) paper "Hints for >Computer System Design" by Butler W. Lampson of Xerox Palo Alto. > > The remarkable >point is that this procedure ran in time O(n^2), where n is the length >of the document! > >This result was achieved by first writing a procedure "FindIthField" >which finds the i-th field in the document. This, of course, takes >time O(n). The procedure "FindNamedField(name)" was then implemented >in the following "natural" way [quoted directly from the article]: > > FOR i = 0 to NumberOfFields DO > FindIthField; IF its name is "name" THEN EXIT > ENDLOOP Whether they got the algorithm right or not depends on which algorithm you are talking about. There is a very similar routine in the Symbolics Genera user interface, used when you want to retrieve a line that matches a given string from your input history; the input history is a linked list, and a very similar loop is used, so it is also N^2. Such a loop is used because of information hiding; the loop is in the INTERACTIVE-STREAM flavor, while the linked list is hidden in the HISTORY flavor, and the interface only provides a routine to get the Nth history element. So, is this organization flawed? I don't think so. However, the implementation of the ELEMENTn operation could be improved (in fact, the reason I even know about this is that someone here patched the routine years ago in the manner I'm about to describe) to make such a common operation less expensive. The flaw in Oliver's argument above is that FindIthField need not always be O(I). In particular, if it is common to request field n+ soon after requesting field n, it would pay to remember where field n ended, and start from there instead of from the beginning. This is a time/space trade-off that really pays back, because an O(N^2) time algorithm is reduced to O(N) at the cost of O(1) storage. So, this is really, in my opinion, an example of the way that abstraction and information hiding make it easier to improve performance. In order to fix this performance bug, all that is needed is two instance variables (assuming an object-oriented implementation) and a small change to one method. In the Symbolics case, an alternate fix would be to use an array rather than a linked list. However, since in this case it is more common to add elements to one end of the history (every time the user types a line) than to search far back in the history, and in the upcoming release they will allow selective deletion from the middle, it should be obvious why the linked list representation was chosen. But I think they made the right choice in not exposing the implementation choice in the interface; this allows them to try many different implementations (others I can think of is are a linked list of medium-sized arrays and or an array of medium-sized arrays -- the latter is O(1) for both adding a new element and finding the Nth element). I concede that there are cases where the redesign must affect many levels of the implementation. This often means that the original design was severely flawed. I think a definition I recently saw in the Object-Oriented Design discussion was that such a design can be considered good if most specification changes result in modifications to only one or two classes. Applying this to the above example, the addition of the specification "find the result in linear time" results in a simple change to one class. Barry Margolin Thinking Machines Corp. barmar@think.com uunet!think!barmar