Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!lll-tis!ames!pasteur!ucbvax!smu.UUCP!leff From: leff@smu.UUCP (Laurence Leff) Newsgroups: comp.doc.techreports Subject: wisconsin.v Message-ID: <8803170538.AA02740@smu.edu> Date: 17 Mar 88 05:38:59 GMT Sender: daemon@ucbvax.BERKELEY.EDU Organization: The Internet Lines: 489 Approved: techreports@smu.csnet Bibliography of Technical Reports Computer Science Department University of Wisconsin September 1987 - December 1987 For copies, send requests to: Technical Report Librarian Computer Science Department University of Wisconsin 1210 W. Dayton Street Madison, WI 53706 %A Vasant Honavar %A Leonard Uhr %T Recognition Cones: A Neuronal Architecture for Perception and Learning %D September 1987 %R TR 717 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X There is currently a great deal of interest and activity in developing connectionist, neuronal, brain-like models, in both Artificial Intelligence and Cognitive Science. This paper specifies the main features of such systems, and examines "recognition cone" models of perception from this perspective, as examples of structures of neuron-like units combined into successively larger brain-like modules. Issues addressed include architecture, information flow, and the parallel-distributed nature of processing and control in recognition cones; and their use in perception and learning. %A Deepankar Medhi %T Decomposition of Structured Large-scale Optimization Problems and Parallel Optimization %D September 1987 %R TR 718 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X In this dissertation, we present serial and parallel algorithms to solve efficiently the problem: inf {@sum from i=1 to N~f sub i ( x sub i )~ |~ sum from i=1 to N functions (not necessarily differentiable) taking values in the extended real line @(\- \(if, ~\(if@]. For example, block-angular linear programming problems and linear multi-commodity network optimization problems can be cast into the above form. In our approach, we take the Rockafellar dual of the problem to arrive at an unconstrained nonsmooth maximization problem. The difficulty arises from the nonsmoothness of the dual objective. Traditional subgradient methods are not good enough as they do not have implementable stopping criterion and are reported to have slow convergence. One also may not obtain a primal optimal solution at the end. Instead, we apply a modified bundle algorithm, which has an implementable stopping criterion, and more importantly, one can recover an approximate primal optimal solution. We also obtain some theoretical \fIa posteriori\fR error information on the approximate solution. We have implemented this algorithm on randomly generated block- angular linear programming problems of size up to 4,000 equality constraints and 10,000 variables. Our implementation ran up to seventy times faster than MINOS version 5.0, and did substantially better than an advanced implementation of the Dantzig-Wolfe decomposition method. Thus we think that for this type of problem, our algorithm is very promising. A nice feature of the dual problem is that it breaks up the original problem into smaller \fIindependent\fR subproblems. Exploiting this fact, we present parallel algorithms implemented on the CRYSTAL multicomputer. We considered two groups of test problems for these algorithms, one in which the subproblems required approximately equal amounts of time to solve, and another in which the solution times varied. In the first group, we obtained 70% -80% efficiency with up to eleven processors. In the second group, we obtained 60% or more efficiency with relatively small problems and with up to five processors. %A Kenneth Kunen %T Signed Data Dependencies in Logic Programs %D October 1987 %R TR 719 %I COMPUTER SCIENCES DEPAERTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X Logic programming with negation has been given a declarative semantics by Clark's Completed Database, CDB, and one can consider the consequences of the CDB in either 2-valued or 3-valued logic. Logic programming also has a proof theory given by SLDNF derivations. Assuming the data dependency condition of \fIstrictness\fR, we prove that the 2-valued and 3-valued semantics are equivalent. Assuming \fIallowedness\fR (a condition on occurrences of variables), we prove that SLDNF is complete for the 3-valued semantics. Putting these two results together, we have completeness of SLDNF deductions for strict and allowed databases and queries under the standard 2-valued semantics. This improves a theorem of Casedon and Lloyd, who obtained the same result under the additional assumption of \fIstratifiability\fR. %A Carl de Boor %A Klaus Hollig %A Sherman Riemenschneider %T Fundamental Solutions for Multivariate Difference Equations and Applications to Cardinal Interpolation %D October 1987 %R TR 720 %I COMPUTER SCIENCE DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X Let @b~:~ZZ sup d~ \(->~ IR@ be a mesh-function with compact support. It is shown that the difference equation @sum from j\(moZZ sup d ~b(k~-~j)a (j)~=~f(k),~~~k~\(mo~ZZ sup d@, has a bounded solution \fIa\fR if@|f(j)|~=~O((1+ |j|) sup -n@) for some exponent \fIn\fR which depends on \fIb\fR. This result is the discrete analogue of the existence of tempered fundamental solutions for partial differential operators with constant coefficients. It is applied to prove optimal convergence rates for interpolation with box-splines. %A Seymour V. Parter %T Remarks on the Solution of Toeplitz Systems of Equations %D October 1987 %R TR 721 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X In this survey we recall some of the theory of Toeplitz matrices which is relevant for the questions which arise in the inversion of Toeplitz systems of equations. In the course of our presentation we discuss some examples and raise some particular questions. %A Wei-Chung Hsu %T Register Allocation and Code Scheduling for Load/Store Architectures %D October 1987 %R TR 722 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X to achieve high performance, the structure of on-chip memory in a single-chip computer must be appropriate, and it must be allocated effectively to minimize off-chip communication. Since the off-chip memory bandwidth of single-chip computers is severely limited, data caches that exploit spatial locality to achieve high hit rates are not appropriate. A register file, which can be managed by compilers, might be more effective than a data cache as an on-chip memory structure. With a load/store architecture, compilers can separate operand fetches from their use by scheduling code, thus achieving high hit rates without increasing memory traffic. Register allocation also exploits temporal locality better than a data cache does. This thesis investigates how effective register allocation could be and studies the interdependency problem between register allocation and code scheduling. A model of perfect register allocation is explored. An algorithm for optimal local register allocation is then developed. Since the optimal algorithm needs exponential time in the worst case, a heuristic algorithm which has near-optimal performance is proposed and compared with other existing heuristic algorithms. Through trace simulation, the perfect register allocation model is shown to be much more effective in reducing off-chip memory traffic than cache memory of the same size. Code scheduling interferes with register allocation, especially for large basic blocks. Two methods are proposed to solve this interference: (1) an integrated code scheduling technique; and (2) a DAG-driven register allocator. The integrated code scheduling method combines two scheduling techniques--one to reduce pipeline delays and the other to minimize register usage--into a single phase. By keeping track of the number of available registers, the scheduler can choose the appropriate scheduling technique to schedule a better code sequence. The DAG-driven register allocator uses the Dependency DAG to assist in assigning registers; it introduces much less extra dependency than does an ordinary register allocator. Both approaches are shown to generate more efficient code sequences than conventional techniques in the simulations. %A Yannis E. Ioannidis %A Joanna Chen %A Mark A. Friedman %A Manolis M. Tsangaris %T Bermuda - An Architectural Perspective on Interfacing Prolog to a Database Machine %D October 1987 %R TR 723 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X We describe the design and implementation of BERMUDA, which is a system interfacing Prolog to the Britton-Lee Intelligent Database Machine (IDM). We discuss several architectural issues faced by such systems, and we present the solutions adopted in BERMUDA. In BERMUDA, rules are stored in Prolog, and facts are primarily stored in a database. BERMUDA has been designed and implemented so that multiple concurrent Prolog processes, possibly running on different machines, can share a database. Moreover, the semantics of Prolog programs remain unchanged and the use of a database system is transparent to the user. Finally, BERMUDA has achieved a certain level of portability by using the given Prolog interpreter and database system (almost) unchanged. BERMUDA also employs several novel techniques to make the interface of Prolog to the database efficient. %A Goetz Graefe %T Rule-Based Query Optimization in Extensible Database Systems %D November 1987 %R TR 724 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X This thesis presents the problems of query optimization in extensible database systems and proposes a solution. It describes the design and an initial evaluation of the query optimizer generator developed for the EXODUS extensible database system. The goal of the EXODUS system is to provide software tools and libraries to structure and to ease the task of implementing or extending a database system for a new data model. Our basic model of optimization is to map a query tree, which consists of operators at the nodes and stored data at the leaves, to an access plan, which is a tree with implementation methods at the nodes and scans at the leaves. The optimizer generator translates algebraic equivalence rules into an executable optimizer. The equivalence rules are specific to the data model. The generated optimizer reorders query trees and selects implementation methods according to cost functions associated with the methods. The search strategy of the optimizer avoids exhaustive search by learning from past experience. We report on two operational optimizers. Experiments with a restricted relational system show that the generated optimizer produces access plans of almost the same anticipated execution cost as those produced by exhaustive search, with the search time cut to a small fraction. Another set of experiments shows that a generated optimizer is able to handle large queries. An optimizer currently under development for a new query evaluation method shows the power and flexibility of the approach. Other researchers have decided to use the optimizer generator for their database implementation work. Independently from the EXODUS project, the optimizer generator proved to be a valuable tool for exploring the trade-offs between left-deep execution trees and general execution trees in relational database systems. Our experiments show that for bushy trees, the higher optimization cost and the cost for creating and reading temporary files can be more than compensated by the reduction in processing cost. %A Victor Shoup %T Finding Witnesses Using Fewer Random Bits %D November 1987 %R TR 725 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X Let @G@ be a proper subgroup of @ZZ* sub n@, the multiplicative group of units modulo @n@. Many number theoretic algorithms assume that an element in @ZZ* sub n ~-~ G@ can easily be found. In this context, an element in @ZZ* sub n ~-~ G@ is often called a "witness." Ankeny's theorem states that, assuming the ERH, the smallest witness is @O(log sup 2 ~n)@. The purpose of this paper is to examine a "randomized" Ankeny's conjecture. Consider the following experiment. Choose @a ~ \(mo ~ ZZ sub n@ at random. Examine the elements @a,~ a+1,~.~.~.~,~a + k ~-~ 1@, where @k ~=~ O(log sup c n)@ for some constant @c@. We would like the probability that none of these are witnesses to be small. The randomized Ankeny conjecture is that this probability is @O(1/n sup \(*a@) for some constant 0 < @\(*a@ < 1. We show that if the randomized Ankeny conjecture is true, then a deterministic Ankeny conjecture, which allows us to efficiently find witnesses deterministically, is already true. We also prove some partial randomized Ankeny results, which state that we can bound the probability of not finding a witness by @O(1/p sup{1/2 - \(mo}@), where @p@ is a prime divisor of @n@ that is "nontrivial" on @G@. %A Charles V. Stewart %A Charles R. Dyer %T Local Constraint Integration in a Connectionist Model of Stereo Vision %D November 1987 %R TR 726 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X The stereo matching algorithms proposed in the research literature have been fairly successful, but none has completely solved the problem. We present a number of steps toward improving matching by: (1) analyzing and reformulating a number of important matching constraints, including uniqueness, coarse-to-fine multiresolution, fine-to-coarse multiresolution, detailed match, figural continuity and the disparity gradient. (2) Building an algorithm that integrates the influence of these constraints @cooperatively@ and @in parallel@ using the General Support Principle. This principle states that except for uniqueness, @only~positive@ constraint influences are employed. (3) Implementing this General Support Algorithm using a connectionist network. Such a network allows the constraints to be integrated naturally in a parallel, relaxation computation. The resulting connectionist network has been simulated and tested. It generally produces a high percentage (95-99%) of correct matching decisions. This testing has enabled us to understand the types of problems that @locally@ defined constraints can not accommodate, and leads to ideas for non-local mechanisms to address these problems. %A Gurindar Sohi %T The Effects of Faults in Cache Memories %D November 1987 %R TR 727 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X As processor speeds increase, on-chip caches to provide adequate memory bandwidth are becoming increasingly important. Such caches are prone to faults both during manufacturing and during normal processor operation because of the large density of active components. Since the CPU's interactions with the memory dictate the performance of the processor, it is important to evaluate the effect of faults in the cache memory system. Faults in components such as registers, busses, control logic, etc., are critical faults because the processor will cease to operate correctly unless some action is taken to tolerate such faults. Cache memory, on the other hand, is not a critical component of the processor - it is present mainly for performance reasons. The processor will be able to operate in a correct but degraded fashion if parts of the cache memory are faulty and adequate means are provided to recover correct data through bypassing faulty cache components. Traditional techniques for tolerating faults in memory systems such as Single Error Correction and Double Error Detection (SECDED) codes may not be appropriate for a cache since they increase the latency of the cache. If the cache memory system does not have the ability to correct errors, two questions arise: (i) how does one make sure that correct data can be recovered at all times and (ii) how do faults in the cache affect performance. In this paper, we discuss the performance of cache memories under fault conditions. Through the use of simulation, we study how the performance of a cache is degraded under fault conditions. %A Hung-Yang Chang %T Dynamic Scheduling Algorithms for Distributed Soft Real-time Systems %D November 1987 %R TR 728 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X This dissertation is a study of dynamic scheduling algorithms for distributed soft real-time systems with non-periodic workloads. Soft real-time systems permit deadline misses with the performance goal of minimizing the \fIdeadline miss ratio\fR, the percentage of jobs missing their deadlines, and meeting the \fIglobal priority\fR requirement. In a distributed system with non-periodic workloads, dynamic assignment of tasks to processors can significantly improve system performance. However, job transfers and negotiations between processors exact a cost from the system, limiting performance improvement. Therefore, these algorithms must meet the real-time goals while minimizing the scheduling overhead. In order to minimize the global deadline miss ratio, an algorithm must employ a negotiation protocol to coordinate schedulers. We devise a \fItriage\fR protocol that attempts to transfer a job when the job cannot meet its deadline locally but can meet it at a remote processor. Based on this protocol, three initiation approaches are compared. The receiver-initiated approach is found to be most robust, whereas the sender-initiated approach may cause thrashing when the system load is high. Global priority scheduling attempts to ensure that the N jobs being served at any moment in the N-processor system are those having the highest priorities. If a job that is waiting has higher priority than the one that is running at a remote processor, the waiting job should be transferred to receive immediate service. The tradeoffs between reducing overhead and relaxing the global priority requirement are explored by comparing four algorithms, each exhibiting a different degree of compromise. For jobs consisting of a group of cooperating tasks, a global priority scheduling algorithm should take job structure into account. We devise algorithms that give compatible priority service to tasks belonging to a single job, so that cooperating tasks may progress at the same rate, avoiding excessive blocking due to synchronization. Under the assumption that there are two priority levels, our algorithms ensure that tasks are assigned to processors with the lowest priority loads. In total, we present six placement algorithms, evaluated under a workload consisting of jobs having a pipeline structure. %A David L. Cohrs %A Barton P. Miller %A Lisa A. Call %T Distributed Upcalls: A Mechanism for Layering Asynchronous Abstractions %D November 1987 %R TR 729 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X It is common to use servers to provide access to facilities in a distributed system and to use remote procedure call semantics to access these servers. Procedure calls provide a synchronous interface to call downward through successive layers of abstraction, and remote procedure calls allow the layers to reside in different address spaces. But servers often need the ability to initiate asynchronous and independent actions. Examples of this asynchrony are when a network server needs to signal to an upper layer in a protocol, or when a window manager server needs to respond to user input. Upcalls are a facility that allow a lower level of abstraction to pass information to a higher level of abstraction in a clean way. We describe a facility for distributed upcalls, which allows upcalls to cross address space boundaries. Remote procedure calls (for handling asynchronous server requests) and distributed upcalls (for handling asynchronous server activities), complement each other to form a powerful structuring tool. These facilities, together with the ability to dynamically load modules into a server, allow the user to arbitrarily place abstractions in the server or in the client. Distributed Upcalls have been built into a server structuring system called CLAM, which is currently being used to support an extensible window manager system. The CLAM system, including distributed upcalls, remote procedure call extensions to C++, dynamic loading, and basic window classes, is currently running under 4.3BSD UNIX on Microvax workstations. %A Michael J. Litzkow %A Miron Livny %A Matt W. Mutka %T Condor - A Hunter of Idle Workstations %D December 1987 %R TR 730 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X This paper presents the design, implementation, and performance of the Condor scheduling system. Condor operates a workstation environment. The system aims to maximize the utilization of workstations with as little interference as possible between the jobs it schedules and the activities of the people who own workstations. It identifies idle workstations and schedules background jobs on them. When the owner of a workstation resumes activity at a station, Condor checkpoints the remote job running on the station and transfers it to another workstation. The system guarantees that the job will eventually complete, and that very little, if any, work will be performed more than once. The system has been operational for more than three months. In this paper we present a performance profile of the system based on data that was accumulated from 23 stations during one month. During the one-month period, nearly 1000 jobs were scheduled by Condor. The system was used by heavy users and light users who consumed approximately 200 CPU days. An analysis of the response times observed by the different users is a clear display of the capacity. Since a user of Condor has to devote some local capacity to support the remote execution of his/her jobs, the effectiveness of the remote scheduling system depends on the amount of this capacity. We show that this overhead is very small. On the average, a user has to sacrifice less than one minute of local CPU capacity to acquire a day of remote CPU capacity. Condor has proven to be an extremely effective means to improve the productivity of our computing environment. %A Rong-Jaye Chen %T Parallel Algorithms For A Class of Convex Optimization Problems %D December 1987 %R TR 731 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X This thesis is principally concerned with a piecewise-linear trust region method for solving a class of structured convex optimization problems, which includes the traffic assignment problems. Piecewise-linear approximation of nonlinear convex objective functions in linearly constrained optimization produces subproblems that may be solved as linear programs. This approach to approximation may be used for nonseparable as well as separable functions, and for the former class (the focus of this thesis), it lies between linear and quadratic approximation with regard to its accuracy. In order to have additional control of the accuracy of the piecewise-linear approximation, we consider two devices: rectangular trust regions and dynamic scaling. The use of rectangular trust regions in conjunction with the type of piecewise-linear approximation considered here actually serves to simplify rather than complicate the approximating problems. This is a result of the equivalence of the trust region and the use of a limited number of segments of comparable size in the approximation. The approach to dynamic scaling considered here may be applied to problems in which each objective function term is a convex function of a linear function of the variables. This scaling device allows the algorithm to adjust the approximation between an underestimating function (corresponding to a linear approximation) and an overestimating function (the non-separable analog of the overestimate associated with separable approximation of convex functions.) The scaling factor is adjusted in accordance with the acceptance criteria associated with the trust region method. Another emphasis of this thesis is the development of parallel algorithms suited to distributed computing and the comparison of the relative efficiencies of these algorithms on different architectures. Computational experience is cited for some large-scale problems arising from traffic assignment applications. The algorithms considered here also have the property that they allow such problems to be decomposed into a set of smaller optimization problems at each major iteration. These smaller problems correspond to linear single-commodity networks, and may be solved in parallel. Results are given for the distributed solution of these problems on the CRYSTAL multicomputer. %A R. J. Chen %A R. R. Meyer %T Parallel Optimization For Traffic Assignment %D December 1987 %R TR 732 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X Most large-scale optimization problems exhibit structures that allow the possibility of attack via algorithms that exhibit a high level of parallelism. The emphasis of this paper is the development of parallel optimization algorithms for a class of convex, block-structured problems. Computational experience is cited for some large-scale problems arising from traffic assignment applications. The algorithms considered here have the property that they allow such problems to be decomposed into a set of smaller optimization problems at each major iteration. These smaller problems correspond to linear single-commodity networks in the traffic assignment case, and they may be solved in parallel. Results are given for the distributed solution of such problems on the CRYSTAL multicomputer. %A M. Muralikrishna %A David J. DeWitt %T Equi-Depth Multi-Dimensional Histograms %D December 1987 %R TR 733 %I COMPUTER SCIENCES DEPARTMENT, UNIVERSITY OF WISCONSIN %C MADISON, WI %X Multi-dimensional queries commonly occur in databases dealing with geographical, image, and VLSI databases. A typical two dimensional query in a geographical database might involve finding all cities within certain latitudinal and longitudinal bounds. Several multi-dimensional index structures have been proposed in the literature. KDB trees [Robinson81], R-trees [Guttman84], and Grid files [Nievergelt84] are among the more popular ones. However, there has been no work in designing multi-dimensional histograms to aid in the optimization process using these multi-dimensional index structures. In order for an optimizer to select an appropriate access path for a multi-dimensional query, fairly accurate selectivity estimates must be available to it. Selectivity estimates are also useful in determining appropriate join methods that follow the selections. In this paper we present an algorithm for generating equi-depth, multi- dimensional histograms. One might expect that the cost of building a d-dimensional histogram would be at least d times the cost of sorting the relation on a single attribute. We show, in our algorithm, that the sorting cost of building a d-dimensional histogram is significantly less than the cost of sorting the relation d times. We present a main memory data structure for storing the histograms and discuss two schemes for estimating the number of tuples that will be retrieved by a given query. Experimental results are presented that show the efficacy of our histograms. The usefulness of a sampling technique in generating histograms at a very low cost is also explored.