Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!uunet!convex!convex1.convex.com!hamrick
From: hamrick@convex1.convex.com (Ed Hamrick)
Newsgroups: comp.arch
Subject: Re: Killer Micros and vectorized code
Summary: Surviving the attack of the killer micros
Message-ID: <100598@convex.convex.com>
Date: 15 Mar 90 00:24:35 GMT
References: <51771@lll-winken.LLNL.GOV>
Sender: news@convex.com
Organization: Convex Computer Corporation, Seattle, Washington
Lines: 148

Mr. Brooks,

I've greatly enjoyed the articles you've written regarding the performance of "Killer Micros" relative to larger, more costly machines. Even though there are exceptions to any general rule, I agree with much of what you've been saying, but must disagree with the overall conclusion. The key generalizations that I agree with are:

(1) The price/performance ratio of a wide range of applications is better on smaller machines than larger machines. This applies primarily to applications dominated by scalar code that aren't amenable to vectorization or massive parallelism. This is particularly applicable if applications have a locality of reference that can make effective use of high-speed cache.

(2) The price per megabyte of disk storage is better for lower-speed and lower-density disk drives.

(3) The price per megabyte of memory is better when memory is slower and interleaved less.

Many people will argue with all of these generalizations by citing specific counter-examples, but I believe reasonable people would agree that these generalizations have some merit. I also believe that these generalizations have been valid only in the past five years, and that there have been times in the past that the opposite has been true.

The conclusion you've reached, and that I must admit I have been tempted to reach myself over the past few years, is that "No one will survive the attack of the killer micros!".
As a number of people have pointed out, there are many factors counterbalancing the price/performance advantage of smaller systems. One of the key counter-arguments that a number of people have made is that machines ought to be judged on price per productivity improvement. A faster machine gives people higher productivity because of less time wasted waiting for jobs, and more design cycles that can be performed in a given time. Anything that decreases time-to-market or improves product quality is worth intrinsically more. This is one of the traditional justifications for supercomputers.

You noted that a Cray CPU-hour costs significantly more than people earn per hour, but this doesn't take into account that companies can significantly improve their time-to-market and product quality with faster machines, albeit machines that cost more per unit of useful work. This may not matter in some application areas such as computational physics, but a company like Boeing or McDonnell Douglas can lose billions of dollars if they are six months late with getting new products designed. There are also significant cost multipliers involved in producing a better product - for instance a small increase in airplane fuel efficiency can result in significantly larger market share than your competition. Some people have noted that some companies are willing to pay almost anything to get the fastest computers, and this is one of the underlying economic reasons for this willingness.

Big companies and government labs tend to use this rationale to justify procuring computers based on single-job performance. However, when you visit these facilities, generally large Cray sites, the machines are generally used as large timesharing facilities. People are finding that machines that were procured to run large jobs in hours are instead running small jobs in days.
Further inflaming the problem of having 500 users on a supercomputer is the tendency of these companies and labs to make the use of these machines "free". (Just in passing I'd like to note that the direct result of making CPU time on Crays "free" is that 90% of the CPU cycles get used by 10% of the users, which can hurt time-to-market and reduce productivity. Charging for CPU time causes a vicious feedback loop where fewer users cause higher costs which in turn cause fewer users, etc. The Share Scheduler fixes much of this.)

I've felt for some time that there are fundamental reasons that large computer system makers are still surviving, and in the case of CONVEX, growing and prospering. Even though the argument is made that faster machines improve time-to-market, they are almost always used as timesharing systems, often giving no better job turn-around time than workstations.

Some companies are surviving because of the immense base of existing applications. Some companies prosper because of good customer service, some by finding vertical market segments to dominate. Every company has unique, non-architectural ways of marketing products that may not have the best price/performance ratio.

However, I believe that there are several key strategic reasons that larger, centralized/departmentalized computer systems will in the long run prevail over the killer micros:

(1) A single computer user usually consumes CPU cycles irregularly. A user often will have short periods of intense computer activity, followed by long periods of low utilization. I've analyzed almost a year's worth of data from a typical engineering computer system (more than 500,000 data samples), and have seen that the number of jobs an individual (or group of individuals) runs at a time approximates a Poisson distribution. This matches what one would expect intuitively - that even heavily loaded systems have some percentage of their CPU cycles that go to the null process.
If J is the average number of jobs a person runs at any given time, then EXP(-J) is the percentage of wasted CPU cycles on a single-user system. For instance, if someone is performing a task where they are running 4 jobs at a time on average (sometimes 6, sometimes 2), then the workstation they are using will have EXP(-4) or 2% wasted cycles. Similarly, if there is an average of 1 job at a time, there will be 37% wasted cycles, and 0.25 jobs results in 78% wasted cycles. I would maintain that the average number of runnable jobs on workstations is less than 0.1, resulting in greater than 90% wasted CPU cycles.

This statistical character of workloads provides strong economic incentives to people to pool their resources and purchase departmentalized/centralized computer resources. A group of 20 people using a single machine will result in 14% idle CPU time compared with 90% idle CPU time if they use 20 workstations (assuming each user runs an average of 0.1 jobs at a time). This gives roughly a factor of 10 advantage in usable price/performance to the centralized/departmentalized machine.

(2) The argument for the centralization/departmentalization of disk resources closely parallels the argument for CPU resources. If each user is given dedicated disks on workstations, then significant amounts of total disk space and total disk bandwidth go to waste. There is significant economic incentive to centralizing/departmentalizing disk storage for this reason, as well as other reasons relating to data security and data archiving.

(3) I would maintain that the amount of memory needed by a job is roughly proportional to the amount of CPU time needed to run the job. This is a very imprecise correlation, but is true to some degree across a wide range of problems. I would also maintain that if an N-Megabyte program takes M seconds to run in N megabytes of physical memory, then it will take approximately 6*M seconds to run in N/2 megabytes of physical memory.
This factor of 6 performance degradation holds true for a wide range of large memory application programs. This gives a strong economic incentive to users to centralize/departmentalize their memory, and run large memory jobs in series. For instance, assume two workstation users each have 64 MBytes of memory and need to run 128 MByte jobs. Assume these jobs take 12 hours apiece when run in 64 MBytes. If the two workstation users put all 128 MBytes of memory on one workstation, and junked the second workstation, they could get both jobs done in 4 hours (2 hours per job) by running the two jobs in series on the large-memory workstation.

There is an additional economic incentive to centralizing memory that comes from the statistical nature of memory utilization by a group of users. Using similar arguments to (1) above, you can easily show that a computing architecture with centralized/departmentalized high-speed memory is much more cost effective than distributing memory across multiple workstations.

Obviously, there is much more involved in selecting the optimal computing architecture for a given workload. Just as I disagree with you that simple measures of price/performance will predict the success or demise of a product, many people would probably maintain that my arguments about centralizing compute/disk/memory resources are also simplistic. There are many counter-arguments favoring distributed computing solutions, and many more arguments favoring centralization.

The main point I wanted to make in this note is that simple price/performance measures are poor predictors of the long-term viability of a company's products. I'm sure that most readers of this newsgroup could post a long list of companies that had/have excellent price/performance but that are/will be out of business.

Regards,

Ed Hamrick (hamrick@convex.com)
Area Systems Engineer
CONVEX Computer Corporation
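
For anyone who wants to check the arithmetic in points (1) and (3) above, here is a minimal sketch in Python. It is only an illustration, not part of the original analysis. The function names are hypothetical, and the pooled case assumes that users behave independently, so that the combined job count of a group is again Poisson with mean equal to the number of users times J. The post implies this but does not state it.

```python
import math


def idle_fraction(mean_jobs: float) -> float:
    """Probability of zero runnable jobs when the job count is Poisson
    with the given mean, i.e. EXP(-J) in the post's notation."""
    return math.exp(-mean_jobs)


def pooled_idle_fraction(users: int, mean_jobs_per_user: float) -> float:
    """Idle fraction of one shared machine.

    Assumption (not stated in the post): users behave independently, so
    their combined job count is Poisson with mean users * J.
    """
    return idle_fraction(users * mean_jobs_per_user)


def usable_cycle_advantage(users: int, mean_jobs_per_user: float) -> float:
    """Ratio of busy-cycle fractions: shared machine vs. one workstation each."""
    shared_busy = 1.0 - pooled_idle_fraction(users, mean_jobs_per_user)
    solo_busy = 1.0 - idle_fraction(mean_jobs_per_user)
    return shared_busy / solo_busy


def serial_hours(jobs: int, hours_in_half_memory: float,
                 slowdown: float = 6.0) -> float:
    """Total time to run the jobs one after another in full memory, given
    the run time in half the memory and the claimed slowdown factor."""
    return jobs * hours_in_half_memory / slowdown


if __name__ == "__main__":
    for j in (4, 1, 0.25, 0.1):
        print(f"J = {j}: {idle_fraction(j):.1%} idle")
    print(f"20 users pooled: {pooled_idle_fraction(20, 0.1):.1%} idle")
    print(f"usable-cycle advantage: {usable_cycle_advantage(20, 0.1):.1f}x")
    print(f"two 128 MByte jobs in series: {serial_hours(2, 12.0):.0f} hours")
```

Under these assumptions the idle fractions come out to about 1.8%, 36.8%, 77.9% and 90.5% for J = 4, 1, 0.25 and 0.1, and about 13.5% for 20 pooled users, close to the figures quoted in point (1). The ratio of usable cycles is about 9.1, which is where the "factor of 10" comes from. The memory example gives 4 hours for the two jobs run in series.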