Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!convex!convex1.convex.com!hamrick
From: hamrick@convex1.convex.com (Ed Hamrick)
Newsgroups: comp.arch
Subject: Re: Killer Micros and vectorized code
Message-ID: <100701@convex.convex.com>
Date: 20 Mar 90 04:30:16 GMT
References: <51771@lll-winken.LLNL.GOV> <100598@convex.convex.com> <52661@lll-winken.LLNL.GOV>
Sender: usenet@convex.com
Organization: Convex Computer Corporation, Seattle, WA
Lines: 79

Mr. Brooks,

I read your recent article regarding killer micros with great interest.
I'd like to comment on a few of the points you made below:

In article <52661@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks)
> Computers are best utilized as shared resources, your Killer Micros should
> be many to a box and sitting in the computer room where the fan noise does
> not drive you nuts. This is where I keep MY Killer Micros.

I received a lot of mail regarding this very point, and you were one of
the few people who agreed with me. I'd like to qualify this point by
saying that too much centralization is also inefficient. A good rule of
thumb is to centralize to the point where 50% to 80% of the compute
cycles are used. Sharing at the departmental level also alleviates many
of the problems of corporate-wide centralization.

A much more interesting subject is the one you raise below:

In article <52661@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks)
> To use the "efficient utilization argument" to support the notion that
> low volume custom processor architectures might possibly survive the
> attack of the Killer Micros is pretty foolish, however. Ed, would you
> care to run the network simulator and Monte Carlo code I posted results
> of on the Convex C210, and post the results to this group? I won't
> ruin the surprise by telling you how it is going to come out...

I'd be happy to run these programs on a C210.
I think you'd find that the C210 does much better than the 25 MHz clock
would otherwise lead you to predict. However, most of CONVEX's customers
purchase our machines for more than the excellent scalar performance - a
large number of important scientific and engineering applications
require high speed vector performance along with large memory, 2 GByte
virtual address space, and high-speed I/O. It would be interesting to
see the performance of these scalar codes on various architectures,
relative to the clock speed of the machines implementing these
architectures, especially the Cray numbers.

The cost of processors is a very small part of the total cost of a
departmental compute server. How much do you think Alliant pays for the
8 i860 chips in their low-end $500K product? The design of the memory
system is the dominant factor in system performance and system cost for
departmental supercomputers.

There is no question that all computer vendors will some day implement
their particular architectures in a small number of chips. The only
question is when. Making this decision too early might cause you to make
premature architectural trade-offs in order to reduce the number of
gates needed for today's chips. For example, the i860 uses reciprocal
approximation for the divide and square root functions. If space for
more gates had been available, the i860 might have been implemented
differently.

> Perhaps we can get the fellows at Alliant to do the same with their new
> 28 processor Killer Micro powered machine. That i860 is definitely a
> Killer Micro. After we compare single CPU performances, perhaps we could
> then run the MIMD parallel versions on the Convex C240 and the Alliant 28
> processor Killer Micro powered box. Yes, there are MIMD parallel versions
> of both codes which could probably be made to run on both machines.

If you have a chance, ask the Alliant people what their Linpack 100x100
performance is, and see how well it scales up to 28 processors.
Try to get real runs, not estimates. I'd also be curious about main
memory bandwidth (not crossbar bandwidth). Information like number of
banks, number of bytes read per bank access, and bank cycle time would
be particularly interesting.

It would also be useful to run the MIMD versions of your codes on both
the Alliant and the C240, and compare the parallel speed-ups. It would
also be revealing to run MIMD scalar codes (and vector codes) that have
a low cache hit rate on both the Alliant and CONVEX.

As an aside, I was curious why you were asked not to release information
about the low utilization of the 300 workstations you mentioned. I can't
think of any reason Livermore wouldn't want this information publicly
available, since this is likely to be true of any organization using
large numbers of single user workstations. It would do a great service
to people considering lots of single-user killer micros to have this
data publicly available.

Regards,
Ed Hamrick
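
For the main memory bandwidth question raised above, the three figures
asked for (number of banks, bytes read per bank access, bank cycle time)
combine into a simple upper bound. What follows is a minimal Python
sketch, assuming perfect interleaving with no bank conflicts. The
function name and the example numbers are hypothetical and do not
describe any Alliant or CONVEX machine.

```python
def peak_memory_bandwidth(num_banks, bytes_per_access, bank_cycle_ns):
    """Upper bound on main-memory bandwidth, in bytes per second.

    Assumes every bank is kept busy (ideal interleaving, no conflicts),
    so each bank delivers bytes_per_access once per bank cycle.
    """
    if num_banks <= 0 or bytes_per_access <= 0 or bank_cycle_ns <= 0:
        raise ValueError("all parameters must be positive")
    accesses_per_second_per_bank = 1e9 / bank_cycle_ns
    return num_banks * bytes_per_access * accesses_per_second_per_bank


if __name__ == "__main__":
    # Hypothetical figures chosen only to illustrate the arithmetic.
    bw = peak_memory_bandwidth(num_banks=32, bytes_per_access=8,
                               bank_cycle_ns=200)
    print(f"peak bandwidth: {bw / 1e6:.0f} MB/s")
```

Real codes with poor access patterns or a low cache hit rate will
sustain less than this bound, which is why measured runs are more
informative than the estimate.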