Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!convex!convex1.convex.com!hamrick
From: hamrick@convex1.convex.com (Ed Hamrick)
Newsgroups: comp.arch
Subject: Re: Killer Micros and vectorized code
Message-ID: <100701@convex.convex.com>
Date: 20 Mar 90 04:30:16 GMT
References: <51771@lll-winken.LLNL.GOV> <100598@convex.convex.com> <52661@lll-winken.LLNL.GOV>
Sender: usenet@convex.com
Organization: Convex Computer Corporation, Seattle, WA
Lines: 79

Mr. Brooks,

I read your recent article regarding killer micros with great interest.
I'd like to comment on a few of the points you made below:

In article <52661@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks)
> Computers are best utilized as shared resources, your Killer Micros should
> be many to a box and sitting in the computer room where the fan noise does
> not drive you nuts.  This is where I keep MY Killer Micros.

I received a lot of mail regarding this very point, and you were one of the
few people who agreed with me.  I'd like to qualify this point by saying that
too much centralization is also inefficient.  A good rule of thumb is to
centralize to the point where 50% to 80% of the compute cycles are used.
Sharing at the departmental level also alleviates many of the problems of
corporate-wide centralization.

A much more interesting subject is the one you raise below:

In article <52661@lll-winken.LLNL.GOV> brooks@maddog.llnl.gov (Eugene Brooks)
> To use the "efficient utilization argument" to support the notion that
> low volume custom processor architectures might possibly survive the
> attack of the Killer Micros is pretty foolish, however.  Ed, would you
> care to run the network simulator and Monte Carlo code I posted results
> of on the Convex C210, and post the results to this group?  I won't
> ruin the surprise by telling you how it is going to come out...

I'd be happy to run these programs on a C210.  I think you'd find that
the C210 does much better than the 25 MHz clock would otherwise lead
you to predict.  However, most of CONVEX's customers purchase our
machines for more than the excellent scalar performance - a large
number of important scientific and engineering applications require
high-speed vector performance along with large memory, a 2 GByte virtual
address space, and high-speed I/O.

It would be interesting to see the performance of these scalar codes
on various architectures, relative to the clock speed of the machines
implementing these architectures, especially the Cray numbers.

The cost of processors is a very small part of the total cost of a
departmental compute server.  How much do you think Alliant pays for
the 8 i860 chips in their low-end $500K product?  The design of the
memory system is the dominant factor in system performance and system
cost for departmental supercomputers.

There is no question that all computer vendors will some day implement their
particular architectures in a small number of chips.  The only question
is when.  Making this decision too early might cause you to make premature
architectural trade-offs in order to reduce the number of gates needed
for today's chips.  For example, the i860 uses reciprocal approximation
for the divide and square root functions.  If space for more gates had
been available, the i860 might have been implemented differently.
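
To make the reciprocal-approximation trade-off concrete, here is a minimal
sketch of division built from a rough reciprocal estimate that is then
refined by Newton-Raphson iteration.  It illustrates the general technique
only and is not the i860's actual instruction sequence; the linear seed,
the function names, and the step count are all assumptions chosen for the
example.

```python
import math


def reciprocal(b, steps=4):
    """Approximate 1/b without dividing by b (illustrative sketch only)."""
    if b == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    # Split abs(b) into m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(abs(b))
    # Assumed seed: a linear estimate of 1/m, good to about 4 bits.
    # The two coefficients are constants that would be precomputed.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(steps):
        # Newton-Raphson step for f(x) = 1/x - m.  Each step roughly
        # doubles the number of correct bits, using only multiply
        # and subtract.
        x = x * (2.0 - m * x)
    # Undo the scaling and restore the sign of b.
    return math.copysign(math.ldexp(x, -e), b)


def divide(a, b):
    """Compute a / b as a multiplied by the refined reciprocal of b."""
    return a * reciprocal(b)
```

The point of the sketch is the cost structure: a divide becomes a short
chain of dependent multiplies, which saves gates but adds latency and can
leave the last bit of the result incorrectly rounded unless extra
correction steps are taken.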

> Perhaps we can get the fellows at Alliant to do the same with their new
> 28 processor Killer Micro powered machine.  That i860 is definitely a
> Killer Micro. After we compare single CPU performances, perhaps we could
> then run the MIMD parallel versions on the Convex C240 and the Alliant 28
> processor Killer Micro powered box.  Yes, there are MIMD parallel versions
> of both codes which could probably be made to run on both machines.

If you have a chance, ask the Alliant people what their Linpack 100x100
performance is, and see how well it scales up to 28 processors.  Try to
get real runs, not estimates.  I'd also be curious about main memory
bandwidth (not crossbar bandwidth).  Information like number of banks,
number of bytes read per bank access, and bank cycle time would be
particularly interesting.  It would also be useful to run the MIMD versions
of your codes on both the Alliant and the C240, and compare the parallel
speed-ups.  It would also be revealing to run MIMD scalar codes (and vector
codes) that have a low cache hit rate on both the Alliant and CONVEX.
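
As a rough sketch of why those three memory figures matter, they combine
into an upper bound on main memory bandwidth, and measured run times
combine into a parallel speed-up.  The function names and the sample
numbers below are hypothetical and do not describe any Alliant or CONVEX
machine.

```python
def peak_memory_bandwidth(banks, bytes_per_access, bank_cycle_ns):
    """Upper bound on main memory bandwidth in MB/s (1 MB = 10**6 bytes).

    Assumes every bank is kept busy and delivers one access per bank
    cycle, which real access patterns rarely achieve.
    """
    bytes_per_ns = banks * bytes_per_access / bank_cycle_ns
    # 1 byte/ns equals 10**9 bytes/s, which is 1000 MB/s.
    return bytes_per_ns * 1000.0


def parallel_speedup(time_one_cpu, time_n_cpus):
    """Speed-up of a measured parallel run over the single-CPU run."""
    return time_one_cpu / time_n_cpus


def parallel_efficiency(time_one_cpu, time_n_cpus, n_cpus):
    """Fraction of ideal linear speed-up actually achieved."""
    return parallel_speedup(time_one_cpu, time_n_cpus) / n_cpus


# Hypothetical machine: 32 banks, 8 bytes per access, 400 ns bank cycle.
print(peak_memory_bandwidth(32, 8, 400.0), "MB/s peak")
```

Dividing that bound by the number of processors gives the bandwidth each
CPU can count on when cache hit rates are low, which is the case the
low-hit-rate codes would expose.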

As an aside, I was curious why you were asked not to release information
about the low utilization of the 300 workstations you mentioned.  I can't think
of any reason Livermore wouldn't want this information publicly available,
since this is likely to be true of any organization using large numbers of
single-user workstations.  Making this data publicly available would be a
great service to people considering lots of single-user killer micros.

Regards,
Ed Hamrick