Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site petrus.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!bellcore!petrus!hammond
From: hammond@petrus.UUCP
Newsgroups: net.unix-wizards
Subject: Re: Celerity evaluation
Message-ID: <328@petrus.UUCP>
Date: Tue, 23-Apr-85 07:42:56 EST
Article-I.D.: petrus.328
Posted: Tue Apr 23 07:42:56 1985
Date-Received: Wed, 24-Apr-85 03:53:36 EST
References: <10095@brl-tgr.ARPA>
Organization: Bell Communications Research, Inc
Lines: 46

I have run a fair number of simple benchmarks on a Celerity C1200, Pyramid 90x, Vax 780, and Vax 785 to compare the performance of the CPUs. All the machines had optional floating point accelerators; the Pyramid also had a data cache option.

The basic results: For double precision floating point in C (using register double variables, which the 4.2 BSD and Pyramid compilers appear to treat as ordinary double variables), I can confirm that the Celerity C1200 appears to be about 2 times as fast as an 11/780 w/FPA. That makes it the fastest at floating point of the 4 machines tested. At least on the trivial integer benchmarks we ran, I can also say that the basic CPU appears to be about 3 times as fast as an 11/780 for integer arithmetic, or roughly the same as a Pyramid 90x.

Disk Performance: Although my trivial benchmarks took almost the same amount of CPU time (using their new, faster cc) as on the Pyramid, they took 3 times as long in real time. Our Pyramid has Eagles; the Celerity had the slower 120Mb disks. I don't know how much improvement an Eagle would make.

Flies in the ointment: The Celerity is a Fortran machine. It has a stack register array of 16 levels (I'd call it a cache, but caches in my view empty/fill automagically and this doesn't). If your code makes procedure calls which nest to a depth greater than 16, the OS has to copy the registers to main memory. This is VERY expensive in CPU time.
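
One way to probe the cost of nesting past 16 levels is a deeply recursive function such as Ackermann's, instrumented to record how deep the calls go. What follows is only a sketch in C, not the benchmark we actually ran; the names ack, run_ack, cur_depth, max_depth and calls are made up for illustration.

```c
#include <assert.h>

/* Hypothetical instrumentation counters (not part of any original test). */
unsigned long cur_depth = 0;  /* calls currently active            */
unsigned long max_depth = 0;  /* deepest nesting reached so far    */
unsigned long calls = 0;      /* total number of calls made        */

/* Ackermann's function, with the nesting depth tracked on the side. */
unsigned long ack(unsigned long m, unsigned long n)
{
    unsigned long r;

    calls++;
    if (++cur_depth > max_depth)
        max_depth = cur_depth;

    if (m == 0)
        r = n + 1;
    else if (n == 0)
        r = ack(m - 1, 1);
    else
        r = ack(m - 1, ack(m, n - 1));

    cur_depth--;
    return r;
}

/* Reset the counters, evaluate ack(m, n), and return its value. */
unsigned long run_ack(unsigned long m, unsigned long n)
{
    unsigned long r;

    cur_depth = max_depth = calls = 0;
    r = ack(m, n);
    assert(cur_depth == 0);   /* every call that entered has returned */
    return r;
}
```

Calling run_ack(3, 3) from a small main() and printing max_depth and calls afterwards shows how far past 16 levels even a small argument pair nests. Running the result under time(1) then splits the cost into user and sys time. Keep m at 3 or below; the depth grows very quickly with larger arguments.
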
Our test of Ackermann's function died after CPU times of 6.3 user, 107.5 sys (to do all those copies of the stack registers). It died because of a second flaw: by default the stack can only grow to a depth of 128K (about 1024 calls deep). You can (at compile time) tell the system to allocate more stack space. I have not yet received an explanation of why they changed this behaviour from standard BSD. If there is a good reason, we could probably live with it, since few procedures (other than Ackermann's) get all that deep.

However, the filling/unfilling of the stack register array is a more immediate concern, since it is quite expensive in CPU resources and it does happen. We noted that the C compiler rolled up fair amounts of system time (several times that on a Pyramid 90x), probably for stack growth.

Another problem we noted was that the system calls we tried measuring (some of those common to Sys V and 4.2 BSD) were on average 20% slower than on an 11/780, despite the CPU being (by our tests) 3 times faster. We are still trying to find out what was going on. My suspicion is the loading/unloading of the stack register set for context saves.

If Celerity fixes the stack growth to be less painful, it is a very interesting machine for number crunching.

Rich Hammond
{allegra | decvax | ucbvax} bellcore!hammond