Path: utzoo!mnetor!uunet!husc6!bloom-beacon!gatech!hubcap!ncrcae!ncr-sd!hp-sdd!hplabs!hpcea!hpfcdc!hpfcmp!hpfcse!hpuecoa!bgphp1!rclark
From: rclark@bgphp1.UUCP (Roger N. Clark)
Newsgroups: comp.sys.hp
Subject: HP825 math 15x SLOWER than 825
Message-ID: <830004@bgphp1.UUCP>
Date: 25 Mar 88 16:41:55 GMT
Organization: U.S. Geological Survey, Branch of Geophysics, Denver
Lines: 332

I have benchmarked the HP9000 series 825 using number crunching
programs and find:

The 825 is 5 to 7 times SLOWER than a single cpu 500!!!!!

In a multitasking environment the 825 can be at least 15 TIMES SLOWER
                                                      ^^^^^^^^^^^^^^^
than a 3 cpu 500!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

The details:

In February a note was posted to comp.sys.hp that the HP9000 series 500
was being discontinued.  That caused quite a flurry of responses,
including several that said the new HP9000 series 825 is much faster.
I have heard several stories about how three 9000 s500's were replaced
with one 825 and everyone was happy.  HP is saying the 825 is very fast.

Well, on February 19, I posted a rather strong note about the 500 being
discontinued.  The 500 is no longer being supported in that there will
be no more software releases (that is especially disturbing considering
that HP-UX 5.21 apparently has many problems).  I need features that
are not on the 500 (network file system or at least TCP/IP,
domain-based mail).  HP has said I need to upgrade to an 825 (or
higher).

Before changing machines, I benchmark the candidate with programs
similar to what my group runs.  Here at the USGS, we do analysis of
spectra of rocks and minerals and apply the results to imaging data
(remote sensing).  I am on 3 NASA planetary spacecraft teams and the
methods will be applied to gigabytes of data in the 1990's.  The
analyses include some very sophisticated (and number crunching
intensive) modeling programs.  The programs are not huge (less than
2 MBytes on a 6.5 MByte system) and we do not have a paging problem.
Below are the results of a simple "weird box filter" program.  This
program shows a typical response in our shop.  It does both array
indexing and computation on elements in the arrays.  The compiled
program is only about 350KBytes in size and it does not page to disk.

               A Multitasking, CPU intensive Benchmark

                       Real Time (seconds)
-----------------------------------------------------------------------
                              Number of Tasks
System                   1     2     3     4     5     7    10    12
-----------------------------------------------------------------------
HP9000/500 3 CPUs      5.9   6.0   6.3   8.4  10.5  14.7  21.5  27.8
HP9000/825 HPUX1.2    29.1  58.1  87.2 116.3 145.6 205.0 291.5 350.1
-----------------------------------------------------------------------

                       CPU Time (seconds)
-----------------------------------------------------------------------
                              Number of Tasks
System                   1     2     3     4     5     7    10    12
-----------------------------------------------------------------------
HP9000/500 3 CPUs      5.8  11.8  18.4  24.4  30.7  43.0  62.2  81.4
HP9000/825 HPUX1.2    29.0  57.9  87.0 116.0 144.7 202.7 288.5 346.5
-----------------------------------------------------------------------
NOTES:
HP9000/500: 6.5 MBytes main memory, 3 floating point CPUs,
            65MByte system, 55MByte /tmp disk, 132MByte user disk,
            571MByte data disk (Used by virtual memory), HP-UX 5.21.

HP9000 series 825 (HP Precision Architecture, RISC machine)
            16 MBytes of main memory, single 404MByte disk drive.
            HP Demo, 3/23/88.  HP-UX 1.2 (Also tried it on HP-UX 2.0
            pre-release with slightly worse results).
-----------------------------------------------------------------------

I have several other benchmarks.  On number crunching programs that do
not have array indexing (just do +, -, *, /, logs, sin, cos, sqrt,
powers) the results came out as follows (run time normalized to the
s500):

single cpu program             825     500
---------------------------------------------------------------
in C                           7.6      1    (825 7.6 times SLOWER)
single precision Fortran       3.23     1    (825 3.23 times SLOWER)
double precision Fortran       6.7      1    (825 6.7 times SLOWER)


WHAT DOES ALL THIS MEAN?

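One way to read the Real Time table is as a slowdown ratio at each task
count.  The following is a minimal sketch and not part of the original
posting: the function name "slowdown" is an assumption introduced here
for illustration, and the timings are copied from the Real Time table
above.

```shell
#!/bin/sh
# slowdown T_825 T_500
# Print how many times longer the 825 took than the 500 (one decimal).
# "slowdown" is a hypothetical helper name, not from the original post.
slowdown() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Columns: number of tasks, s500 real time, 825 real time (seconds),
# taken from the Real Time table.
while read ntasks t500 t825
do
    echo "$ntasks tasks: 825 took $(slowdown "$t825" "$t500") times as long"
done <<'EOF'
1 5.9 29.1
2 6.0 58.1
3 6.3 87.2
4 8.4 116.3
5 10.5 145.6
7 14.7 205.0
10 21.5 291.5
12 27.8 350.1
EOF
```

On these figures the ratio is a little under 5 for a single task and
lies between roughly 12.6 and 13.9 for three or more tasks, where all
three CPUs of the 500 are busy.
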
HP advertises the 825 as a 0.5 megaflop machine.  My results show it as
about a 0.03 megaflop machine.  The benchmarks were done several times
with different machine configurations at the Neeley sales region
(Hal Shearer, hpuecoa!hals).  HP has been very helpful but has not been
able to figure out why these results are so bad.

HP has a new 835 that is substantially faster.  This benchmark has been
run at Fort Collins but I haven't gotten the results yet.  I have heard
that they are faster than a 3 cpu 500, however.

A LESSON EVERYONE SHOULD KNOW:

BENCHMARK YOUR APPLICATION BEFORE YOU BUY A MACHINE.
          ^^^^^^^^^^^^^^^^

Is the 825 really that bad?  Could there be a problem with the 825 I
tested?  The sieve benchmark came out 12 times faster than a single cpu
500 and all my I/O benchmarks came out very fast.  I think the 825 has
a real problem with number crunching.

I then looked at alternatives to the 825.  I tried a 350, but I
currently have about 8 to 10 users on every day.  We have 29 RS232
ports, 6 HP-IB cards (4 disks, 2 plotters, 1 9-track tape, 2 cartridge
tapes), 2 printers, 3 modems and 3 spectrometers connected to the 500.
(The benchmarks were also done on the 500 WHILE a program was locked in
memory gathering data from a spectrometer in real time!)  The 350 does
not have enough slots to hold all this equipment.

CONCLUSIONS:

The HP9000 series 500 is a DAMN GOOD machine.  HP doesn't seem to know
how good it is!  I guess because they failed to market it, those who
bought it now have to suffer.

HOW GOOD IS IT?

As I write this note, we have been up 129 days.  We have never had an
operating system crash!  In 4 years, we have only gone down for adding
new boards, occasional disk image backups, or power failures.  We have
been up for as long as 6 months!  We have 8 to 10 users on every day,
and gather data from 3 different spectrometers while users are doing
compute modeling and interactive analysis with graphics (on HP2623A and
HP2393A terminals).
The machine is currently the central node in our Branch uucp network
and a nationwide uucp network of spectroscopy groups.  During power
failures, we have never lost data except once: our air conditioner
caught fire and I pulled the plug!  (We only lost one small text file,
and we had many active users on at the time.)  The process ID rolls
over (32000 or so processes) every day or two.  We have had only two
hardware problems: the main power supply went out shortly after
installation and an 8-channel mux went bad about a year ago.

I HAVE NEVER SEEN SUCH A SOLID MACHINE!

Contrast the above to our VAXes and PEs: they have to reboot every few
days to a couple of weeks or so, and have hardware problems about every
month (of course they are getting old and are older technology).

************************************************************************
*                                                                      *
*  BRING BACK THE 500 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!  *
*                                                                      *
************************************************************************

Below is the "weird box filter" benchmark.  Try it yourself.  I would
be interested in what you find.

Roger N. Clark
Research Scientist
U.S. Geological Survey, MS 964
Box 25046 Federal Center
Denver, CO 80225-0046
(303) 236-1332    FTS 776-1332

{known-world}!hplabs!hpfcla!hpfcse!hpuecoa!bgphp1!rclark

#---------------------------------- cut here ----------------------------------
# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by rclark at bgphp1 on Fri Mar 25 08:07:26 1988
#
# This archive contains:
#	makefile	speedtest.f	multi.sh	timeit
#
# Error checking via wc(1) will be performed.
# Error checking via sum(1) will be performed.

echo x - makefile
cat >makefile <<'@EOF'
CFLAGS=
FFLAGS=
LFLAGS=
RFLAGS= -6% -C
GET= get
GFLAGS=

a.out: speedtest.f
	f77 $(FFLAGS) speedtest.f
@EOF
set -- `sum speedtest.f <<'@EOF'
C array addressing and number crunching
      implicit integer*4 (i-n)
      common array1(200,200), array2(200,200), z(9)
      limit = 200
      ktimes = 1
C initialize arrays
      do 10 j= 1, 9
         z(j) = float(j)+2.0
10    continue
      x = 1.0
      do 30 j = 1, limit
         do 20 i = 1, limit
            x = x + 1.0
            array1(i,j) = x
20       continue
30    continue
      do 200 k = 1, ktimes
C main computation loop: Weird Box Filter
      do 100 j = 2, limit-1
         do 50 i = 2, limit-1
            array2(i,j) =
     1         ( array1(i-1,j-1)*2.0*z(1)
     1          +array1(i  ,j-1)*2.0/z(2)
     1          +array1(i+1,j-1)*2.0*z(3)
     1          +array1(i-1,j  )*2.0/z(4)
     1          +array1(i  ,j  )*2.0*z(5)
     1          +array1(i+1,j  )*2.0/z(6)
     1          +array1(i-1,j+1)*2.0*z(7)
     1          +array1(i  ,j+1)*2.0/z(8)
     1          +array1(i+1,j+1)*2.0*z(9))
     1         /(9.0*(z(1)-z(2)+z(3)-
     1                z(4)+z(5)-z(6)+z(7)-
     1                z(8)+z(9)))
50       continue
100   continue
C main computation loop complete
200   continue
      stop
      end
@EOF
set -- `sum multi.sh <<'@EOF'
for i
do
	a.out &
done
wait
@EOF
set -- `sum timeit <<'@EOF'
set -x
echo "********** weird box filter *********"
/bin/time /bin/sh multi.sh 1
/bin/time /bin/sh multi.sh 1
/bin/time /bin/sh multi.sh 1
/bin/time /bin/sh multi.sh 1 2
/bin/time /bin/sh multi.sh 1 2
/bin/time /bin/sh multi.sh 1 2
/bin/time /bin/sh multi.sh 1 2 3
/bin/time /bin/sh multi.sh 1 2 3
/bin/time /bin/sh multi.sh 1 2 3
/bin/time /bin/sh multi.sh 1 2 3 4
/bin/time /bin/sh multi.sh 1 2 3 4
/bin/time /bin/sh multi.sh 1 2 3 4
/bin/time /bin/sh multi.sh 1 2 3 4 5
/bin/time /bin/sh multi.sh 1 2 3 4 5
/bin/time /bin/sh multi.sh 1 2 3 4 5
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7 8 9 10
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7 8 9 10
/bin/time /bin/sh multi.sh 1 2 3 4 5 6 7 8 9 10
echo "************ DONE weird box filter benchmark ************"
@EOF
set -- `sum