Xref: utzoo comp.unix.aix:511 comp.misc:8180
Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!lavaca.uh.edu!nntppost
From: jet@karazm.math.uh.edu (J. Eric Townsend)
Newsgroups: comp.unix.aix,comp.misc
Subject: System/6000 statistics cf. other machines
Message-ID: <1990Feb13.235806.6278@lavaca.uh.edu>
Date: 13 Feb 90 23:58:06 GMT
Sender: nntppost@lavaca.uh.edu (NNTP Posting Service)
Organization: University of Houston -- Department of Mathematics
Lines: 56

Went to an interesting meeting this afternoon.  It wasn't non-disclosure,
so I guess I can talk about some of the stuff that was mentioned. :-)

First off, raw #'s:

the SPECratio scale had to be extended to 80... :-)

100x100 matrix multiply, in Fortran.  Figures are in MFLOPS.

ETA-10P                   75.6
System/6000 aka rios      18.2 *
Titan                      6.7
SparcStation-1             1.4
DECstation 3100            0.8

(*) if they coded the problem by hand, they got around 40 MFLOPS

From my notes:

Can do 1 floating point, one int and one branch instruction in one cycle.
No mode bit.
32b instructions.
"varlen VLIW".
no explicit pipe.
no delayed branch op (conventional branching).
no quash.
basically sequential machine, relies on compiler for max performance.
single precision slightly slower than double. :-)
integer P and float P have internal queueing.
Inst Cache handles relative addresses and PC
Loop closing branch -- special loop countdown register that is
decremented *while* branch is going.  therefore, can close loop
in effective 0 time (while executing next instruction ?)
line sizes are 128, Inst cache 4K, data 128K (?)
no pre-fetch (try to fix in compiler for now)
memory mapped I/O
penalty for cache miss: 2-4 cycles (speaker couldn't remember exactly)
(some mention of being worried about stride n misses by somebody)
PLA compiler ported from 370.
1 FP operation per cycle, including new multiply & add -> register.
FP is 151 bit precise for last instr.

--
J. Eric Townsend
University of Houston Dept. of Mathematics (713) 749-2120
jet@karazm.math.uh.edu
Skate UNIX(tm).
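
P.S. For anyone wondering how an MFLOPS figure falls out of a 100x100 matrix multiply, here is a rough sketch in Python. It is not the benchmark that was run at the meeting (that was Fortran, and I don't have the source). The function names are made up for illustration, and the operation count of 2*n^3 (one multiply plus one add per inner-loop step) is my assumption about how such figures are usually counted.

```python
import time


def matmul(a, b, n):
    """Naive n x n matrix multiply on lists of lists (i-k-j loop order)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                # one multiply and one add per step
                c[i][j] += aik * b[k][j]
    return c


def flop_count(n):
    """Assumed operation count: n^3 multiplies plus n^3 adds."""
    return 2 * n ** 3


def mflops(n, seconds):
    """Millions of floating point operations per second for one multiply."""
    return flop_count(n) / seconds / 1e6


def benchmark(n=100):
    """Time one n x n multiply and return the measured MFLOPS."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    start = time.perf_counter()
    matmul(a, b, n)
    elapsed = time.perf_counter() - start
    return mflops(n, elapsed)


if __name__ == "__main__":
    print("measured MFLOPS:", benchmark(100))
```

Under that counting, one 100x100 multiply is 2,000,000 operations, so a machine sustaining 18.2 MFLOPS would finish it in roughly 0.11 seconds. Interpreted Python will report a far lower number than compiled Fortran on the same hardware, so treat the output as an illustration of the arithmetic only.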