From: guy@auspex.UUCP (Guy Harris)
Newsgroups: comp.arch,comp.sys.intel
Subject: Re: i860 overview (long)
Message-ID: <1133@auspex.UUCP>
Date: 8 Mar 89 07:44:15 GMT
References: <807@microsoft.UUCP> <92634@sun.uucp> <13322@steinmetz.ge.com>
Reply-To: guy@auspex.UUCP (Guy Harris)
Organization: Auspex Systems, Santa Clara

>One problem with any chip which requires alligned data is that
>performance suffers when addressing bytes, to the point that a program
>may become impractical.

I don't think that's true. My handy-dandy Cypress CY7C600 Family Users Guide, for the Cypress SPARC implementation, says that LDSB (LoaD Signed Byte), LDSH (LoaD Signed Halfword, 16 bits), LDUB (LoaD Unsigned Byte), LDUH (LoaD Unsigned Halfword), and LD (LoaD word, 32 bits) all take 2 cycles.

My handy-dandy MIPS R2000 RISC Architecture manual, alas, has no timings such as that (after all, it's an *architecture* manual, not a manual for some particular *implementation*), but I'd be *very* surprised if byte load/store operations were so much slower that "a program (such as 'troff') may become impractical". (My expectation is that they're no slower, just as on SPARC.)

Are you, perhaps, thinking of word-addressable machines, and are you under the impression that not only do RISC machines tend to require, say, 4-byte alignment of 4-byte quantities, but also that they can't deal with quantities shorter than 4 bytes? That's simply not true of the RISC machines with which I'm familiar.

BTW, there exist CISC machines that require alignment as well; as I remember, all but the most recent AT&T WE32K chips require it.

>One of the people here checked his Sun-30 (68020) against his Sun-4
>(SPARC). The three ran troff about 5x faster.
The only three explanations I can imagine for that, offhand, are:

    1) he's got the two figures backwards, and the Sun-4 was ~5x faster than the Sun-3;

    2) the figures are real time, not CPU time, and something else is interfering;

    3) "troff" is floating-point intensive, and the Sun-4 in question has no FPU (e.g., a 4/110 with no FPU).

Explanation 3) falls by the wayside rather quickly; I grepped for "float" and "double" throughout the code and found neither. This leaves 1) or 2); is there one I missed?

I tried comparing "troff"s on a Sun-3/50 with 4MB memory and a Sun-4/260 with 32MB memory, both running SunOS 4.0. Here are the times:

Sun-4/260:

    auspex% time troff -t -man /usr/man/man1/csh.1 >/dev/null
    24.4u 1.2s 0:34 75% 0+456k 26+38io 31pf+0w
    auspex% time troff -t -man /usr/man/man1/csh.1 > /dev/null
    24.4u 1.5s 0:36 71% 0+464k 1+35io 0pf+0w

Sun-3/50:

    bootme% time troff -t -man /usr/man/man1/csh.1 >/dev/null
    118.9u 1.2s 2:08 93% 0+208k 14+33io 24pf+0w
    bootme% time troff -t -man /usr/man/man1/csh.1 > /dev/null
    120.2u 2.8s 2:31 81% 0+192k 5+32io 11pf+0w

The 4/260 did 5x *better* than the 3/50, not 5x *worse*, on that example! Could 1) be the correct explanation?
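To see where the "5x" comes from, and whether it holds for CPU time as well as wall-clock time (the distinction behind explanation 2), one can divide the Sun-3/50 figures by the Sun-4/260 figures from the csh "time" output above. This is only a minimal sketch of that arithmetic; the helper names mmss() and ratios() are made up for illustration, and the inputs are the user-CPU ("u") and elapsed (m:ss) columns copied from the runs above.

```python
# Timings copied from the csh "time" output above: (user CPU seconds, elapsed seconds).
# Elapsed times are given by csh as m:ss and converted to seconds here.

def mmss(text):
    """Convert a csh-style elapsed time such as '2:08' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)

sun4_260 = [(24.4, mmss("0:34")), (24.4, mmss("0:36"))]
sun3_50 = [(118.9, mmss("2:08")), (120.2, mmss("2:31"))]

def ratios(slow_runs, fast_runs):
    """Per run, return (user-time ratio, elapsed-time ratio), rounded to 2 places.

    A ratio above 1 means the first machine took longer than the second.
    """
    return [
        (round(s_user / f_user, 2), round(s_elapsed / f_elapsed, 2))
        for (s_user, s_elapsed), (f_user, f_elapsed) in zip(slow_runs, fast_runs)
    ]

for i, (user_ratio, elapsed_ratio) in enumerate(ratios(sun3_50, sun4_260), start=1):
    print(f"run {i}: Sun-3/50 vs Sun-4/260 user time {user_ratio}x, elapsed {elapsed_ratio}x")
```

On these runs the user-time ratio comes out at about 4.9 and the elapsed-time ratio at about 3.8 to 4.2, so the Sun-4/260 is ahead on both measures, not just on wall-clock time.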