Path: utzoo!attcan!uunet!decwrl!sgi!vjs@rhyolite.wpd.sgi.com
From: vjs@rhyolite.wpd.sgi.com (Vernon Schryver)
Newsgroups: comp.arch
Subject: Re: int x int -> long for * (or is it 32x32->64)
Keywords: arithmetic,arbitrary precision,benchmark,modular arithmetic
Message-ID: <69436@sgi.sgi.com>
Date: 15 Sep 90 05:39:46 GMT
References: <3984@bingvaxu.cc.binghamton.edu> <41425@mips.mips.COM> <4025@bingvaxu.cc.binghamton.edu>
Sender: guest@sgi.sgi.com
Organization: Silicon Graphics, Inc., Mountain View, CA
Lines: 41

In article <4025@bingvaxu.cc.binghamton.edu>, kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
>> In article <69434@sgi.sgi.com> I wrote about a carry bit
>
> I'm afraid the hard-hearted answer has got to be along the lines of:
> "What proportion of time is spent in the checksum code, & by what factor
> is it increased by not having the convenient carry"?

Details of the family of implementations I know best would probably be
considered proprietary by those who paid for it. However, one can
estimate, based on the often repeated fact that, to a first approximation,
TCP input runs as fast as you can fondle the bytes:

    a) any fetches and stores to get from the media to a system buffer
       (0 if you have media-to-host DMA)
    b) 1 fetch to compute the checksum
    c) 1 fetch and 1 store to copy to the user process buffer (unless
       you cheat--all's fair as long as the data gets where it is needed)

A simple implementation with direct DMA and no cheating needs 2 fetches
and 1 store. The total cost of the TCP checksum in such a system is
around 30%. Output is similar. Cache effects are important but ignored
here. The trickier you are elsewhere, the larger the checksum looms.

Current fast workstations move about 1MByte/sec TCP/IP
user-process-to-user-process over Ethernet. Ignoring important details,
one saved instruction/byte is a million saved instructions/sec.
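
For concreteness, here is a minimal C sketch of the checksum in step (b),
written for a CPU with no carry flag. It is not taken from any vendor's
implementation; the names add_end_around and inet_cksum are hypothetical,
and words are assembled in big-endian order purely to keep the example
portable. The carry out of each 32-bit add is recovered with an unsigned
compare, which is the extra work an add-with-carry instruction would
absorb.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones-complement add of one 32-bit word with no carry flag available:
 * if the unsigned sum wrapped, it is smaller than the addend, so the
 * carry is recovered with a compare and added back in (end-around carry). */
static uint32_t add_end_around(uint32_t sum, uint32_t word)
{
    sum += word;
    if (sum < word)
        sum += 1;
    return sum;
}

/* Hypothetical helper: 16-bit Internet checksum of len bytes at buf.
 * Bytes are packed big-endian, four per word; a short tail is zero-padded. */
uint16_t inet_cksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    /* Main loop: one load-equivalent, one add, one carry fix-up per word. */
    for (; i + 4 <= len; i += 4) {
        uint32_t w = ((uint32_t)buf[i] << 24) | ((uint32_t)buf[i + 1] << 16) |
                     ((uint32_t)buf[i + 2] << 8) | (uint32_t)buf[i + 3];
        sum = add_end_around(sum, w);
    }

    /* Tail of 1 to 3 bytes, padded on the right with zero bytes. */
    if (i < len) {
        uint32_t w = 0;
        int shift = 24;
        for (; i < len; i++, shift -= 8)
            w |= (uint32_t)buf[i] << shift;
        sum = add_end_around(sum, w);
    }

    /* Fold 32 bits to 16, twice, since the first fold can itself carry. */
    sum = (sum >> 16) + (sum & 0xffffu);
    sum = (sum >> 16) + (sum & 0xffffu);
    return (uint16_t)~sum;
}
```

On the sample bytes 00 01 f2 03 f4 f5 f6 f7 from RFC 1071, inet_cksum
returns 0x220d. A real implementation would load aligned words directly
and unroll the loop; the compare-and-increment after each add is the part
to look at here.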

Simplistically, if you could "fetch, add word to accumulator, add-carry
to accumulator", you could save 0.5 instructions/byte on a MIPS CPU. Of
course an ADDC instruction would cost in many other places, dragging in
all of the disadvantages of status bits.

I understand that the 6MByte/sec (?) figure from Cray benefits from vector
hardware to do the checksums.

The TCP checksum by itself does not justify ALU status bits in general
purpose CPUs in high performance workstations, because of various bits of
specialized hardware in various vendors' implementations for various media.

What do the extended precision experts say about carry bits?


Vernon Schryver
vjs@sgi.com