Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!rutgers!uwm.edu!ux1.cso.uiuc.edu!ux1.cso.uiuc.edu!aglew
From: aglew@crhc.uiuc.edu (Andy Glew)
Newsgroups: comp.arch
Subject: Re: int x int -> long for * (or is it 32x32->64)
Message-ID:
Date: 16 Sep 90 16:52:15 GMT
References: <3984@bingvaxu.cc.binghamton.edu> <41425@mips.mips.COM> <4025@bingvaxu.cc.binghamton.edu> <69436@sgi.sgi.com>
Sender: news@ux1.cso.uiuc.edu (News)
Organization: Center for Reliable and High-Performance Computing University of Illinois at Urbana Champaign
Lines: 40
In-Reply-To: vjs@rhyolite.wpd.sgi.com's message of 15 Sep 90 05:39:46 GMT

In article <4025@bingvaxu.cc.binghamton.edu>, kym@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
> I'm afraid the hard-hearted answer has got to be along the lines of:
> "What proportion of time is spent in the checksum code, & by what factor
> is it increased by not having the convenient carry"?

On a 68030-based system I found that 15% of the entire CPU was being
spent in in_chksum() (my memory may be failing wrt. the exact name), on
real user workloads (namely, the backbone machines on our local net).
This was for the naive byte-at-a-time one's complement sum (again, my
memory may be failing me: I believe it was a one's complement sum, but
there have been quite a variety of checksums).

Unrolling the loop and computing the checksum 32 bits at a time instead
of 8 bits at a time gave me approximately a 6-8-fold speedup. A bit of
instrumentation showed that the overwhelming majority of packets were
of only two sizes, and these were special-cased. With the new code,
in_chksum() was reduced to around 4% of the CPU. (Not linearly divided
by the speedup, because of overhead and traffic increase.)

I used the carry-out and carry-in to do this. Coding without the carry
would approximately double the number of instructions for this
checksum, but many of the added instructions would be branches.
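
For concreteness, here is a minimal C sketch of a one's complement sum
taken 32 bits at a time when no carry flag is available. It is not the
68030 code described above, which is not reproduced in this post; the
names sum32_nocarry and fold16 are made up for illustration. The carry
out of each add is recovered with an unsigned compare, which is the
extra per-word work in question. Whether that compare becomes a branch
or a set-on-compare instruction depends on the compiler and the target.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's complement sum down to 16 bits.
 * Two rounds are enough: the first can leave at most a 17-bit value. */
static uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical helper: one's complement sum of nwords 32-bit words,
 * with the end-around carry emulated in portable C. */
static uint32_t sum32_nocarry(const uint32_t *p, size_t nwords)
{
    uint32_t sum = 0;

    while (nwords-- > 0) {
        uint32_t w = *p++;

        sum += w;
        if (sum < w)    /* unsigned wraparound means a carry out */
            sum++;      /* add the carry back in; cannot wrap again */
    }
    return sum;
}
```

With an add-with-carry instruction, the compare and the conditional
increment collapse into the add itself, so the loop body is roughly one
load and one add per word.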
Actually, I'd probably only do it 16 bits at a time, no branches, which
would be, again, a 3-fold slowdown (shifts and masks in the loop).

I.e., based on my experience coding in_chksum(), but not having coded
it on a MIPS, I would estimate that the slowdown from not having
carry-out and carry-in is approximately 3-fold wrt. good code that uses
them. But this is only an upper bound, because overhead of call, etc.,
gets in the way.

I do not wish to pass judgement on the usefulness of carries; I only
wished to provide a data point for "by what factor is it [the checksum]
increased by not having the convenient carry".

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]
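
As an illustration of the "16 bits at a time, no branches" variant
mentioned above, the following C sketch splits each 32-bit word into
two 16-bit halves with a shift and a mask and lets the carries pile up
in the top of a 32-bit accumulator. The names sum16_halves and fold16
are made up for illustration, and the sketch assumes at most 32768
words per call so that the accumulator cannot overflow. It is a sketch
under those assumptions, not the code measured in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator down to a 16-bit one's complement sum.
 * Two rounds are enough: the first can leave at most a 17-bit value. */
static uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical helper: add the 16-bit halves of each 32-bit word.
 * No carry flag and no branch in the loop body, only a shift and a mask.
 * Assumes nwords <= 32768, so the 32-bit accumulator cannot overflow. */
static uint32_t sum16_halves(const uint32_t *p, size_t nwords)
{
    uint32_t sum = 0;

    while (nwords-- > 0) {
        uint32_t w = *p++;

        sum += (w & 0xffffu) + (w >> 16);
    }
    return sum;
}
```

The deferred carries are resolved once, after the loop, by calling
fold16() on the accumulator. The price is that each add covers only 16
bits of data plus the extra shift and mask per word.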