Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.2 9/18/84; site brl-tgr.ARPA Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!whuxl!whuxlm!harpo!decvax!genrad!panda!talcott!harvard!seismo!brl-tgr!tgr!BillW@SU-SCORE.ARPA From: BillW@SU-SCORE.ARPA (William Chops Westfield) Newsgroups: net.micro Subject: Re: Standard, What standard??? (size vs. speed) Message-ID: <9614@brl-tgr.ARPA> Date: Sat, 30-Mar-85 21:13:50 EST Article-I.D.: brl-tgr.9614 Posted: Sat Mar 30 21:13:50 1985 Date-Received: Tue, 2-Apr-85 05:59:57 EST Sender: news@brl-tgr.ARPA Lines: 70 Al Filipski says... Funny, I've always thought that size and speed were INVERSELY related. Take sort algorithms, f'rinstance. One of the smallest sorts you can write is one of the worst-- the bubble.... From an abstract point of view, he is correct. However, we are not really talking about small algorithms vs large algorithms here. Rather, we are talking about the implementation of a given algorithm. In this latter case, it is true that a small implementation of an algorithm is is faster than a large implementation. You could call implementation a sort of sub-algorithm. Consider the following two pieces of code (part of a TCP checksum routine for a 8088 processor. Assume we want a 1's complement 16 bit checksum of CX words pointed to by BX:) --------------- Example 1: lpchk: add ax,(bx) | add value and carry 2 bytes, 18 cycles adc ax,*0 | 3 bytes, 8 c inc bx | bump pointer 1 byte, 2 c inc bx | 1 byte, 2 c loop lpchk | do it again 2 bytes, 17 c Example #2 mov si,bx lpchk: lodw |get next word 1 byte, 16 cycles adc bx,ax |overlap adding last CY 2 bytes, 3 c loop lpchk |next word 2 bytes, 17 c mov ax,bx | put where results have to go adc ax,*0 | add final carry bit. --------------- Although both example use essentailly the same algorithm (there aren't a whole lot of ways to calculate such a checksum!), the second example is smaller and faster - It makes better use of the 8088 architecture. Which brings up another point: When trying to optimize code, I like to divide optimization into three different levels: Algorithmic optimization, architectural optimization, and system level optimization. Algorithmic optimization is what is most frequently talked about. (for exampe going from bubble to quick sort). Unfortunately, timings in this area are rather abstract. Sorts tend to be rated by the number of data comparisons that have to be done, for example. Which is usually a pretty good idea, but what if a compare is very fast compared to instruction fetches (for example)? Which brings us to: Architectural optimization is where you take a look at your processor, and try to make the fast parts of your code use fast instructions, and such. This is usually where us assembly language hackers gain speed over HLL programmers. In the examples given above, I might try to ensure that the words being checksummed inside the loop were being fetched froma word boundry if it were running on an 8086 proceessor, since this would require half the bus cycles, for example... Systems Level optimization is when you take into account nasty things like the operating system and such. For example, using the key as an address may sound like a very fast way to sort things, but under most operating systems, you are likely to spend most of the time you might have saved down somewhere in the pager, moving pieces of your address space on and off of disk, meanwhile letting other users run as your console time climbs above that which a simple bubble sort might have taken (run-time vs console time optimizations are a very interesting instance of SLO problems...) Sorry to have been so long winded... Bill Westfield