Newsgroups: comp.sys.amiga.programmer
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!think.com!mintaka!geech.gnu.ai.mit.edu!rjc
From: rjc@geech.gnu.ai.mit.edu (Ray Cromwell)
Subject: Re:     Lemmings - a tutorial Part V (last)
Message-ID: <1991Apr2.002631.22799@mintaka.lcs.mit.edu>
Sender: daemon@mintaka.lcs.mit.edu (Lucifer Maleficius)
Organization: The Internet
References: <mykes.0926@amiga0.SF-Bay.ORG> <1991Apr1.020748.26863@mintaka.lcs.mit.edu> <mykes.1028@amiga0.SF-Bay.ORG>
Date: Tue, 2 Apr 91 00:26:31 GMT
Lines: 136

In article <mykes.1028@amiga0.SF-Bay.ORG> mykes@amiga0.SF-Bay.ORG (Mike Schwartz) writes:
>In article <1991Apr1.020748.26863@mintaka.lcs.mit.edu> rjc@geech.gnu.ai.mit.edu (Ray Cromwell) writes:
>
>>  Obviously you have no idea of how advanced today's optimizing compilers
>>are.  The code you stepped through must have been produced by some
>>1970's MetaComco compiler or something. But FYI, most of todays compilers
>>can pass arguements in registers, allocate memory without stack, eliminate
>>the frame registers, and even do all the non-obvious tricks of sign extension,
>>etc. Check out the code I've included at the end compiled by GCC.
>>
>>The following was compiled with GCC -O -fstrength-reduce -fomit-frame-register
>>I don't have SAS C on the Amiga, but I'm sure it produces simular results.
>>
>>/* test.c */
>>
>>char buf[20];
>>main()
>>{
>>  char *d=(char *)&buf;
>>  const char *s="This is a test\n";
>>  while(*s) { *d++=*s++; }
>>}
>>
>>/* Test.s produced by gcc */
>>
>>#NO_APP
>>gcc_compiled.:
>>.text
>>LC0:
>>	.ascii "This is a test\12\0"
>>	.even
>>.globl _main
>>_main:
>>	lea _buf,a1
>>	lea LC0,a0
>>	tstb a0@
>>	jeq L5
>>L4:
>>	moveb a0@+,a1@+
>>	tstb a0@
>>	jne L4
>>L5:
>>	rts
>>.comm _buf,20
>>
>
>/* Test.s produced by gcc */
>
>#NO_APP
>gcc_compiled.:
>.text
>LC0:
>	.ascii "This is a test\12\0"
>	.even
>.globl _main
>_main:					; (cycles)
>	lea _buf,a1			; 8
>	lea LC0,a0			; 8
>	tstb a0@			; 8
>	jeq L5				; 8
>L4:
>	moveb a0@+,a1@+			; 14*12
>	tstb a0@			; 14*8
>	jne L4				; 13*10+1*8
>L5:
>	rts				; 16
>.comm _buf,20
>
>
>;/* test.s produced by me */		; (cycles)
>	lea	text(pc),a0		; 8
>	lea	buf(pc),a1		; 8
BUG! This is not the same algorithm in the C code. Your code would move
a NULL byte if a NULL string was passed when it should do nothing. I don't know
why GCC doesn't generate PC relative instructions, but I think it has to do
with UNIX and perhaps the scatter load/memory partitioning.

>.loop	move.b	(a0)+,(a1)+		; 14*12
>	bne.s	.loop			; 13*10+1*8
>	rts				; 16
>text	dc.b	'This is a test',10,0
>buf	ds.b	20
>
>NO COMPARISON DUDE!  GCC makes 3 totally wasted instructions, and one of
>them is inside your loop.  Try an example with nested loops and your
>wasted clock cycles become a geometric progression.  Multiply the kind
>of inefficiencies that GCC demonstrates here by EVERY loop and EVERY
>function you have and your program is slower and bigger than it needs
>to be.  To be specific, the GCC routine is 6 words longer and by the
>time it is done executing, it will take 128 more clock cycles than
>mine will (on a 68000).  Your routine takes 466 total clocks to execute,
>mine takes 338.  I'm just your average 68000 assembler language programmer,
>but I saved 28% CPU time.  You might also note that your 'C' source is
>7 lines of code and so is my assembler code.

  This code is compiled to run on a 68030 @ 50mhz, the difference in speed
would be _very_ small. Hey, someone compile this on SAS/C with ALL
optimizations on. You're just your average assembly language programmer
yet you introduced a subtle bug into the program that will stomp on the
first byte of a static global string by attempting to copy a null
string. In a HUGE program this bug would be so subtle that it may
take days to find out how a memory location is being sporatically
trashed. Worse yet, a null string will have garbage following it that could
span hundreds of bytes. 

 Your assembler code has the C equivelent of 'do { *d++=*s++; } while(*s);'

>Just think, the OS ROM routines are written in 'C' and compiled with
>a lesser compiler than gcc (5 years ago).
  Just think, OS2.0 is compiled with SAS/C.

>
>
>>  Also, it's not the language but the algorithm that is responsible for
>>how fast a routine runs. Compare a C coded Boyer-Moore string search
>>with an Assembly coded brute-force byte by byte search, the C code would
>>probably win without optimizations turned on.
>>
>
>You should compare apples to apples.  Compare your Boyer-Moore string search

>in assembler language with the one in 'C' on the Amiga.  How about comparing
>an assembler coded Boyer-Moore string search against a 'C' coded brute-force
>byte by byte search?  You'd complain it's not fair either.
   The point is, OPTIMIZATIONS only account for a marginal linear speed
improvement while choosing the right algorithm could result in an order
or two of magnitude improvement. Matt's assembler is perfect proof. If
Matt coded it in assembly and added some features it would blow away DevPac
and probably beat ArgAsm.


--
/~\_______________________________________________________________________/~\
|n|   rjc@albert.ai.mit.edu   Amiga, the computer for the creative mind.  |n|
|~|                                .-. .-.                                |~|
|_|________________________________| |_| |________________________________|_|