Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!usc!orion.oac.uci.edu!cedman
From: cedman@golem.ps.uci.edu (Carl Edman)
Newsgroups: comp.sys.amiga.tech
Subject: Re: C compilers code generation
Message-ID: <CEDMAN.90Nov14221934@lynx.ps.uci.edu>
Date: 15 Nov 90 06:19:29 GMT
References: <1990Nov12.135444.10739@cs.utwente.nl>
	<1990Nov12.164804.5490@agate.berkeley.edu>
	<26893.273fe96d@kuhub.cc.ukans.edu>
	<dillon.7256@overload.Berkeley.CA.US>
Organization: University of California, Irvine, USA.
Lines: 47
Nntp-Posting-Host: lynx.ps.uci.edu
In-reply-to: dillon@overload.Berkeley.CA.US's message of 15 Nov 90 02:08:22 GMT

In article <dillon.7256@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:

   In article <26893.273fe96d@kuhub.cc.ukans.edu> markv@kuhub.cc.ukans.edu writes:
   >Dont forget about SAS/Lattice's support for __builtin functions like
   >memcpy, memset, etc that use inline code rather than function calls.
   >(By flipping the compiler switch for processor you can also get such
   >loops to use DBxx loops for 68010 and 32 bit instructions for 68020).

       Well, actually, while the built-in stuff is cute it is also pretty
       useless in most cases.  For example, the code for a 'full' version of
       setmem()/memset(), movmem()/memmov(), etc.... is pretty big, but also
       can be a hell of a lot faster (using MOVEM's or at least long ops
       instead of char ops).  I think the only real builtin function that
       is useful is, maybe, strlen().  This applies to all processors since
       a DBxx loop using a BYTE transfer size is still a BYTE transfer loop,
       even if all the instructions are cached.

       The DBxx loops are nothing more than a simple optimization in my book,
       though one that DICE does not currently do.

       Frankly, I just do not see any advantage and it can be *really*
       confusing.

It may well be true that truly optimal memmove() functions are quite
large. But most of that complexity comes from analysing the parameters
and choosing the algorithm that handles them best (e.g. overlapping vs.
non-overlapping memory areas, odd vs. word-aligned vs. long-word-aligned
addresses and lengths, downward vs. upward copy, large vs. small arrays,
and so on). Each combination of these parameters needs a different
routine to be optimal, so the analysis code plus the separate routines
for the different parameter sets make up most of the function. But now
imagine a C compiler which does the parameter analysis (as far as
possible) at compile time and inserts only the 'correct' routine for
each call.

I think you will have to admit that in this case you could have significant
speedups and space savings.
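
To make the idea concrete, here is a minimal sketch in C. The names
copy_general(), copy_4_longs() and copy_selfcheck() are made up for
illustration and do not come from any compiler's library. The first
routine decides everything at run time, the way a library function has
to; the second is the kind of code a compiler could insert when it can
already see, at compile time, that both operands are distinct arrays of
exactly four longs. This is only a sketch under those assumptions, not
what SAS/Lattice or DICE actually generate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* General library routine: every decision is made at run time. */
static void copy_general(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)d, sa = (uintptr_t)s;

    if (da == sa || n == 0)
        return;
    if (da < sa || da - sa >= n) {
        /* dst is below src, or the areas do not overlap: copy upward. */
        while (n--)
            *d++ = *s++;
    } else {
        /* dst overlaps the tail of src: copy downward. */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    /* A full version would also test alignment and length here and
       branch to word, long or MOVEM-style loops. */
}

/* Hypothetical compiler output for a call where the operands are known
   to be two distinct arrays of four longs: no tests, no loop. */
static void copy_4_longs(uint32_t *dst, const uint32_t *src)
{
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}

/* Checks that both routines agree, and that the overlapping case works. */
static int copy_selfcheck(void)
{
    uint32_t a[4] = {1, 2, 3, 4}, b[4] = {0}, c[4] = {0};
    unsigned char buf[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    int i;

    copy_general(b, a, sizeof a);
    copy_4_longs(c, a);
    for (i = 0; i < 4; i++)
        assert(b[i] == a[i] && c[i] == a[i]);

    copy_general(buf + 2, buf, 4);      /* forces the downward copy */
    assert(buf[2] == 0 && buf[3] == 1 && buf[4] == 2 && buf[5] == 3);
    return 1;
}
```

The specialised routine contains none of the tests, which is where both
the speedup and the space saving would come from when the parameters
are known in advance.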

        Carl Edman


Theoretical Physicist, N.: A physicist whose | Send mail
existence is postulated, to make the numbers |  to
balance but who is never actually observed   | cedman@golem.ps.uci.edu
in the laboratory.                           | edmanc@uciph0.ps.uci.edu