Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!usc!orion.oac.uci.edu!cedman
From: cedman@golem.ps.uci.edu (Carl Edman)
Newsgroups: comp.sys.amiga.tech
Subject: Re: C compilers code generation
Message-ID:
Date: 15 Nov 90 06:19:29 GMT
References: <1990Nov12.135444.10739@cs.utwente.nl> <1990Nov12.164804.5490@agate.berkeley.edu> <26893.273fe96d@kuhub.cc.ukans.edu>
Organization: University of California, Irvine, USA.
Lines: 47
Nntp-Posting-Host: lynx.ps.uci.edu
In-reply-to: dillon@overload.Berkeley.CA.US's message of 15 Nov 90 02:08:22 GMT

In article dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:

   In article <26893.273fe96d@kuhub.cc.ukans.edu> markv@kuhub.cc.ukans.edu writes:
   >Don't forget about SAS/Lattice's support for __builtin functions like
   >memcpy, memset, etc. that use inline code rather than function calls.
   >(By flipping the compiler switch for processor you can also get such
   >loops to use DBxx loops for 68010 and 32 bit instructions for 68020).

       Well, actually, while the built-in stuff is cute it is also pretty
       useless in most cases.  For example, the code for a 'full' version
       of setmem()/memset(), movmem()/memmove(), etc. is pretty big, but
       also can be a hell of a lot faster (using MOVEM's or at least long
       ops instead of char ops).  I think the only real builtin function
       that is useful is, maybe, strlen().

       This applies to all processors since a DBxx loop using a BYTE
       transfer size is still a BYTE transfer loop, even if all the
       instructions are cached.  The DBxx loops are nothing more than a
       simple optimization in my book, though one that DICE does not
       currently do.

       Frankly, I just do not see any advantage and it can be *really*
       confusing.

It may well be true that truly optimal memmove()-style functions are
quite large. But most of that complexity comes from analysing the
parameters and choosing the algorithm that handles them optimally (e.g.
overlapping/non-overlapping memory areas, odd/word-aligned/
long-word-aligned addresses and lengths, downward/upward copy,
large/small arrays, and so on). Each combination of these parameters
requires a different routine to be optimal. So the code that analyses
the parameters, plus the separate routines for the different parameter
sets, makes up most of the code.

But now imagine a C compiler which does the parameter analysis (as far
as possible) at compile time and inserts only the 'correct' routine for
each parameter set. I think you will have to admit that in this case
you could have significant speedups and space savings.

        Carl Edman

Theoretical Physicist, N.: A physicist whose  | Send mail
existence is postulated, to make the numbers  | to
balance but who is never actually observed    | cedman@golem.ps.uci.edu
in the laboratory.                            | edmanc@uciph0.ps.uci.edu
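
To make the argument above a bit more concrete, here is a minimal
sketch in C. It is only an illustration: all the names (copy_general,
copy_longs, copy_vec and so on) are hypothetical, and none of this is
SAS/Lattice or DICE code. copy_general() does the parameter analysis at
run time and then picks a routine, which is where the bulk of a 'full'
memmove() goes. copy_vec() is a call site where alignment, length and
non-overlap are all known in advance, so a compiler doing the analysis
itself could in principle emit just the long-word loop there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain byte loop, copying upward (lowest address first). */
static void copy_bytes_up(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}

/* Plain byte loop, copying downward (highest address first). */
static void copy_bytes_down(unsigned char *d, const unsigned char *s, size_t n)
{
    d += n;
    s += n;
    while (n--)
        *--d = *--s;
}

/* Specialised routine.  Assumes both addresses and the length are
   multiples of 4 and that an upward copy is safe.  Strictly portable
   code would also have to respect the C aliasing rules. */
static void copy_longs(void *dst, const void *src, size_t n)
{
    uint32_t *d = dst;
    const uint32_t *s = src;

    assert((((uintptr_t)dst | (uintptr_t)src | n) & 3) == 0);
    for (n /= 4; n != 0; n--)
        *d++ = *s++;
}

/* The 'full' version: analyse the parameters, then pick a routine. */
void *copy_general(void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;

    if (d == s || n == 0)
        return dst;
    if (d > s && d - s < n)             /* dst overlaps src from above */
        copy_bytes_down(dst, src, n);
    else if (((d | s | n) & 3) == 0)    /* everything long-word aligned */
        copy_longs(dst, src, n);
    else
        copy_bytes_up(dst, src, n);
    return dst;
}

/* Hypothetical call site: the objects are long-word aligned, the size
   is a constant multiple of 4, and the caller passes distinct objects.
   All the analysis can be done ahead of time, leaving one routine. */
struct vec { uint32_t v[8]; };

void copy_vec(struct vec *dst, const struct vec *src)
{
    copy_longs(dst, src, sizeof *dst);
}
```

The sketch leaves out the other cases from the list (word alignment,
large versus small blocks, MOVEM-style bursts); each would add another
branch to copy_general() but nothing to a call site like copy_vec().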