Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!rutgers!lll-lcc!lll-crg!hoptoad!gnu
From: gnu@hoptoad.UUCP
Newsgroups: comp.arch,comp.lang.c
Subject: Tuning your libraries for your machine
Message-ID: <1959@hoptoad.uucp>
Date: Sun, 5-Apr-87 05:46:39 EST
Article-I.D.: hoptoad.1959
Posted: Sun Apr 5 05:46:39 1987
Date-Received: Sun, 5-Apr-87 21:46:33 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15694@sun.uucp> <1530@husc6.UUCP> <1537@husc6.UUCP>
Organization: Nebula Consultants in San Francisco
Lines: 46
Xref: utgpu comp.arch:779 comp.lang.c:1473

In article <1537@husc6.UUCP>, reiter@endor.harvard.edu (Ehud Reiter) writes:
> 2) Simple routines like strcpy should be adjusted to perform well on a
> particular architecture (if the microVAX doesn't have a hardware locc
> instruction, then is it too much to ask that the run-time library supplied
> for the microVAX be changed not to use locc, at least in small and frequently
> used routines like strcpy?)

It only becomes reasonable to tailor a system for a particular piece of
hardware when there are only a small number of variants that run that
architecture.  In other words, this might have been fine when there was
the 780 and the 750 (nobody counted the 730 or MV-1 anyway) but once you
have a bunch of models, you just have to make the code straightforward
and don't do anything that *really* breaks on some machine.  I presume in
the Vax case this means mostly avoiding the unimplemented instructions.

I worked on an APL system for the IBM 360/370 and just finding out the
timings for the 15 or 20 models that could run the code was too much work,
let alone figuring out which combination would be best until IBM's next
release.  (No flames on 15..20, this was in 1973!)

(Of course, the same applies to an "architecture" like C/Unix -- write
code that's straightforward and doesn't do anything that really breaks
anywhere.
Super optimizing your C source is kinda hard these days -- are you *sure*
it's better to code it this way on the Cray?  IBM?  DG?  DEC?  8080?)

It's true that a tailored shared library could give some benefit, but the
general problem extends to what code to generate inline, not just in
library routines.

> 3) Simple routines like strcpy should be recoded in assembler, at least to
> the degree of having their procedure prologues simplified, and so that they
> use registers which don't have to be restored.
> 4) In-line expansion of common (and simple) library routines should be
> considered.

These should both be done automatically by a good compiler.  Compilers that
put in large procedure prolog/epilogs and don't simplify them when possible
have no excuses.  Those that won't use the scratch registers for variables
when possible have excuses but newer compilers are beating them -- excuses
don't benchmark very well.
--
Copyright 1987 John Gilmore; you can redistribute only if your recipients can.
(This is an effort to bend Stargate to work with Usenet, not against it.)
{sun,ptsfa,lll-crg,ihnp4,ucbvax}!hoptoad!gnu	       gnu@ingres.berkeley.edu