Xref: utzoo comp.lang.c:27215 comp.lang.misc:4651
Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!ukc!dcl-cs!aber-cs!rupert!pcg
From: pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.lang.c,comp.lang.misc
Subject: Re: function calls
Message-ID:
Date: 25 Mar 90 23:00:51 GMT
References: <29551@amdcad.AMD.COM> <14281@lambda.UUCP>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 67
In-reply-to: jlg@lambda.UUCP's message of 20 Mar 90 23:07:40 GMT
Posting-Front-End: GNU Emacs 18.47.1 of Wed Mar 15 1989 on rupert (berkeley-unix)

In article <14281@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:

   From article <29551@amdcad.AMD.COM>, by tim@nucleus.amd.com (Tim Olson):
   > [...]
   > It might be true that scientific routines written in FORTRAN may have
   > this many live, non-overlapping variables to keep in registers, but I
   > don't believe this is true in general.  Statistics from a large
   > collection of programs and library routines (a mix of general and
   > scientific applications written in C) show that of 782 functions (620
   > of which were non-leaf functions), an average of 6.5 registers per
   > function were live across function calls.

   This statistic can only be interpreted in one way: the C compiler in
   question didn't allocate registers very well.  Especially in scientific
   packages, there are _HUGE_ numbers of 'live' _VALUES_ to deal with
   during execution of even simple routines.  Vectors, arrays, lists,
   strings, etc, are all being either produced or consumed.

This is an old fallacy: the number of useful registers is usually quite
low; the Wall paper and others say that for most codes, even floating
point intensive ones, 4, 8 or 16 registers make do. The problem that
Giles does not seem to consider is that caching values in registers,
like all forms of caching, is only useful if the values are going to be
used repeatedly.
It is not difficult to produce examples of fairly common pieces of code
where, on many machines, register caching worsens performance.

Many registers are useful when:

1) Your so-called 'optimizer' selects values to cache not on expected
   dynamic frequency of use but on static frequency of use. Since the
   two are poorly correlated, your so-called 'optimizer' wants to cache
   everything in sight.

2) You have extremely high latency to memory, and you want to use a
   large register file as a large cache, where even infrequently reused
   values are insufferably expensive to refetch.

3) You have extremely high latency to memory, and you can prefetch
   blocks of operands while other blocks of operands are being
   processed, because you know which operands are going to be needed
   next, as with vector machines.

4) You have multiple functional units, and each of them can then make
   use of a set of registers.

Note that none of these really means that you need lots of registers:
1) means that your compiler is stupid, 2) that you are missing a proper
dynamic cache, and 3) and 4) that you actually have multiple threads of
control.

My aversion to large register caches and so-called clever optimizers
should be well known by now. It stems from my opinion that stupid
compilers are to be avoided, that heavily optimizing compilers are
accident prone and easily made unnecessary by careful coding where it
matters, and from my being interested only in general purpose
architectures. It is always possible to design a specific architecture
that isn't such...

In particular there are two uses for registers: one is as inputs to
functional units, and the other as cache. There are machines,
especially stack-oriented ones with caches, that have only specialized
registers, i.e. each register is there only as input to a functional
unit. The Manchester mainframes were specialized register machines.
Crisp is in some sense such a machine as well.
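
To make point 1) above a little more concrete, here is a rough sketch
in C of the difference between selecting on static and on dynamic
frequency of use. The cost model is an assumption made up for
illustration, not a measurement: a register is taken to save one memory
access per dynamic use, and to cost one save plus one restore for every
call the value is live across. The struct and function names are
hypothetical as well.

```c
#include <assert.h>

/* Hypothetical per-value profile; the field names are illustrative only. */
struct value_profile {
    long static_uses;    /* textual references in the source */
    long dynamic_uses;   /* references actually executed at run time */
    long calls_crossed;  /* calls executed while the value is live */
};

/* Assumed cost model: a register saves one memory access per dynamic use,
 * but costs one save and one restore for each call the value is live
 * across. Returns the net accesses saved; negative means a net loss. */
long register_benefit(const struct value_profile *v)
{
    assert(v->dynamic_uses >= 0 && v->calls_crossed >= 0);
    return v->dynamic_uses - 2 * v->calls_crossed;
}

/* Selection on static frequency of use, as in point 1). */
int cache_by_static_count(const struct value_profile *v, long threshold)
{
    return v->static_uses >= threshold;
}

/* Selection on expected dynamic frequency: cache only if it pays off
 * under the assumed cost model above. */
int cache_by_dynamic_benefit(const struct value_profile *v)
{
    return register_benefit(v) > 0;
}
```

Take a value with 9 textual references that is used 3 times at run time
and is live across 5 calls: a static threshold of 3 would cache it, yet
under this model the benefit is 3 - 2*5 = -7, a net loss. A loop index
with 2 textual references, 1000 dynamic uses and no calls crossed is
rejected by the same static threshold, although its benefit is 1000.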
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk