Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ccivax.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!qantel!dual!lll-crg!gymble!umcp-cs!seismo!rochester!ritcv!ccivax!rb
From: rb@ccivax.UUCP (rex ballard)
Newsgroups: net.arch
Subject: Re: Scientific Computing and mips
Message-ID: <256@ccivax.UUCP>
Date: Mon, 16-Sep-85 22:54:28 EDT
Article-I.D.: ccivax.256
Posted: Mon Sep 16 22:54:28 1985
Date-Received: Fri, 20-Sep-85 05:34:10 EDT
References: <419@kontron.UUCP> <2300001@uicsl> <1093@ames.UUCP> <1119@ames.UUCP>
Organization: CCI Telephony Systems Group, Rochester NY
Lines: 71

> >
> >I think you need to make your performance measurements in such a way that
> >you get a set of distinct numbers which can be used analytically to determine
> >performance for a given program if you know certain properties of the
> >program.  For example:
> >
> >1) The rate of execution of each member of the set of arithmetic operations
> >provided by the machine's instruction set, ...
> >..., with cache disabled.
> >
> >2) The rate of execution of 1-word memory-to-memory moves, with cache
> >disabled.
> >
> >3) The rate of execution of a tight loop ...register-to-register
> >moves, with cache disabled.
> >
> >4) The rate of execution of a tight loop ... , with cache enabled.
> >
> >5) The rate of execution of a tight loop performing (same word size as #3
> >and #4 above) memory-to-memory moves that produce all cache "hits", with
> >cache enabled.  Note that this gives you two properties of your cache: your
> >speedup for operand fetch and store resulting from caching, and any
> >performance penalties resulting from a write-through vs. write-back cache.
> >
> >6) Specifications such as the number of registers available to the user,
> >the size of the cache, etc.
> >
> >Well, you get the idea, anyway...
> >personally I tend to feel that statistical
> >performance measurements are not nearly as useful as analytical ones; I
> >would rather see a list of fairly distinct performance properties of a
> >processor anytime, since I think you can do more with them in terms of
> >saying how the machine will perform for a given application that way.

I would like to add a few more tests in this vein.

7) The time required to do a "structured call" (i.e., save entire machine
state; transfer control to a "minimal subroutine" like
"return(arg1+arg2+arg3)" with all arguments on the stack; place the
result in a single register; and return to caller).

The reason for a test like this comes from a study done by M. McGowan.
In a study of several million lines of code, the number of revisions of
a given source module increased EXPONENTIALLY relative to its size.
Regardless of the language, the number of revisions increased an average
of (1/25)**2.  The 25 was the number of lines displayable on the screen
at one time.

Ideally, the extra cost of a 'structured call' over a 'Macro Expansion'
should be 0.  In conventional benchmarks, a "call optimized" computer
may show very little superiority.  In general purpose applications where
"modular software design" is a necessity, the relative performance may
double.  Unfortunately, such a computer would also have this advantage
in general benchmark tests.

8) The time required to do a "context switch" (i.e., save entire machine
state, get new context, save state, return to old context).  This can be
a good indicator of interrupt responsiveness, suitability for
multitasking, and "event driven" situations.

9) The time required to save "equivalent states"; a machine with 8
registers may have less to do in "state save" than a machine with 32,
but can "hide" the number of "real" state values required for a context
switch for benchmarking purposes.
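
As a rough illustration of the shape of test 7, here is a minimal
sketch in Python.  It times the expression written inline (standing in
for the "macro expansion") against the same expression wrapped in a
minimal subroutine (standing in for the "structured call").  The names
minimal_subroutine, time_per_iteration and call_overhead are made up
for this example, and an interpreter measures its own call overhead
rather than the machine's state save, so treat this as a picture of the
method only, not as a usable hardware benchmark.

```python
import timeit


def minimal_subroutine(arg1, arg2, arg3):
    # The "minimal subroutine" from test 7: return(arg1+arg2+arg3).
    return arg1 + arg2 + arg3


def time_per_iteration(stmt, number=200_000, repeat=5):
    """Best-of-`repeat` time, in seconds, for one execution of `stmt`."""
    # Hypothetical fixed operands; any three small integers would do.
    env = {"f": minimal_subroutine, "a": 1, "b": 2, "c": 3}
    best = min(timeit.repeat(stmt, globals=env, number=number, repeat=repeat))
    return best / number


def call_overhead(number=200_000, repeat=5):
    """Return (inline time, call time, call/inline ratio)."""
    inline = time_per_iteration("r = a + b + c", number, repeat)   # inline form
    called = time_per_iteration("r = f(a, b, c)", number, repeat)  # call form
    return inline, called, called / inline


if __name__ == "__main__":
    inline, called, ratio = call_overhead()
    print(f"inline: {inline * 1e9:.1f} ns  "
          f"call: {called * 1e9:.1f} ns  ratio: {ratio:.2f}")
```

The closer the two times are, the cheaper the call.  On a real machine
the same pair of measurements would be taken in compiled code, with the
arguments passed on the stack as described in test 7.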
(these opinions were my own, but I'm giving them up for adoption)