Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.milw.wisc.edu!leah!rpi!batcomputer!cornell!rochester!pt.cs.cmu.edu!fas.ri.cmu.edu!schmitz
From: schmitz@fas.ri.cmu.edu (Donald Schmitz)
Newsgroups: comp.arch
Subject: Re: SUN procedure inlining(was i860 Dhrystones)
Message-ID: <4641@pt.cs.cmu.edu>
Date: 4 Apr 89 14:07:31 GMT
References: <39388@oliveb.olivetti.com> <15475@winchester.mips.COM> <95013@sun.Eng.Sun.COM> <13641@jumbo.dec.com> <95215@sun.Eng.Sun.COM> <4614@pt.cs.cmu.edu> <1356@auspex.auspex.com>
Distribution: na
Organization: Carnegie-Mellon University, CS/RI
Lines: 35

(Lots of people writing)
>>>FYI, this is incorrect. Current Sun C compilers do not perform procedure
>>>inlining at any optimization level.
            ^^^^^^^^^^^^^^^^^^^^^^^^
>> ^^^^^^^^
>>(I wrote) Maybe you should look again
>
>No, there's not much point in looking again; David is talking about
>*general* procedure inlining, which the current Sun compilers do not do,
 ^^^^^^^^
>at least not to the best of my knowledge - a second look will almost
>certainly confirm that. This is quite different from the very
>specialized inlining that the "inline" program performs (as I remember,
>it performs inlining on the assembly-language output from the compiler),
>so if the claim is that the 3/60 results used general procedure
>inlining, the claim seems suspicious to me. The only ".il" files I
>could find on any of the 4.0 machines around here (both 68K and SPARC)
>are 1) files for "libm" - in several flavors for the 68K machines - and
>2) some for doing loads from possibly-misaligned locations.

To put an end to this: I was responding to the statement that SUN compilers *do no procedure inlining*, and a look at the man entry will tell you this isn't true. The original message regarding the -O4 switch was very likely an error; as far as I know, there is only one -O switch for the SUN3 family. However, even the crude inlining capability available on the SUN can be used to speed up Dhrystone.
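A rough modern analogue of that kind of inlining, sketched in C under stated assumptions: instead of Sun's .il assembler templates, it uses C99 `static inline` functions, which invite (but do not force) the compiler to expand the body at each call site and so drop the call/return sequence. The names `inline_strcpy` and `inline_strcmp` are hypothetical stand-ins for the library strcpy and strcmp, not anything Sun shipped.

```c
/* Hypothetical inline replacements for strcpy/strcmp.
   NOTE: this is the C99 `static inline` mechanism, NOT Sun's .il
   assembler templates. The compiler may expand these bodies at each
   call site (no call/return), but it is not required to. */

/* Copy src, including its terminating NUL, into dst; return dst. */
static inline char *inline_strcpy(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Compare two strings byte by byte, as unsigned chars, like strcmp:
   negative if a < b, zero if equal, positive if a > b. */
static inline int inline_strcmp(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```

Callers use these exactly as they would strcpy and strcmp; whether any call is actually expanded in place depends on the compiler and its optimization settings.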
Those magic .il files are just assembler with some special directives thrown in; I could easily write strcmp and strcpy .il files and compile the source with them. Even if I didn't further massage the assembly code, this would still speed up every call to those routines (it eliminates the jsr, link, unlk, and rts, all multi-cycle instructions). It would also improve code locality, and with it instruction-cache performance (on SUNs that have an I cache). Since the original post seemed to indicate that someone had played fast and loose with the Dhrystone rules, this seemed a very plausible way it could have been done.

Don Schmitz
--