Xref: utzoo unix-pc.general:7036 comp.sys.att:11350 comp.sources.wanted:14666 comp.lang.c:34976
Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!sdd.hp.com!think.com!paperboy!meissner
From: meissner@osf.org (Michael Meissner)
Newsgroups: unix-pc.general,comp.sys.att,comp.sources.wanted,comp.lang.c
Subject: Re: What assembler code is generated by C instruction X?
Message-ID:
Date: 2 Jan 91 05:10:38 GMT
References: <1990Dec28.220115.15930@shibaya.lonestar.org>
Sender: news@OSF.ORG
Organization: Open Software Foundation
Lines: 76
In-reply-to: afc@shibaya.lonestar.org's message of 28 Dec 90 22:01:15 GMT

In article <1990Dec28.220115.15930@shibaya.lonestar.org> afc@shibaya.lonestar.org (Augustine Cano) writes:

| Hello net.land:
|
| I need to find out what assembler code is generated by compilers/optimizers.
| The first thing that comes to mind is: compile x programs and look at the
| assembler output. This is highly inefficient for various reasons: much of
| the code would be duplicated (who needs to wade through the assembler
| generated by passing a parameter by reference 200 times?), a real program
| converts to possibly thousands of lines of assembler and some C instructions/
| constructs will most likely be missing.

This problem is essentially unsolvable in the general case. You
didn't restrict the bounds to a particular machine or set of compiler
implementations, so you have to consider every possible compiler that
calls itself an optimizing compiler, and you have to check every
release, since new optimizations are added all the time. In addition,
for multiple-target compilers, you have to consider each target
separately (I've gotten radically different code from GCC depending
on the machine-dependent portions).
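If you do narrow it down to one compiler, release, and target at a time, a small probe file is about the most that can be compared. What follows is only a minimal sketch: every name in it is hypothetical, and it assumes a compiler that accepts the usual `-S` option to emit assembler.

```c
/* probe.c: one C construct per function, with names chosen so each one
 * is easy to find in the assembler output.  All names are hypothetical. */

int probe_global_b = 6;
int probe_global_c = 7;

/* Parameter passed by reference (through a pointer). */
void probe_pass_by_reference(int *out, int value)
{
    *out = value;
}

/* Double indirection. */
int probe_double_indirection(int **pp)
{
    return **pp;
}

/* Multiply of two globals, like a = b * c. */
int probe_multiply_globals(void)
{
    return probe_global_b * probe_global_c;
}

/* Array indexing with a wider element type. */
long probe_array_index(const long *table, int i)
{
    return table[i];
}
```

Compiling this with something like `cc -S -O probe.c` for each target gives one labelled function per construct. Keeping each construct in its own function should stop a scheduler from mixing instructions from different probes, assuming the compiler does not inline across functions, but it says nothing about what the same construct turns into inside a larger function.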

| The next thing that comes to mind is a bare bones program whose whole purpose
| is to use each C instruction/construct once, with function and/or variable
| names such that the particular item can be quickly located in the (many)
| lines of assembler generated. This would make it relatively easy to
| connect assembly code with the C instruction that generated it, for any
| compiler/cpu type, without extraneous garbage in the way.
|
| Has anybody seen something like this? I'd rather not duplicate effort if
| someone has even parts of this. Taking care of every case of indirection,
| double indirection, types, sizes etc... is likely to be not a trivial
| undertaking.

As pipelined and/or superscalar chips come out, this can be
impossible, because the compilation system may interleave instructions
from different statements. For example, on the MIPS system there is a
1 cycle delay from the load until the value appears in a register, and
multiple cycles for a multiply. Thus, the simple code:

	a = b * c;
	d = e + f;

produces the following interleaved instructions:

foo:
  [foo.c: 4] 0x0:   8f8e0000  lw     t6,0(gp)    ; b
  [foo.c: 4] 0x4:   8f8f0000  lw     t7,0(gp)    ; c
  [foo.c: 5] 0x8:   8f990000  lw     t9,0(gp)    ; e
  [foo.c: 4] 0xc:   01cf0019  multu  t6,t7       ; (hi,lo)<- b*c
  [foo.c: 5] 0x10:  8f880000  lw     t0,0(gp)    ; f
  [foo.c: 5] 0x14:  00000000  nop
  [foo.c: 5] 0x18:  03284821  addu   t1,t9,t0    ; e+f
  [foo.c: 5] 0x1c:  af890000  sw     t1,0(gp)    ; d
  [foo.c: 4] 0x20:  0000c012  mflo   t8          ; b*c
  [foo.c: 4] 0x24:  af980000  sw     t8,0(gp)    ; a
  [foo.c: 4] 0x28:  00000000  nop
  [foo.c: 6] 0x2c:  03e00008  jr     ra          ; return
  [foo.c: 6] 0x30:  00000000  nop                ; delay slot

| Such a program could be the "training" part for a universal de-compiler.
| Once the assembler output of a specific compiler/cpu type has been generated
| from this program, the de-compiler could then re-generate the original
| C source (within limits).
| Part 2 would obviously be more difficult to
| implement and I suspect such a thing only exists now for specific cpu
| types (and for significant $s too). In any case I'm only interested in the
| "training" part now.

I suggest instead using the symbolic debug information. If your
compiler does not support full optimization and debugging, consider
changing compilers.....
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?