Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!samsung!munnari.oz.au!bruce!alanf
From: alanf@bruce.OZ (Alan Grant Finlay)
Newsgroups: comp.sys.ibm.pc
Subject: Re: Turbo C far pointers
Message-ID: <1697@bruce.OZ>
Date: 17 Nov 89 04:51:37 GMT
Organization: Monash Uni. Computer Science, Australia
Lines: 69

In Article <6689@esegue.segue.boston.ma.us> John R. Levine writes:
>In article <579@gmuvax2.gmu.edu> 2179ak@gmuvax2.UUCP (JDPorter) writes:
>>>>That's ridiculous. *(A+1), A[1], and 1[A] all produce exactly the same code.
>
>>*(A+1) does NOT produce the same code as A[1]. (not for MSC, anyway.)
>
>Congratulations, you've found an optimization bug in MSC. In Turbo, assuming

As I am doing research in programming language semantics, I can't resist
putting my oar in. This is an issue I have meditated upon a good deal in
the past. As I see it, the semantics of high level programming languages
are best left independent of efficiency specifications. This is partly due
to the need for machine independence, but also for more philosophical
reasons. With high level languages the programmer wants to be able to say
what is to be done without being concerned about how it is done. With
assembly languages the reverse is the case (i.e. if you can't do what you
want efficiently then you change the requirements). Hence for high level
languages we have optimisation of the generated code. I have never heard
of an optimiser intended for hand written assembly code (except maybe
rumours from the AI community).

This brings me to the problem child C, which has characteristics of both
high and low level languages. C is undoubtedly a popular language, much to
my disgust. If C replaces COBOL we will be no better off. There, I've said
it, and will probably never live it down.

More seriously though, what does "Kernighan and Ritchie" say? Page 94
(1978 edition):

"Rather more surprising, at least at first sight, is the fact that a
reference to a[i] can also be written as *(a+i).
In evaluating a[i], C converts it to *(a+i) immediately; the two forms are
completely equivalent."

Although the meaning of "equivalent" is not further specified, we are given
the additional clue that a conversion at the syntactic level can be
presumed to have taken place. A similar discussion in the appendix
(page 210) states:

"By definition, the subscript operator [] is interpreted in such a way
that E1[E2] is identical to *((E1)+(E2)). Because of the conversion rules
which apply to +, if E1 is an array and E2 an integer, then E1[E2] refers
to the E2-th member of E1. Therefore despite its asymmetric appearance,
subscripting is a commutative operation."

From the tone of the discussion I draw the following conclusions:

1) The equivalence referred to is "equivalence in effect" and does not
   dictate the means by which this effect is produced. The language manual
   occasionally refers to machine dependencies but hardly presumes to
   dictate the code generated. In fact it states (page 212):

   "Some difficulties arise only when dubious coding practices are used.
   It is exceedingly unwise to write programs which depend on any of
   these properties."

2) The C language assumes it is implemented on a certain class of machine,
   which we may broadly classify as "Von Neumann" or perhaps more
   accurately as "linear addressable data and code". We may not presume
   that the instruction set architecture has index registers. Some form
   of indirect addressing must be achievable.

I think that to assume that source code which is equivalent "by
definition" must generate (essentially) the same object code within a
single code object is itself a dubious coding practice. Although the C
language is clearly defined for efficient programming on a certain class
of machine, there are no guarantees written into the language definition
that such and such a way of doing something will be more efficient than
any other way.

C lovers, please post your flames to "comp.lang.c", which I agree to read
for the next few weeks.