Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cs.utexas.edu!samsung!munnari.oz.au!bruce!alanf
From: alanf@bruce.OZ (Alan Grant Finlay)
Newsgroups: comp.sys.ibm.pc
Subject: Re: Turbo C far pointers
Message-ID: <1697@bruce.OZ>
Date: 17 Nov 89 04:51:37 GMT
Organization: Monash Uni. Computer Science, Australia
Lines: 69


In Article <6689@esegue.segue.boston.ma.us> John R. Levine writes:

>In article <579@gmuvax2.gmu.edu> 2179ak@gmuvax2.UUCP (JDPorter) writes:
>>>>That's ridiculous. *(A+1), A[1], and 1[A] all produce exactly the same code.
>
>>*(A+1) does NOT produce the same code as A[1].  (not for MSC, anyway.)
>
>Congratulations, you've found an optimization bug in MSC.  In Turbo, assuming

As I am doing research in programming language semantics, I can't resist putting
my oar in.  This is an issue I have meditated upon a good deal in the past.  As I
see it, the semantics of high level programming languages are best left independent
of efficiency specifications.  This is partly due to the need for machine
independence but also for more philosophical reasons.  With high level languages
the programmer wants to be able to say what is to be done without being
concerned about how it is done.  With assembly languages the reverse is the
case (i.e. if you can't do what you want efficiently then you change the
requirements).  Hence for high level languages we have optimisation of the 
generated code.  I have never heard of an optimiser intended for hand written
assembly code (except maybe rumours from the AI community).

This brings me to the problem child, C, which has characteristics of both
high and low level languages.  C is undoubtedly a popular language, much to
my disgust.  If C replaces COBOL we will be no better off.  There, I've said
it, and will probably never live it down.  More seriously, though, what does
"Kernighan and Ritchie" say?  Page 94 (1978 edition):

   "Rather more surprising, at least at first sight, is the fact that a
    reference to a[i] can also be written as *(a+i).  In evaluating a[i],
    C converts it to *(a+i) immediately; the two forms are completely
    equivalent."

Although the meaning of "equivalent" is not further specified, we are given
the additional clue that a conversion on a syntactic level can be presumed
to have taken place.  A similar discussion in the appendix (page 210) states:

   "By definition, the subscript operator [] is interpreted in such a
    way that E1[E2] is identical to *((E1)+(E2)).  Because of the
    conversion rules which apply to +, if E1 is an array and E2 an 
    integer, then E1[E2] refers to the E2-th member of E1.  Therefore
    despite its asymmetric appearance, subscripting is a commutative
    operation."

From the tone of the discussion I draw the following conclusions:

1) The equivalence referred to is "equivalence in effect" and does not
   dictate the means by which this effect is produced.  The language 
   manual occasionally refers to machine dependencies but hardly
   presumes to dictate the code generated.  In fact it states that:
   (page 212) "Some difficulties arise only when dubious coding 
   practices are used.  It is exceedingly unwise to write programs 
   which depend on any of these properties."

2) The C language assumes it is implemented on a certain class of machine
   which we may broadly classify as "Von Neumann" or, perhaps more accurately,
   as "linearly addressable data and code".  We may not presume that
   the instruction set architecture has index registers.  Some form of
   indirect addressing must be achievable.

I think that assuming source code which is equivalent "by definition"
must generate essentially the same object code in a single code object
is itself a dubious coding practice.  Although the C language is clearly defined
for efficient programming on a certain class of machine there are no
guarantees written into the language definition that such and such a way of
doing something will be more efficient than any other way.
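
For the sceptical, here is a minimal sketch of what "equivalence in effect"
amounts to.  The function names are my own invention for illustration and come
from neither the manual nor any compiler.  The three spellings are checked for
agreement in value and in the object they designate; nothing is said, or can be
said at this level, about the instructions a given compiler emits for each.

```c
#include <assert.h>
#include <stddef.h>

/* Three spellings of "the i-th member of a" (hypothetical helper names). */
int by_subscript(const int *a, size_t i) { return a[i]; }
int by_pointer(const int *a, size_t i)   { return *(a + i); }
int by_swapped(const int *a, size_t i)   { return i[a]; }

/* Returns 1 if all three forms yield the same value and the subscript
   forms designate the same object as a + i; returns 0 otherwise. */
int same_effect(const int *a, size_t i)
{
    return by_subscript(a, i) == by_pointer(a, i)
        && by_pointer(a, i) == by_swapped(a, i)
        && &a[i] == a + i
        && &i[a] == a + i;
}

/* Self-check over a small array; it inspects effect only, not object code. */
void check_subscript_forms(void)
{
    static const int a[4] = {10, 20, 30, 40};
    size_t i;

    for (i = 0; i < 4; i++)
        assert(same_effect(a, i));
}
```

Whether the three helper functions compile to identical code is a matter for
the compiler writer, and inspecting the assembly listing is the only way to
find out for a particular compiler and set of options.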

C lovers, please post your flames to "comp.lang.c", which I agree to read
for the next few weeks.