Xref: utzoo comp.lang.c:8532 comp.lang.misc:1322
Path: utzoo!mnetor!uunet!husc6!uwvax!umn-d-ub!umn-cs!hall!pmk
From: pmk@hall.cray.com (Peter Klausler)
Newsgroups: comp.lang.c,comp.lang.misc
Subject: Languages vs. machines (was Re: The need for D-scussion)
Message-ID: <5308@hall.cray.com>
Date: 25 Mar 88 02:07:51 GMT
References: <12176@brl-adm.ARPA> <1988Mar11.215238.976@utzoo.uucp> <10763@mimsy.UUCP>
Organization: Cray Research, Inc., Mendota Heights, MN
Lines: 48
Summary: Value of knowing your target architecture

In article <10763@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <719@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >As long as programmers are taught to think in terms of a language, rather
> >than the machine, it will be difficult to get vendors to allow the forcing
> >of inline.
> 
> As long as programmers are taught to think in terms of a machine,
> rather than the language, it will be difficult to get portable code
> that can be moved to that 1 teraflop computer that will come out
> tomorrow, then to the 10 teraflop computer that will come out the day
> after, and then to the 100 teraflop computer that will come out a week
> from Monday.

A routine written for optimal performance on one architecture may be coded
so that it is portable to others, but it'll be optimal on only the one.
Write for optimal performance on a VAX and your code will crawl on a Cray.
Code for maximal portability and you'll be suboptimal everywhere, having
reduced yourself to a lowest common denominator of machine.  Compilers can
hide lots of annoying differences, but they can't change a nifty linked
VAX-optimal data structure into a simple vectorizable Cray-optimal array.

This is not to say that performance goals necessarily rule out portability,
just that portability may be restricted to a smaller range of systems.
Chris' hypothetical 1TFLOP, 10TFLOP, and 100TFLOP machines (see your
local Mimsy Data sales office :-)) are not likely to be much like
an 8088 or 6502.  But they could look similar to each other - and their
differences in instruction timings, functional units, and processor counts
would be differences that one can expect a reasonable compiler to handle.
It's reasonable to write code that's portable to both the Cray-1/A and
the Cray-3 (with minimal machine-dependent tweaking) and still give maximal
performance on each, but you'll sacrifice PC/AT portability in the process.

What I'm getting at with all this rambling is this: Knowing your target
architecture is a GOOD THING if you're after optimal performance.  Knowing
your processor's assembly language - and how to optimize it - will make you
a better user of your "high-level" language (FORTRAN, C, etc.).  It will also
frustrate you, as it has me, and as I fear it has frustrated Herman.

(Could a language be designed that would preserve some measure of portability
 across some set of supercomputer systems while allowing maximal control and
 performance? C and C++ are great at this amongst scalar byte-addressable
 architectures but seem inadequate (to me!) to the task of extracting the most
 speed from a multiprocessor vector box with only large-word addressing.
 I've been trying (on and off) to devise such a language.  Not easy, and probably
 not desirable, but an interesting experiment nontheless.  Mail ideas, please.)

[The above is solely my opinion, and is not to be construed as the viewpoint of
 CRI or anyone else, including my other personalities.  Have a nice weekend.]