Xref: utzoo comp.lang.c:8532 comp.lang.misc:1322
Path: utzoo!mnetor!uunet!husc6!uwvax!umn-d-ub!umn-cs!hall!pmk
From: pmk@hall.cray.com (Peter Klausler)
Newsgroups: comp.lang.c,comp.lang.misc
Subject: Languages vs. machines (was Re: The need for D-scussion)
Message-ID: <5308@hall.cray.com>
Date: 25 Mar 88 02:07:51 GMT
References: <12176@brl-adm.ARPA> <1988Mar11.215238.976@utzoo.uucp> <10763@mimsy.UUCP>
Organization: Cray Research, Inc., Mendota Heights, MN
Lines: 48
Summary: Value of knowing your target architecture

In article <10763@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> In article <719@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
> >As long as programmers are taught to think in terms of a language, rather
> >than the machine, it will be difficult to get vendors to allow the forcing
> >of inline.
>
> As long as programmers are taught to think in terms of a machine,
> rather than the language, it will be difficult to get portable code
> that can be moved to that 1 teraflop computer that will come out
> tomorrow, then to the 10 teraflop computer that will come out the day
> after, and then to the 100 teraflop computer that will come out a week
> from Monday.

A routine written for optimal performance on one architecture may be
coded so that it is portable to others, but it'll be optimal on only the
one. Write for optimal performance on a VAX and your code will crawl on
a Cray. Code for maximal portability and you'll be suboptimal
everywhere, having reduced yourself to a lowest common denominator of
machine. Compilers can hide lots of annoying differences, but they can't
change a nifty linked VAX-optimal data structure into a simple
vectorizable Cray-optimal array. This is not to say that performance
goals necessarily rule out portability, just that portability may be
restricted to a smaller range of systems.

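As a rough illustration of the linked-structure-versus-array point, here
is a minimal sketch in C. It is not taken from any real program: the
names node, link_nodes, sum_list, and sum_array are made up for this
example, and how well either loop actually performs depends on the
compiler and machine at hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linked-list node: each element is reached via a pointer. */
struct node {
    double value;
    struct node *next;
};

/* Chain n caller-supplied nodes into a list holding values[0..n-1].
   Returns the head of the list, or NULL when n is 0. */
struct node *link_nodes(struct node *nodes, const double *values, size_t n)
{
    if (n == 0)
        return NULL;
    assert(nodes != NULL && values != NULL);
    for (size_t i = 0; i < n; i++) {
        nodes[i].value = values[i];
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
    return &nodes[0];
}

/* Pointer-chasing sum: the address of each element is known only after
   the previous node has been loaded, so the loop is inherently serial. */
double sum_list(const struct node *head)
{
    double total = 0.0;
    for (const struct node *p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}

/* Unit-stride sum over a contiguous array: a simple counted loop, the
   shape a vectorizing compiler can at least consider for vector loads. */
double sum_array(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Both functions compute the same result from the same values; the only
difference is the layout of the data, which is exactly the kind of
choice a compiler cannot undo after the fact.
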
Chris' hypothetical 1TFLOP, 10TFLOP, and 100TFLOP machines (see your
local Mimsy Data sales office :-)) are not likely to be much like an
8088 or 6502. But they could look similar to each other - and their
differences in instruction timings, functional units, and processor
counts would be differences that one can expect a reasonable compiler to
handle. It's reasonable to write code that's portable to both the
Cray-1/A and the Cray-3 (with minimal machine-dependent tweaking) and
that still gives maximal performance on each, but you'll sacrifice PC/AT
portability in the process.

What I'm getting at with all this rambling is this: Knowing your target
architecture is a GOOD THING if you're after optimal performance.
Knowing your processor's assembly language - and how to optimize it -
will make you a better user of your "high-level" language (FORTRAN, C,
etc.). It will also frustrate you, as it has me, and as I fear it has
frustrated Herman.

(Could a language be designed that would preserve some measure of
portability across some set of supercomputer systems while allowing
maximal control and performance? C and C++ are great at this amongst
scalar byte-addressable architectures but seem inadequate (to me!) to
the task of extracting the most speed from a multiprocessor vector box
with only large-word addressing. I've been trying (on and off) to devise
such a language. Not easy, and probably not desirable, but an
interesting experiment nonetheless. Mail ideas, please.)

[The above is solely my opinion, and is not to be construed as the
viewpoint of CRI or anyone else, including my other personalities.
Have a nice weekend.]