Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watnot!watmath!clyde!cbatt!decuac!cvl!umd5!zben
From: zben@umd5.UUCP
Newsgroups: comp.unix.wizards
Subject: Re: Complaint about complex architectures
Message-ID: <1519@umd5.umd.edu>
Date: Sat, 4-Apr-87 15:39:16 EST
Article-I.D.: umd5.1519
Posted: Sat Apr  4 15:39:16 1987
Date-Received: Sun, 5-Apr-87 12:46:48 EST
References: <15292@amdcad.UUCP> <978@ames.UUCP> <15694@sun.uucp> <5@wb1.cs.cmu.edu> <6042@mimsy.UUCP> <15341@amdcad.UUCP>
Reply-To: zben@umd5.umd.edu (Ben Cranston)
Organization: University of Maryland, College Park
Lines: 41
Summary: An irrelevant, silly comment

In article <15341@amdcad.UUCP> bcase@amdcad.UUCP (Brian Case) writes:

> This brings up one of my major beefs about complex architectures:  an
> optimizing compiler might have to do different things depending upon
> the *version* of a CPU it is compiling for!  An optimizing compiler
> that is considered "a great compiler" for one version of a CPU might
> be "a mediocre compiler" for the next version of the machine.

Gosh, I seem to remember a Cobol compiler that generated different code for
programs with the following two directives:

Object-Computer is Univac-1108.

Object-Computer is Univac-1108 with four memory boxes.

Forgive me if the dashes are in the wrong places.  It's been a LONG time.
(Not long enough though...)  I don't buy the complexity argument.  You're
arguing that bicycles are better than cars because they are easier to fix
and easier to learn to drive, while completely forgetting the performance
differences.

Case in point:  I just came up with a fast integer square-root routine for
a local project (written in C, available on request).  It has one multiply
within the main loop.  I also have a Unisys 1100 assembly version with NO
multiplies in the loop, but I can't translate it to C because C has no
double-register operations or double-precision shifts, and there is no
easy way to code the LSC (load shift and count) instruction other than as
yet another C loop.
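
For readers who want something concrete, here is a minimal sketch of the
general kind of thing being described.  It is NOT the routine offered above
(that one is available on request); it is an ordinary bit-by-bit integer
square root with one multiply per loop pass, plus a shift-and-count loop of
the sort one ends up writing when there is no single instruction for it.
The names isqrt32 and shift_count32 are made up for illustration, and the
32-bit width is an assumption (the 1100 is a 36-bit machine).  The second
function only stands in for the role a shift-and-count instruction plays;
it does not reproduce the exact semantics of LSC.

```c
#include <stdint.h>

/* Integer square root of a 32-bit value, built one result bit at a time.
   Illustrative only: there is exactly one multiply per loop pass. */
uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;

    /* The root of a 32-bit number fits in 16 bits, so try bits 15..0. */
    for (uint32_t bit = 1u << 15; bit != 0; bit >>= 1) {
        uint32_t trial = root | bit;
        if (trial * trial <= n)     /* the single multiply in the loop */
            root = trial;           /* keep the bit if we did not overshoot */
    }
    return root;
}

/* Count the left shifts needed to move the highest set bit of x up to
   bit 31.  Returns 32 when x is zero.  A hypothetical stand-in for what
   a hardware shift-and-count instruction would do in one step. */
unsigned shift_count32(uint32_t x)
{
    unsigned count = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        count++;
    }
    return count;
}
```

The multiply-free assembly version presumably gets its speed from exactly
the operations C lacks; in C the shift count costs a loop like the second
one above.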

I guess the point here is that it is possible for a dedicated assembly
language programmer to effectively utilize these complex architectures
to fly rings around anything written in a higher-level language.  It is
also possible for a really brilliantly written code generator to approach
this kind of performance.  Any attempt to simplify these architectures
had better deliver blinding increases in hardware speed, or I'm still
going to think it's a plot by the programmers and compiler writers to
shirk their responsibilities...
-- 
                    umd5.UUCP    <= {seismo!mimsy,ihnp4!rlgvax}!cvl!umd5!zben
Ben Cranston zben @ umd2.UMD.EDU    Kingdom of Merryland UniSys 1100/92
                    umd2.BITNET     "via HASP with RSCS"