Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 8/28/84; site lll-crg.ARPA
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!think!harvard!seismo!umcp-cs!gymble!lll-crg!brooks
From: brooks@lll-crg.ARPA (Eugene D. Brooks III)
Newsgroups: net.micro.68k,net.arch
Subject: Re: EA orthogonality
Message-ID: <634@lll-crg.ARPA>
Date: Sun, 9-Jun-85 02:12:15 EDT
Article-I.D.: lll-crg.634
Posted: Sun Jun 9 02:12:15 1985
Date-Received: Mon, 10-Jun-85 08:25:32 EDT
References: <419@oakhill.UUCP> <6415@boring.UUCP> <557@terak.UUCP> <6417@boring.UUCP> <572@terak.UUCP> <6431@boring.UUCP> <467@rtech.UUCP>
Organization: Lawrence Livermore Labs, CRG group
Lines: 42
Xref: watmath net.micro.68k:891 net.arch:1349

> And furthermore, the orthogonal sequence is normally atomic;
> in an OS kernel the non-orthogonal sequence might easily have to
> be protected by a "disable/enable interrupt" sequence around it,
> or "test-and-set" or some such in a multi-processor system
> (e.g., "a" and "b" might be global vars).
> Multi-process user-programs would need "enter/exit monitor" or
> "block-on-semaphore" sequences.  Besides being a pain (sometimes
> a royal pain) this has the potential for eating a lot of CPU time.

Considerations for multiprocessing are one of the strongest arguments
in favor of a load/store type of instruction set.  The fundamental
problem to be overcome in a multiprocessor is memory latency.  You
increase efficiency in an environment with high memory latency by using
a load/store type of instruction set in conjunction with a processor
composed of pipelined functional units and careful instruction ordering.
For example:

	a += b;

	load r0,_a
	load r1,_b
	add r0,r1
	store r0,_a

The performance gain is achieved when there is more work to do.
For example:

	a += b; c += d;

	load r0,_a
	load r1,_b
	load r2,_c
	load r3,_d
	add r0,r1
	add r2,r3
	store r0,_a
	store r2,_c

The loads overlap their latencies, resulting in higher performance
than is possible with the sequence

	add _a,_b
	add _c,_d
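
To put rough numbers on the overlap argument, here is a toy cycle-count
model in C.  It is only a sketch under assumptions that are not in the
post: in-order issue, one instruction issued per cycle, a fully
pipelined memory system, loads and stores that take mem_latency cycles,
and a one-cycle register add.  The memory-to-memory sequence is modeled
as the same operations issued without reordering, on the further
assumption that such a machine finishes one instruction's operand
traffic before starting the next.  All names (Insn, simulate,
cycles_overlapped, cycles_serialized) are made up for illustration.

```c
#include <assert.h>

/* Toy timing model: an illustration, not a model of any real CPU.
 * Assumed: in-order issue, one instruction issued per cycle, fully
 * pipelined memory, loads/stores take mem_latency cycles, add takes 1. */

enum { NREGS = 4 };

typedef enum { LOAD, ADD, STORE } Op;

typedef struct {
    Op  op;
    int ra;  /* LOAD: destination; ADD: destination and first source; STORE: source */
    int rb;  /* ADD: second source; unused (-1) for LOAD and STORE */
} Insn;

static int max2(int x, int y) { return x > y ? x : y; }

/* Return the cycle in which the last operation of prog completes. */
static int simulate(const Insn *prog, int n, int mem_latency)
{
    int ready[NREGS] = {0};  /* cycle at which each register value is available */
    int next_issue = 0;      /* earliest cycle the next instruction may issue */
    int done = 0;

    for (int i = 0; i < n; i++) {
        const Insn *in = &prog[i];
        int t = next_issue;
        int lat = (in->op == ADD) ? 1 : mem_latency;

        assert(in->ra >= 0 && in->ra < NREGS);

        /* An instruction stalls until its source registers are ready. */
        if (in->op == ADD) {
            assert(in->rb >= 0 && in->rb < NREGS);
            t = max2(t, max2(ready[in->ra], ready[in->rb]));
        } else if (in->op == STORE) {
            t = max2(t, ready[in->ra]);
        }
        if (in->op != STORE)
            ready[in->ra] = t + lat;

        done = max2(done, t + lat);
        next_issue = t + 1;  /* in-order: a stall delays everything behind it */
    }
    return done;
}

/* The reordered load/store sequence from the post. */
static const Insn overlapped[] = {
    {LOAD, 0, -1}, {LOAD, 1, -1}, {LOAD, 2, -1}, {LOAD, 3, -1},
    {ADD, 0, 1}, {ADD, 2, 3},
    {STORE, 0, -1}, {STORE, 2, -1},
};

/* Stand-in for "add _a,_b; add _c,_d": same operations, no reordering. */
static const Insn serialized[] = {
    {LOAD, 0, -1}, {LOAD, 1, -1}, {ADD, 0, 1}, {STORE, 0, -1},
    {LOAD, 2, -1}, {LOAD, 3, -1}, {ADD, 2, 3}, {STORE, 2, -1},
};

int cycles_overlapped(int mem_latency) { return simulate(overlapped, 8, mem_latency); }
int cycles_serialized(int mem_latency) { return simulate(serialized, 8, mem_latency); }
```

Under these assumptions, cycles_overlapped(10) comes to 25 cycles and
cycles_serialized(10) to 35.  The gap is one memory latency, which is
the time the second pair of loads spends waiting behind the first add
and store when nothing is reordered.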