Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!ucbcad!ucbvax!CORY.BERKELEY.EDU!dillon
From: dillon@CORY.BERKELEY.EDU (Matt Dillon)
Newsgroups: comp.sys.amiga
Subject: 6502 Vs 68000, lets get it straight .
Message-ID: <8703090544.AA03041@cory.Berkeley.EDU>
Date: Mon, 9-Mar-87 00:44:21 EST
Article-I.D.: cory.8703090544.AA03041
Posted: Mon Mar  9 00:44:21 1987
Date-Received: Mon, 9-Mar-87 19:37:18 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: University of California at Berkeley
Lines: 149


	Don't get clumsy now!  Let me get it straight for everybody:

The 6502 takes one clock cycle to do an 8-bit memory fetch.  The number of
clock cycles required to execute an instruction is in most cases exactly
the number of memory operations required to read and execute the instruction.
Thus, a LDA absolute requires 4 memory fetches and thus 4 clock cycles
(3 fetches for the instruction, 1 for the absolute memory operation).
There are some exceptions: most single-byte instructions like TAX take 2
clock cycles even though there is only one memory fetch.
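
As a rough sketch of the rule of thumb above (not a complete timing model), the cycle count can be estimated as instruction bytes plus data accesses, with a floor of two cycles. The function name and the floor-of-two formulation are assumptions made for illustration; some instructions, such as indexed stores and read-modify-write operations, take extra cycles that this ignores.

```python
def estimate_6502_cycles(instruction_bytes, data_accesses):
    """Rule-of-thumb estimate: one cycle per memory access, minimum two."""
    return max(2, instruction_bytes + data_accesses)

# LDA absolute: 3 instruction bytes + 1 data read
print(estimate_6502_cycles(3, 1))  # 4
# TAX: 1 instruction byte, no data access, but still 2 cycles
print(estimate_6502_cycles(1, 0))  # 2
```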

A 68000 on the other hand takes 4 clock cycles for each memory fetch, and
fetches data 16-bits at a time.  Instruction execution times are, in general,
related to the number of memory operations required.  Most longword operations
take an extra 2 clock cycles (not memory cycles) due to internal processing.

SO.  In terms of basic throughput, an 8Mhz 68000 is 4 times faster than a 
1Mhz 6502. (32bits/uS vs 8bits/uS).  HOWEVER, the 68000 allows you to do
a more complex range of operations in the same time.  Specifically, a 68000
can manipulate 16 and 32 bit quantities and the 6502 can only manipulate 8
bit quantities.  Attempting to make the 6502 do, say, a 16bit add immediate
to memory requires about 7 instructions (CLC/LDA/ADC#/STA/LDA/ADC#/STA)=18cc
with zero-page operands, whereas a 68000 can do it in a single instruction
(ADD)=16cc.  So the 6502
can be thought of as fast only if your program doesn't require anything
beyond 8 bit quantity sizes.  Even if you spent 24 hours optimizing your 
6502 code, you can't really do a 16bit add in anything less than four 
instructions, and that's assuming one addend is already loaded into registers
A and X and the carry is set to something meaningful.  Each 68000 instruction
is about 4x more powerful than a 6502 instruction.


Now, a 68000 instruction is, on the average, twice as long as a 6502
instruction... And I'm being very generous to the 6502 here.

So, putting it all together:	8 Mhz 68000 Vs 1 Mhz 6502
	Basic throughput 4x
	Take into account power of 68000 (16/32bit registers & operations): 4x
	Take into account instruction size: .5x

	Overall rating:	8x.


The gist is that the clock rating reflects the relative differences between
a 6502 and a 68000. (Obviously this generalization only applies to the 6502
vs 68000).  Thus if an 8Mhz 6502 did exist, it would probably be on par with
a 68000.
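
The overall rating and the "on par" claim can be checked with a few lines of arithmetic. This only multiplies the three factors estimated above and normalizes by the clock ratio; the variable names are illustrative.

```python
throughput = 4.0      # basic bus throughput factor
power = 4.0           # 16/32-bit registers and operations
size_penalty = 0.5    # 68000 instructions are about twice as long

overall = throughput * power * size_penalty
per_mhz = overall / (8 / 1)   # normalize by the 8 MHz vs 1 MHz clock ratio
print(overall, per_mhz)       # 8.0 1.0
```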

NOTE: the previous argument is very generous towards the 6502.... I do not
take into account the large number of registers on the 68000 or its expanded
address space.


Example 2:  Tight loop copy 256 bytes from absolute location

6502:	ldx #0		time ~= 256*(4+5+2+3) = 3584 clock cycles
loop:	lda src,x
	sta dest,x	(takes 5cc)
	dex
	bne loop

68000:	lea	src,a0	time ~= 64*(20+10) = 1920 clock cycles
	lea	dest,a1
	move.w	#256/4-1,d0	(dbf runs count+1 times)
loop:	move.l	(a0)+,(a1)+
	dbf D0,loop

	result: 8Mhz 68000 about 15x a 1Mhz 6502

	Note that for the 6502 program to copy more than 256 bytes, the 
	most efficient routine is a self-modifying code routine that
	has an inner loop equivalent to the above example and an outer loop
	which modifies the MSB address in the LDA and STA instructions.  This
	effectively gives the same throughput.
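
To see why the outer loop barely changes throughput, here is a small cost model. The per-page overhead of 20 cycles (bumping two address MSBs, counting, branching) is an assumption for illustration, not a measured figure, and the per-byte cost is left as a parameter in the 13 to 14 cycle range.

```python
def copy_cycles(pages, cycles_per_byte, outer_overhead=20):
    """Total cycles to copy `pages` 256-byte pages.

    outer_overhead is an assumed per-page cost for modifying the two
    address MSBs and looping; it is not a figure from the post.
    """
    return pages * (256 * cycles_per_byte + outer_overhead)

for per_byte in (13, 14):
    inner_only = 16 * copy_cycles(1, per_byte, outer_overhead=0)
    with_outer = copy_cycles(16, per_byte)
    overhead = with_outer / inner_only - 1
    print(per_byte, f"{overhead:.2%}")   # well under 1% in both cases
```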

Example 3:	16 bit add
6502:	(add .Alsb .Xmsb to zero page memory)	time = 16 cc
	clc
	adc dest
	sta dest
	txa
	adc dest+1
	sta dest+1

68000:	(add D0 to register indirect (Aztec small data model))
	add.w d0,off(Ax)				time = 16 cc

	result: 8Mhz 68000 about 8x a 1Mhz 6502
	NOTE: register-register ADD takes only 4 clock cycles.
	NOTE: addressing modes picked to best represent programming
	environment.
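
For readers who don't speak 6502, the following sketch mirrors the six-instruction sequence step by step and shows how the carry links the two 8-bit adds. It models only binary-mode ADC; the function names and the dictionary used as memory are illustrative assumptions.

```python
def adc(a, operand, carry):
    """8-bit add with carry (binary mode): returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

def add16_6502(a, x, mem, dest):
    """Mirror clc / adc dest / sta dest / txa / adc dest+1 / sta dest+1."""
    carry = 0                                # clc
    a, carry = adc(a, mem[dest], carry)      # adc dest
    mem[dest] = a                            # sta dest
    a = x                                    # txa
    a, carry = adc(a, mem[dest + 1], carry)  # adc dest+1
    mem[dest + 1] = a                        # sta dest+1
    return carry

mem = {0x10: 0xFF, 0x11: 0x12}           # dest holds 0x12FF, LSB first
add16_6502(0x01, 0x00, mem, 0x10)        # add 0x0001 (A=lsb, X=msb)
print(hex(mem[0x10] | mem[0x11] << 8))   # 0x1300
```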

Example 4:	32 bit add
6502:	(add .Alsb .Xmsb and zero-page src to zero page destination)
	clc						time = 34 cc
	adc dest
	sta dest
	txa
	adc dest+1
	sta dest+1
	lda src
	adc dest+2
	sta dest+2
	lda src+1
	adc dest+3
	sta dest+3

68000:	add.l	D0,off(Ax)				time = 24 cc
	
	result: 8Mhz 68000 about 11x a 1Mhz 6502
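
The 34 cc total and the 11x ratio can be re-derived by summing per-instruction costs. The 6502 costs below are the standard zero-page and implied timings; the 68000 figure is taken as given from the example rather than derived here.

```python
CYCLES_6502 = {"clc": 2, "txa": 2, "lda_zp": 3, "adc_zp": 3, "sta_zp": 3}

add32_6502 = ["clc", "adc_zp", "sta_zp", "txa", "adc_zp", "sta_zp",
              "lda_zp", "adc_zp", "sta_zp", "lda_zp", "adc_zp", "sta_zp"]

cc_6502 = sum(CYCLES_6502[op] for op in add32_6502)
cc_68000 = 24                 # add.l D0,off(Ax), figure from the example

us_6502 = cc_6502 / 1.0       # microseconds at 1 MHz
us_68000 = cc_68000 / 8.0     # microseconds at 8 MHz
print(cc_6502, round(us_6502 / us_68000, 1))  # 34 11.3
```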


Example 5:	Simple table driven PLOT x,y onto some screen.  Assume
		we will do many plots.

6502:	plot (x, y).. max 256x256 drawing area.
	lda scanlinelsb,y				time = 33 cc
	sta zeropage
	lda scanlinemsb,y
	sta zeropage+1		(takes 3cc)
	ldy columnindex,x
	lda (zeropage),y	(takes 5cc)
	ora bittable,x		(takes 4cc)
	sta (zeropage),y	(takes 6cc)

68000:	plot (D0, D1).. max 2048 pixels on the X axis, 8192 on the Y
	registers (as they would be for multiple plots):
		A0=scanline (table of longword screen addresses)
		A1=columnindex (table of bytes column convert)
		A2=bittable (table of byte masks)

							time = 72 cc

	asl.w	#2,D1		;y = y * 4 to get index into longword array
	move.l	0(A0,D1.w),A3	;get scanline
	add.w	0(A1,D0.w),A3	;incorporate columnindex
	move.b	0(A2,D0.w),D0	;get mask (80/40/20/10/08/04/02/01)
	or.b	D0,(A3)		;write it to screen


	Results: 8 Mhz 68000 only 3.7x a 1 Mhz 6502
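
The table-driven idea itself is independent of the CPU. Here is a minimal sketch of it, assuming a 256x256 one-bit-per-pixel bitmap with the leftmost pixel in the high bit; the screen layout and all names are assumptions, chosen to match the table names used above.

```python
WIDTH, HEIGHT = 256, 256          # assumed 1-bit-per-pixel bitmap
BYTES_PER_ROW = WIDTH // 8

screen = bytearray(BYTES_PER_ROW * HEIGHT)

# Precomputed tables, built once and reused for every plot.
scanline = [y * BYTES_PER_ROW for y in range(HEIGHT)]   # row start offset
columnindex = [x // 8 for x in range(WIDTH)]            # byte within row
bittable = [0x80 >> (x % 8) for x in range(WIDTH)]      # mask 80/40/.../01

def plot(x, y):
    """Set pixel (x, y) using table lookups only, no multiply or shift."""
    addr = scanline[y] + columnindex[x]
    screen[addr] |= bittable[x]

plot(9, 2)
print(screen[2 * BYTES_PER_ROW + 1] == 0x40)  # True
```

Since the tables are built once, each plot costs only three lookups, an add and an OR, which is why both listings above stay so short.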


				-Matt