Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!rutgers!ames!ucbcad!ucbvax!CORY.BERKELEY.EDU!dillon
From: dillon@CORY.BERKELEY.EDU (Matt Dillon)
Newsgroups: comp.sys.amiga
Subject: 6502 Vs 68000, lets get it straight .
Message-ID: <8703090544.AA03041@cory.Berkeley.EDU>
Date: Mon, 9-Mar-87 00:44:21 EST
Article-I.D.: cory.8703090544.AA03041
Posted: Mon Mar 9 00:44:21 1987
Date-Received: Mon, 9-Mar-87 19:37:18 EST
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: University of California at Berkeley
Lines: 149

    Don't get clumsy now!  Let me get it straight for everybody:

    The 6502 takes one clock cycle to do an 8-bit memory fetch.  The
number of clock cycles required to execute an instruction is in most
cases exactly the number of memory operations required to read and
execute the instruction.  Thus, an LDA absolute requires 4 memory
fetches and thus 4 clock cycles (3 fetches for the instruction, 1 for
the absolute memory operation).  There are some exceptions: most
single-byte instructions like TAX take 2 clock cycles even though there
is only one memory fetch.

    A 68000, on the other hand, takes 4 clock cycles for each memory
fetch, and fetches data 16 bits at a time.  Instruction execution times
are, in general, related to the number of memory operations required.
Most longword operations take an extra 2 clock cycles (not memory
cycles) due to internal processing.

    SO.  In terms of basic throughput, an 8MHz 68000 is 4 times faster
than a 1MHz 6502 (32 bits/uS vs 8 bits/uS).

    HOWEVER, the 68000 allows you to do a more complex range of
operations in the same time.  Specifically, a 68000 can manipulate 16
and 32 bit quantities and the 6502 can only manipulate 8 bit
quantities.  Attempting to make the 6502 do, say, a 16-bit add
immediate to memory requires about 7 instructions
(CLC/LDA/ADC#/STA/LDA/ADC#/STA)=18cc with zero-page operands, whereas a
68000 can do it in a single instruction (ADD)=16cc.
So the 6502 can be thought of as fast only if your program doesn't
require anything beyond 8-bit quantity sizes.  Even if you spent 24
hours optimizing your 6502 code, you can't really do a 16-bit add in
anything less than four instructions, and that's assuming one addend is
already loaded into registers A and X and the carry is set to something
meaningful.

    Each 68000 instruction is about 4x more powerful than a 6502
instruction.  Now, a 68000 instruction is, on the average, twice as
long as a 6502 instruction...  And I'm being very generous to the 6502
here.

    So, putting it all together:

                    8 MHz 68000 Vs 1 MHz 6502

    Basic throughput                                    4x
    Take into account power of 68000
    (16/32bit registers & operations):                  4x
    Take into account instruction size:                 .5x

    Overall rating:                                     8x

    The gist is that the clock rating reflects the relative differences
between a 6502 and a 68000.  (Obviously this generalization only
applies to the 6502 vs 68000.)  Thus if an 8MHz 6502 did exist, it
would probably be on par with an 8MHz 68000.

    NOTE: the previous argument is very generous towards the 6502....
I do not take into account the large number of registers on the 68000
or its expanded address space.

Example 2:  Tight loop copy 256 bytes from absolute location

    6502:
            ldx #0              time ~= 256*(4+4+2+3) = 3328 clock cycles
    loop:   lda src,x
            sta dest,x
            dex
            bne loop

    68000:
            move.l #src,a0      time ~= 64*(20+10) = 1920 clock cycles
            move.l #dest,a1     ;'#' loads the address, not its contents
            move.w #256/4-1,d0  ;dbf runs count+1 times, so load 63
    loop:   move.l (a0)+,(a1)+
            dbf D0,loop

    result: 8MHz 68000 about 14x a 1MHz 6502

    Note that for the 6502 program to copy more than 256 bytes, the
most efficient routine is a self-modifying code routine that has an
inner loop equivalent to the above example and an outer loop which
modifies the MSB address in the LDA and STA instructions.  This
effectively gives the same throughput.
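
    The "about 14x" figure and the 8x overall rating are plain
arithmetic on the cycle counts and clock rates quoted above.  Here is a
minimal Python sketch of that arithmetic; the function names are made
up for illustration, and the cycle counts are taken as given in this
post rather than re-derived from a datasheet.

```python
# Sketch only: names are hypothetical, cycle counts are the post's own.

def elapsed_us(cycles, clock_mhz):
    """Microseconds needed for `cycles` clock cycles at `clock_mhz` MHz."""
    return cycles / clock_mhz


def speedup(cycles_6502, cycles_68000, mhz_6502=1.0, mhz_68000=8.0):
    """How many times faster the 68000 finishes than the 6502."""
    return elapsed_us(cycles_6502, mhz_6502) / elapsed_us(cycles_68000, mhz_68000)


# Example 2: 256-byte copy, per-iteration counts as quoted in the post.
copy_6502 = 256 * (4 + 4 + 2 + 3)   # lda + sta + dex + bne, 256 iterations
copy_68000 = 64 * (20 + 10)         # move.l + dbf, 64 longword iterations
copy_ratio = speedup(copy_6502, copy_68000)

# Overall rating: throughput x instruction power x instruction size.
overall_rating = 4 * 4 * 0.5

print(f"6502 copy:  {copy_6502} cycles = {elapsed_us(copy_6502, 1.0):.0f} us")
print(f"68000 copy: {copy_68000} cycles = {elapsed_us(copy_68000, 8.0):.0f} us")
print(f"copy speedup:   {copy_ratio:.1f}x")
print(f"overall rating: {overall_rating:.0f}x")
```

    The division by the clock rate is the whole trick: the 68000 burns
more than half as many cycles as the 6502 here, but each of its cycles
is one eighth as long.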
Example 3:  16 bit add

    6502:   (add .Alsb .Xmsb to zero page memory)   time = 16 cc

            clc
            adc dest
            sta dest
            txa
            adc dest+1
            sta dest+1

    68000:  (add D0 to register indirect (Aztec small data model))

            add.w d0,off(Ax)                        time = 16 cc

    result: 8MHz 68000 about 8x a 1MHz 6502

    NOTE: register-register ADD takes only 4 clock cycles.
    NOTE: addressing modes picked to best represent programming
          environment.

Example 4:  32 bit add

    6502:   (add .Alsb .Xmsb and zero-page src to zero page destination)

            clc                                     time = 34 cc
            adc dest
            sta dest
            txa
            adc dest+1
            sta dest+1
            lda src
            adc dest+2
            sta dest+2
            lda src+1
            adc dest+3
            sta dest+3

    68000:
            add.l D0,off(Ax)                        time = 24 cc

    result: 8MHz 68000 about 11x a 1MHz 6502

Example 5:  Simple table driven PLOT x,y onto some screen.  Assume we
            will do many plots.

    6502:   plot (x, y).. max 256x256 drawing area.

            lda scanlinelsb,y                       time = 33 cc
            sta zeropage
            lda scanlinemsb,y
            sta zeropage+1          (takes 3cc)
            ldy columnindex,x
            lda (zeropage),y        (takes 5cc)
            ora bittable,x          (takes 4cc)
            sta (zeropage),y        (takes 6cc)

    68000:  plot (D0, D1).. max 2048 pixels on the X axis, 8192 on the Y

            registers (as they would be for multiple plots):
            A0=scanline table of longword screen addresses
            A1=columnindex (table of bytes column convert)
            A2=bittable (table of byte masks)
                                                    time = 72 cc
            asl.w #2,D1             ;y = y * 4 to get index into longword array
            move.l 0(A0,D1.w),A3    ;get scanline
            add.w 0(A1,D0.w),A3     ;incorporate columnindex
            move.b 0(A2,D0.w),D0    ;get mask (80/40/20/10/08/04/02/01)
            or.b D0,(A3)            ;write it to screen

    Results: 8 MHz 68000 only 3.7x a 1 MHz 6502

                    -Matt
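
    As a cross-check on Examples 3 through 5, the 6502 totals can be
re-added instruction by instruction and turned into speed ratios.  The
Python sketch below is illustrative only.  The per-instruction 6502
counts that the post does not spell out (2 for CLC/TXA, 3 for zero-page
loads and stores, 4 for absolute indexed loads with no page crossing)
are my assumptions from the usual 6502 timing tables; the 68000 totals
are simply the figures quoted above.

```python
# Sketch only: cycle counts not stated in the post are assumptions.

# 6502 per-instruction cycles, in listing order.
CYCLES_6502 = {
    "add16": [2, 3, 3, 2, 3, 3],                    # clc adc sta txa adc sta
    "add32": [2, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3],  # add16 + 2x (lda adc sta)
    "plot":  [4, 3, 4, 3, 4, 5, 4, 6],              # table lookups + ora/sta
}

# 68000 totals exactly as quoted in the post.
CYCLES_68000 = {"add16": 16, "add32": 24, "plot": 72}

MHZ_6502 = 1.0
MHZ_68000 = 8.0


def ratio(name):
    """Speed of the 8 MHz 68000 relative to the 1 MHz 6502 for one example."""
    time_6502 = sum(CYCLES_6502[name]) / MHZ_6502
    time_68000 = CYCLES_68000[name] / MHZ_68000
    return time_6502 / time_68000


totals_6502 = {name: sum(counts) for name, counts in CYCLES_6502.items()}

for name in CYCLES_6502:
    print(f"{name:6s} 6502={totals_6502[name]:3d}cc "
          f"68000={CYCLES_68000[name]:3d}cc  ratio={ratio(name):.1f}x")
```

    Under those assumptions the totals come out to 16, 34 and 33 cc,
matching the listings, and the ratios round to the 8x, 11x and 3.7x
quoted in the results lines.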