Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sol.ctr.columbia.edu!emory!wa4mei!holos0!lbr
From: lbr@holos0.uucp (Len Reed)
Newsgroups: comp.sys.ibm.pc.hardware
Subject: Re: Comparing 486 to 386 Systems
Message-ID: <1991Apr15.200722.17774@holos0.uucp>
Date: 15 Apr 91 20:07:22 GMT
References: <1991Apr7.033635.18412@agate.berkeley.edu> <1991Apr11.001619.6952@holos0.uucp> <2328@pdxgate.UUCP>
Organization: Holos Software, Inc., Atlanta, GA
Lines: 66

In article <2328@pdxgate.UUCP> berggren@eecs.cs.pdx.edu (Eric Berggren) writes:
>lbr@holos0.uucp (Len Reed) writes:
>
>>But the point being made was that 16-bit addressing really cripples you.
>
>  That's 64k, is that what you mean? 8088/86 use 20-bit addressing. (just
>nit picking..)

I'm not talking about how much memory the chip can address but rather
how much memory a program can *efficiently* address. Index registers
and memory offsets are 16-bits. The other 4 bits can be used only at
great pain. To deal with data spaces greater than 64 K-bytes you have
to fool with the segment registers.

Consider the following:

	extern int *p, q, *r;

	q = *p;
	r = p;

In 32-bit 386 mode, this is something like

	mov ecx, [p]		; 1 memory access
	mov [r], ecx		; 1 memory access
	mov eax, [ecx]		; 1 memory access
	mov [q], eax		; 1 memory access

For small-model C on the 8088/8086 we get pretty much the same.
(Though our integers are only 16 bits, and we are restricted to
BX/DI/SI for pointers.)

But if we have to go to large model, things get ugly. Microsoft 6.0
with optimization on full came up with the following:

	mov es,[segment_of_p]	; 1 memory access (*)
	les bx,es:[p]		; 2
	mov ax,es:[bx]		; 1
	mov cx,es
	mov es,[segment_of_q]	; 1 (*)
	mov es:[q],ax		; 1
	mov es,[segment_of_r]	; 1 (*)
	mov es:[r],bx		; 1
	mov es:[r+2],cx		; 1 (total is 9)

(*) It could be a tiny bit faster if the compiler used immediate moves
to load the ES register. E.g., the first one would be:
	mov ax, segment p
	mov es, ax

The compiler has really jumped through hoops here to deal with
addresses beyond 16 bits. And we're only getting 16-bit integers, too.

It doesn't take much more code before the optimization seen above
starts to break down, because we need 2 registers to hold a pointer or
a long and because only [bx], [si], [di], [bx+si], and [bx+di] are
available for addresses. The 386 allows stuff like [eax + 4*ecx],
which is nice for accessing arrays.

My example went from 4 instructions with 4 memory accesses to 9 and 9.
The point at which this catastrophe of inefficiency occurs is when we
wish to access more than 64 K of data.
-- 
Len Reed
Holos Software, Inc.
Voice: (404) 496-1358
UUCP: ...!gatech!holos0!lbr
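
For anyone who wants to see the 64 K limit concretely, here is a rough C sketch of the real-mode segment arithmetic discussed above. It is only a model: the type far_ptr and the helpers linear, far_add, huge_add and far_ptr_demo are names made up for illustration, not anything provided by a compiler or library. It assumes the usual 8086 rule that the physical address is segment * 16 + offset, truncated to 20 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an 8086 far pointer (names are illustrative only). */
struct far_ptr {
    uint16_t seg;   /* segment register value */
    uint16_t off;   /* 16-bit offset within the segment */
};

/* Real-mode physical address: segment * 16 + offset, truncated to 20 bits. */
uint32_t linear(struct far_ptr p)
{
    return (((uint32_t)p.seg << 4) + p.off) & 0xFFFFFu;
}

/* Cheap arithmetic: only the 16-bit offset changes, so it wraps at 64 K. */
struct far_ptr far_add(struct far_ptr p, uint16_t n)
{
    p.off = (uint16_t)(p.off + n);
    return p;
}

/* Arithmetic that can cross 64 K: recompute both segment and offset. */
struct far_ptr huge_add(struct far_ptr p, uint32_t n)
{
    uint32_t lin = (linear(p) + n) & 0xFFFFFu;
    p.seg = (uint16_t)(lin >> 4);    /* normalized: offset kept below 16 */
    p.off = (uint16_t)(lin & 0xFu);
    return p;
}

/* Small self-check of the wrap-around behaviour described above. */
void far_ptr_demo(void)
{
    struct far_ptr p = { 0x1234, 0xFFFF };      /* last byte of the segment */

    assert(linear(p) == 0x2233Fu);
    assert(linear(far_add(p, 1)) == 0x12340u);  /* wrapped back to offset 0 */
    assert(linear(huge_add(p, 1)) == 0x22340u); /* moved on to the next byte */
}
```

Adding 1 to 1234:FFFF with offset-only arithmetic lands back at the start of the same segment, while the segment-adjusting version reaches the next byte at the cost of recomputing both halves of the pointer. That extra work is the sort of thing the large-model code above pays for in segment register loads.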