Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!apple!altos!altos86!rcollins
From: rcollins@altos86.Altos.COM (Robert Collins)
Newsgroups: comp.sys.ibm.pc.misc
Subject: Re: Difference between a 386 and a 386sx
Message-ID: <4126@altos86.Altos.COM>
Date: 28 Sep 90 15:34:50 GMT
References: <1412@svin02.info.win.tue.nl> <4388@bwdls58.UUCP> <1990Sep20.185214.780@sj.ate.slb.com> <1990Sep21.002015.1201@thyme.jpl.nasa.gov>
Reply-To: rcollins@altos86.UUCP (Robert Collins)
Organization: Altos Computer Systems, San Jose, CA
Lines: 62

(I tried to send this privately, via email, but it bounced...so I'll
post it.)

JSP @ washington.edu wrote:
>
>     Am I just dense, or what?  There are lots of programs
>around that distinguish DX from SX machines.  Most of
>them just seem to check the length of the prefetch
>queue.  Code to do this has been posted.  Has something
>happened to break this approach?  Why the agonizing?

Just a friendly note in case you don't know much about '386s: the SX
and DX have the same size prefetch queue.  You can confirm this by
looking in the SX and DX hardware reference manuals and verifying that
both have 16-byte prefetch queues.  So code that checks the length of
the prefetch queue isn't going to determine anything...except the
length of the prefetch queue.

Yes, there was code posted a while back that used this approach.
Actually, if you analyze this code with a reasonable knowledge of how
the internals of the CPU work, it is pretty clear that the posted code
doesn't work...even to tell the size of the prefetch queue.  What the
poster failed to realize is that there is pipelining inside the CPU.
The bus unit prefetches whenever the execution unit doesn't need the
bus.  The decode unit is decoding in parallel with the execution unit.
When the execution unit finishes, it signals the bus unit that it needs
to store a value.
(I got ahead of myself by not explaining that the posted algorithm
attempted to modify code just outside the bounds of the prefetch queue,
and thereby determine its size.)  Finally, when the bus unit is
signaled, it stores the data.  If that data lands outside what the
prefetch queue already holds, then the modified code gets executed;
otherwise the stale copy already in the queue runs.

There is a very serious problem with this approach.  What happens when
you reduce wait states, turn up the clock, and add a cache?  The bus
unit gets control at a different point in the time line of CPU clocks.
Therefore, this approach fails.  This can easily be verified if you
have access to various computers from SX to DX, at 16, 25, and 33 MHz,
with and without caches.  Take that algorithm and see if it works.  It
doesn't; it fails miserably.  (Yes, I tried it.)

Since the SX and DX have the same size prefetch queue anyway, the
poster shouldn't have claimed his algorithm worked based on the
principle of prefetch size.  In fact, if the algorithm worked at all,
it wasn't because of prefetch size, but because of the prefetch UNIT
WORD SIZE.  On the SX, each prefetch unit is 16 bits; on the DX, each
prefetch unit is 32 bits.  So, the algorithm worked by accident, not by
design.

So finally, suppose we take the correct approach and write an algorithm
that tries to detect the difference in prefetch unit word size.  Again,
it fails on various machines with different wait states, CPU speeds,
and caches.

I guess you saw the original posting, but missed my followup where I
explained all of this.  So, to answer your question...the posted
algorithm simply doesn't work, and that's why all the fuss.

-- 
"Worship the Lord your God, and serve him only."  Mat. 4:10

Robert Collins                 UUCP:  ...!sun!altos86!rcollins
HOME:  (408) 225-8002
WORK:  (408) 432-6200 x4356
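
The timing argument in the post can be illustrated with a toy model.
This is only a sketch under stated assumptions, not the posted
detection code and not a cycle-accurate description of the 386.  The
names `sees_stale_code`, `probe`, and the parameter
`fetches_before_store` are hypothetical; the last one stands in for
everything the post says changes the timing (wait states, clock speed,
cache).  The 16-byte queue and the 2-byte versus 4-byte prefetch units
are the figures given in the post.

```python
# Toy model: does a self-modifying store land before or after the
# target byte has already been pulled into the prefetch queue?
# ASSUMPTION: one number, fetches_before_store, summarizes how many
# prefetch bus cycles complete before the store reaches memory.

QUEUE_BYTES = 16  # same queue size on SX and DX, per the post


def sees_stale_code(target_offset, fetch_unit_bytes, fetches_before_store):
    """Return True if the byte target_offset bytes ahead was already
    prefetched when the store lands, so the old code runs."""
    # The queue fills one fetch unit at a time and never holds more
    # than QUEUE_BYTES.
    prefetched = min(QUEUE_BYTES, fetch_unit_bytes * fetches_before_store)
    return target_offset < prefetched


def probe(target_offset, fetches_before_store):
    """Run the same probe on a modeled SX (2-byte units) and a
    modeled DX (4-byte units)."""
    return {
        "SX": sees_stale_code(target_offset, 2, fetches_before_store),
        "DX": sees_stale_code(target_offset, 4, fetches_before_store),
    }


if __name__ == "__main__":
    # Same probe distance, three different bus timings.
    for fetches in (1, 3, 8):
        print(fetches, probe(8, fetches))
```

In this model the probe at offset 8 tells the two chips apart only for
the middle timing (3 fetches); with 1 fetch both run the modified code,
and with 8 fetches both run the stale code.  That is the sense in which
a test like this can appear to work on one machine and fail on another
with different wait states, clock, or cache.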