Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site peora.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!vax135!petsd!peora!jer
From: jer@peora.UUCP (J. Eric Roskos)
Newsgroups: net.arch
Subject: Re: I like segmented architectures
Message-ID: <1031@peora.UUCP>
Date: Fri, 7-Jun-85 09:01:53 EDT
Article-I.D.: peora.1031
Posted: Fri Jun 7 09:01:53 1985
Date-Received: Sat, 8-Jun-85 03:56:05 EDT
References: <276@spar.UUCP> <5653@utzoo.UUCP> <291@spar.UUCP>
Organization: Perkin-Elmer SDC, Orlando, Fl.
Lines: 105

> By the way, where are the 68XXX fans?  Surely the Motorola chip isn't
> so bad that no one can find a defense for it.

It's kind of hard to compare the 68000's MMU, which functions in a very
familiar, traditional way (the same way MMUs on many "mainframe" machines
work), with the very strange segmentation facilities of the 286.

Here you've complained again that "64K segments are too small".  Now, I
have a feeling part of the problem I see here is in our definition of
"segments", which varies widely.  But I don't think it is the smallness of
the "segments" that is the problem.

The 8086's way of handling segmentation is not like that of many more
familiar machines, 68000 included.  In what I will call the "conventional"
memory management units, the address field in the instruction is
partitioned into subfields, like this:

        AABBBB

(where each digit represents, let us say, 4 bits, for concreteness).  The
bits AA are used to select an entry in an address translation table in the
MMU, which replaces the bit string AA in the original ("virtual") address
with some bit string CCCCCC in the generated ("physical") address.  The
result is some address

        CCCCCCBBBB

Now, there is usually also a size associated with the block of physical
memory pointed to by CCCCCC in the AAth entry of this translation table, so
the value of BBBB is checked against this number to be sure it is in range.
Assuming it is, we have generated our physical address, and can go on to
checking the other bits in that table, which tell whether we are allowed to
read, write, or execute the location, whether or not it is in memory at
present, whether it has been modified, etc.

More sophisticated memory management units further partition AA (or add
more bits), so that the high order bits select one or more pointer tables
which themselves point to other translation tables that are used for the
next-lower-order field of bits, etc., but the mechanism is the same.

Notice that in this scheme, the size of BBBB doesn't really matter so much.
The total amount of space you can address in an instruction (without
changing an external register) is the number of distinct addresses that may
be represented by AABBBB, which is the number of addresses expressible in
the instruction's address field for the operand.

Now, on the other hand, we have the Intel approach.  Intel gives us an
instruction address field

        BBBB

and a segmentation register field

        AAAA

We get our physical address from this via

         AAAA0
        + BBBB

Now, in the 286, we have an "improvement" in that AAAA is put into a memory
management unit table, just like in the "conventional" architecture, but it
is still added to the instruction's address field rather than concatenated
with it.  And where does the index into the memory management unit's table
come from?  Why, it comes from what used to be the segmentation register!
So, rather than being derived from part of the instruction's address field,
the index comes from a separate register, which must be set via a MOV (or
via an instruction which loads a segment register and an index register in
one instruction from consecutive memory words) each time you want to change
it.

This is what the compiler writers have trouble with.  The indices into the
memory management tables for the 286 are NOT derived from the instruction's
address field, transparently as part of the "virtual" address.
They have to be explicitly LOADED into a segmentation register.  And the
enhancements made thus far to the architecture don't improve this; they
just add bits to the field in the memory management unit tables.

The reason this is such a problem is that if you are generating code that
involves data larger than 64K, you have to keep up with what value you last
loaded into the segmentation register, so that you can change it if you
have to access something that is not in range.  And, as the flow of control
in the program for which you are generating code becomes more complex,
deciding when you need to change the contents of the segmentation register
becomes enormously difficult.

A compiler that could make this sort of flow analysis for the 8086 family
of machines could also do a substantial amount of optimization, with the
result that the same compiler for a 68000 would also achieve substantially
better results.  But at present, there are not really any compilers out
there like that.  It is rumored that some are in the works; some companies
have claimed to produce them already, but the optimization methods thus far
have been mostly "peephole" optimizations.  The difficulty of writing an
optimizer of the sort required is probably larger than that of writing the
compiler.

And optimization is really the issue here.  A fully unoptimized
8086-family program would load the segmentation register before EVERY
operand access.  The task of the optimizer is to decide when it doesn't
have to.  And that is the basic problem.
-- 
Full-Name:  J. Eric Roskos
UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
US Mail:    MS 795; Perkin-Elmer SDC;
            2486 Sand Lake Road, Orlando, FL 32809-7642

        "Zl FB vf n xvyyre junyr."
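
Appended illustration (not part of the original posting): a rough Python
sketch of the address-formation schemes the article contrasts, plus the
segment-register bookkeeping it calls "the basic problem".  Everything here
is an assumption made for illustration: the function names, the table
contents, the 8-bit AA / 16-bit BBBB field widths (taken from the article's
"4 bits per digit" example), and a much simplified 286 descriptor holding
only a base and a largest valid offset.  It is not a model of any real MMU
or compiler.

```python
OFFSET_BITS = 16  # assumed width of the BBBB field (4 digits x 4 bits)


class Fault(Exception):
    """Raised when an offset is outside the block described by a table entry."""


def conventional_translate(vaddr, table):
    """'Conventional' MMU: AA selects a table entry, CCCCCC replaces AA."""
    index = vaddr >> OFFSET_BITS                 # the AA field
    offset = vaddr & ((1 << OFFSET_BITS) - 1)    # the BBBB field
    base, size = table[index]                    # CCCCCC and the block size
    if offset >= size:
        raise Fault(f"offset {offset:#x} out of range for entry {index:#x}")
    return (base << OFFSET_BITS) | offset        # concatenation: CCCCCCBBBB


def i8086_physical(segment, offset):
    """8086: AAAA0 + BBBB, wrapped to the 20-bit address bus."""
    return ((segment << 4) + offset) & 0xFFFFF


def i286_physical(index, offset, descriptors):
    """286 (simplified): the segment register picks a table entry, and the
    entry's base is ADDED to the offset rather than concatenated with it."""
    base, limit = descriptors[index]             # limit = largest valid offset
    if offset > limit:
        raise Fault(f"offset {offset:#x} exceeds limit {limit:#x}")
    return base + offset


def segment_loads(required_segments, track_last=False):
    """Count segment-register loads for a straight-line run of accesses.

    With track_last=False every access reloads the register (the fully
    unoptimized case); with track_last=True a load is emitted only when
    the needed segment differs from the one loaded last.
    """
    loads = 0
    current = None
    for seg in required_segments:
        if not track_last or seg != current:
            loads += 1
            current = seg
    return loads


if __name__ == "__main__":
    table = {0x12: (0xABCDEF, 0x8000)}
    print(hex(conventional_translate(0x121234, table)))
    print(hex(i8086_physical(0x1234, 0x0010)))
    print(hex(i286_physical(3, 0x0010, {3: (0x123456, 0xFFFF)})))
    print(segment_loads([1, 1, 2, 2, 1]),
          segment_loads([1, 1, 2, 2, 1], track_last=True))
```

Note that segment_loads only handles straight-line code, where remembering
the last value loaded is trivial.  The article's point is that once control
flow branches and merges, knowing what the register holds at each access
takes real flow analysis.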