Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site peora.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!vax135!petsd!peora!jer
From: jer@peora.UUCP (J. Eric Roskos)
Newsgroups: net.arch
Subject: Re: I like segmented architectures
Message-ID: <1031@peora.UUCP>
Date: Fri, 7-Jun-85 09:01:53 EDT
Article-I.D.: peora.1031
Posted: Fri Jun  7 09:01:53 1985
Date-Received: Sat, 8-Jun-85 03:56:05 EDT
References: <276@spar.UUCP> <5653@utzoo.UUCP> <291@spar.UUCP>
Organization: Perkin-Elmer SDC, Orlando, Fl.
Lines: 105

> By the way, where are the 68XXX fans?  Surely the Motorola chip isn't
> so bad that no one can find a defense for it.

It's kind of hard to compare the 68000's MMU, which functions in a very
familiar, traditional way (the same way MMUs on many "mainframe" machines
work), with the very strange segmentation facilities of the 286.

Here you've complained again that "64K segments are too small".  Now, I
have a feeling part of the problem I see here is in our definition of
"segments", which varies widely.  But I don't think it is the smallness of
the "segments" that is the problem.

The 8086's way of handling segmentation is not like that of many more
familiar machines, 68000 included.  In what I will call the "conventional"
memory management units, the address field in the instruction is partitioned
into subfields, like this:

	AABBBB

(where each digit represents, let us say, 4 bits, for concreteness).  The
bits AA are used to select an entry in an address translation table in the
MMU, which replaces the bit string AA in the original ("virtual") address
with some bit string CCCCCC in the generated ("physical") address.  The
result is some address

	CCCCCCBBBB

Now, there is usually also a size associated with the block of physical
memory pointed to by CCCCCC in the AAth entry of this translation table, so
the value of BBBB is checked against this number to be sure it is in range.
Assuming it is, we have generated our physical address, and can go on to
checking the other bits in that table, which tell whether we are allowed to
read, write, or execute the location, whether or not it is in memory at
present, whether it has been modified, etc.
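
To make the "conventional" scheme concrete, here is a rough sketch of the lookup, using the field widths from the AABBBB example (AA = 8 bits, BBBB = 16 bits).  The table layout, the names `translate` and `TranslationFault`, and the sample values are all made up for illustration and do not describe any particular MMU; the permission and presence bits are left out.

```python
# Illustrative sketch only.  Field widths follow the AABBBB example
# (AA = 8 bits, BBBB = 16 bits); the table layout is an assumption.
OFFSET_BITS = 16


class TranslationFault(Exception):
    """Raised when the virtual address cannot be translated."""


def translate(vaddr, table):
    """Map a virtual address AABBBB to a physical address CCCCCCBBBB."""
    index = vaddr >> OFFSET_BITS                 # the AA field
    offset = vaddr & ((1 << OFFSET_BITS) - 1)    # the BBBB field
    entry = table.get(index)
    if entry is None:
        raise TranslationFault("no entry for index %#x" % index)
    frame, size = entry                          # CCCCCC and the block size
    if offset >= size:
        raise TranslationFault("offset %#x out of range" % offset)
    # CCCCCC replaces AA; it is concatenated with BBBB, not added to it.
    return (frame << OFFSET_BITS) | offset


# Hypothetical table: entry 0x12 maps to block 0xABCDEF, 0x8000 bytes long.
table = {0x12: (0xABCDEF, 0x8000)}
print(hex(translate(0x123456, table)))
```

The point to notice is that the table index comes out of the address itself; the program never names it separately.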

More sophisticated memory management units further partition AA (or add
more bits), so that the high order bits select one or more pointer tables
which themselves point to other translation tables that are used for the
next-lower-order field of bits, etc., but the mechanism is the same.
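
A two-level version might look something like the following sketch, assuming (purely for illustration) that AA is split into two 4-bit fields.  The name `translate_two_level` and the dictionary-of-dictionaries layout are hypothetical, and the size and permission checks are omitted.

```python
# Illustrative sketch only.  AA is assumed to be split into two 4-bit
# fields; the high field picks a second-level table, the low field picks
# the entry within it.  BBBB is still 16 bits.
def translate_two_level(vaddr, directory):
    top = (vaddr >> 20) & 0xF      # high half of AA: selects a table
    low = (vaddr >> 16) & 0xF      # low half of AA: selects an entry
    offset = vaddr & 0xFFFF        # the BBBB field
    second_level = directory[top]  # pointer table -> translation table
    frame = second_level[low]      # the CCCCCC value
    return (frame << 16) | offset  # still concatenation, as before


# Hypothetical tables: directory slot 1, entry 2 maps to block 0xABCDEF.
directory = {0x1: {0x2: 0xABCDEF}}
print(hex(translate_two_level(0x123456, directory)))
```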

Notice that in this scheme, the size of BBBB doesn't really matter so much.
The total amount of space you can address in an instruction (without
changing an external register) is the number of distinct addresses that may
be represented by AABBBB, which is the number of bits in the instruction's
address field for the operand.

Now, on the other hand, we have the Intel approach.  Intel gives us an
instruction address field

	BBBB

and a segmentation register field

	AAAA

We get our physical address from this via

	AAAA0
	+BBBB
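
In code, the 8086 calculation amounts to the short sketch below.  The function name `real_mode_address` is just a label chosen here; the mask reflects the 8086's 20 address lines.

```python
# Sketch of the 8086 address calculation described above: the segment
# register is shifted left one hex digit (4 bits) and ADDED to the offset.
def real_mode_address(segment, offset):
    # The 8086 has 20 address lines, so the sum is kept to 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0x1234, 0x5678)))   # 0x12340 + 0x5678

# Because the fields are added rather than concatenated, different
# segment:offset pairs can name the same physical location.
print(real_mode_address(0x1000, 0x0010) == real_mode_address(0x1001, 0x0000))
```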

Now, in the 286, we have an "improvement" in that AAAA is put into a
memory management unit table, just like in the "conventional" architecture,
but it is still added to the instruction's address field rather than
concatenated with it.  And where does the index into the memory management unit's
table come from?  Why, it comes from what used to be the segmentation
register!  So, rather than deriving the index from part of the
instruction's address field, it comes from a separate register, which
must be set via a MOV (or via an instruction which loads a segment register
and an index register in one instruction from consecutive memory words)
each time you want to change it.
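
A simplified sketch of the 286 protected-mode calculation follows.  It assumes a single descriptor table and ignores the table-indicator and privilege bits in the low three bits of the selector, as well as all access-rights checks; the name `protected_mode_address` and the sample descriptor are hypothetical.

```python
# Simplified sketch of 286 protected-mode translation.  Assumptions: one
# descriptor table, no privilege or access-rights checks.  Each entry
# here is just a (base, limit) pair.
def protected_mode_address(selector, offset, descriptors):
    index = selector >> 3            # low 3 bits (TI, RPL) ignored here
    base, limit = descriptors[index]
    if offset > limit:
        raise ValueError("offset %#x exceeds segment limit" % offset)
    return base + offset             # added, not concatenated


# Hypothetical descriptor 2: base 0x123456, limit 0x0FFF.
descriptors = {2: (0x123456, 0x0FFF)}
selector = 2 << 3                    # what the program must load explicitly
print(hex(protected_mode_address(selector, 0x0100, descriptors)))
```

Note that the selector is an argument separate from the offset; nothing in the instruction's address field picks the table entry.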

This is what the compiler writers have trouble with.  The indices into the
memory management tables for the 286 are NOT derived from the instruction's
address field, transparently as part of the "virtual" address.  They have
to be explicitly LOADED into a segmentation register.  And the enhancements
made thus far to the architecture don't improve this; they just add bits to
the field in the memory management unit tables.  The reason this is such
a problem is that if you are generating code that involves data larger than
64K, you have to keep track of what value you last loaded into the segmentation
register, so that you can change it if you have to access something that is
not in range.  And, as the flow of control in the program for which you
are generating code becomes more complex, deciding when you need to change
the contents of the segmentation register becomes enormously difficult.
A compiler that could make this sort of flow analysis for the 8086 family
of machines could also do a substantial amount of optimization, with the
result that the same compiler for a 68000 would also achieve substantially
better results.  But at present, there are not really any compilers out
there like that.  It is rumored that some are in the works; some companies
have claimed to produce them already, but the optimization methods thus far
have been mostly "peephole" optimizations.  The difficulty of writing an
optimizer of the sort required is probably greater than that of writing
the compiler.

And optimization is really the issue here.  A fully unoptimized 8086-family
program would load the segmentation register before EVERY operand access.
The task of the optimizer is to decide when it doesn't have to.
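
For straight-line code the bookkeeping is easy, as the toy sketch below suggests.  It simply counts segment-register loads for a sequence of operand accesses, with and without remembering what was loaded last.  The function `count_segment_loads`, the single segment register, and the access list are assumptions made for illustration; it says nothing about branches and joins, which is exactly where the real difficulty lies.

```python
# Toy sketch: count segment-register loads over a straight-line sequence
# of operand accesses.  Each item names the segment an operand lives in.
# Assumes one segment register and no branches.
def count_segment_loads(accesses, track_register=True):
    loaded = None    # segment currently in the register, if known
    loads = 0
    for segment in accesses:
        # The unoptimized code reloads every time; the optimized code
        # reloads only when the wanted segment differs from the last one.
        if not track_register or segment != loaded:
            loads += 1
            loaded = segment
    return loads


accesses = ["A", "A", "B", "B", "B", "A"]
print(count_segment_loads(accesses, track_register=False))  # every access
print(count_segment_loads(accesses))                        # only on change
```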

And that is the basic problem.
-- 
Full-Name:  J. Eric Roskos
UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
US Mail:    MS 795; Perkin-Elmer SDC;
	    2486 Sand Lake Road, Orlando, FL 32809-7642

	    "Zl FB vf n xvyyre junyr."