Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!mcgill-vision!snorkelwacker!usc!samsung!uakari.primate.wisc.edu!uflorida!haven!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.arch
Subject: Re: Is handling off-alignment important?
Summary: bad antecedent for `this'
Keywords: VAX, quad-word, alignment
Message-ID: <26506@mimsy.umd.edu>
Date: 12 Sep 90 13:19:33 GMT
References: <104037@convex.convex.com> <8840014@hpfcso.HP.COM> <410@news.nd.edu>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 72

>[misstatement about VAX quad-word loads ignoring low-order bits]

In article <26376@mimsy.umd.edu> I wrote:
>Since there is no mention of this in the VAX architecture handbook ...

In article <1990Sep8.225345.745@quick.com> srg@quick.com (Spencer Garrett)
suggested:
>>   What may well have happened is that some early LISP implementer just
>>"tried it" and found that on his vax the low order bits were ignored.
>>so maybe he went ahead and used it.  *BUT* since it isn't in the manual,

In article <410@news.nd.edu>, przemek@liszt.helios.nd.edu (Przemek Klosowski)
writes:
>But, IT IS IN THE MANUAL!  [see page 33 of the VAX architecture book]

I probably should have followed up to Spencer Garrett's posting myself.
What I meant in <26376@mimsy.umd.edu> was `no mention of ignoring low
order bits on movq', not `no mention of alignment requirements'.  It is
well-known that the VAX architecture (and therefore its handbook :-) )
allows arbitrary alignment for word and longword operations.

Note, however, that there *are* some (exactly four, as far as I know)
instructions that do require strict alignment, namely the interlocked
queue instructions:

	insqhi
	remqhi
	insqti
	remqti

All of these require that their queue be on a quadword boundary (and,
further, that the relative offsets that make these objects into queues
be multiples of 8 as well).  If the address handed to one of these
instructions is not quadword-aligned, or if the queue offsets are not
multiples of 8, you get a reserved operand fault (again, the low-order
bits are not ignored).
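
To make the alignment rule concrete, here is a minimal C sketch of the
test just described.  It is not taken from any VAX manual or toolchain:
the function names and the forward/backward offset parameters (flink,
blink) are hypothetical, and the code only reports whether the operands
would be acceptable; it does not emulate the fault itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the value is a multiple of 8,
 * i.e. its three low-order bits are all zero. */
static bool is_quad_aligned(uint32_t value)
{
    return (value & 7u) == 0;
}

/* Sketch of the rule stated above: the queue address and both
 * relative offsets must be multiples of 8.  The names flink and
 * blink are assumptions made for this example.  Per the text, the
 * real instructions fault on a violation instead of ignoring the
 * low bits; this function merely returns false in that case. */
static bool queue_operands_ok(uint32_t addr, int32_t flink, int32_t blink)
{
    /* Casting a negative offset to unsigned keeps its low three
     * bits unchanged, so the same mask test applies. */
    return is_quad_aligned(addr)
        && is_quad_aligned((uint32_t)flink)
        && is_quad_aligned((uint32_t)blink);
}
```

For example, queue_operands_ok(0x1000, 8, -8) is true, while
queue_operands_ok(0x1004, 8, -8) and queue_operands_ok(0x1000, 12, -8)
are both false.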

Incidentally, these instructions are excruciatingly slow---about 150% of
the time for an integer divide with FPA, or twice as long as a subroutine
call, on an 11/780.

(NB: I have no explanation as to why an interlocked instruction should be
faster on a 750.)
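
The "150%" and "twice as long" figures can be checked against the
11/780 column of the table quoted below.  The following C sketch does
the arithmetic under two assumptions of mine that the table does not
state: that the INTERLOCKED INSERT + REMOVE time splits evenly between
the two instructions, and that CALLS + RET splits evenly between the
call and the return.  The function names are invented for the example.

```c
#include <stdio.h>

/* 11/780 times in microseconds, copied from the table quoted below. */
#define T_INTERLOCKED_PAIR 30.43  /* INTERLOCKED INSERT + REMOVE */
#define T_DIVL3             9.64  /* DIVL3 Reg, Reg, Reg */
#define T_CALLS_RET        14.75  /* CALLS #0, ROUTINE + RET */

/* Assumption: insert and remove each cost half of the pair. */
static double per_interlocked(void)
{
    return T_INTERLOCKED_PAIR / 2.0;
}

/* One interlocked instruction relative to one integer divide. */
static double vs_divide(void)
{
    return per_interlocked() / T_DIVL3;
}

/* One interlocked instruction relative to the call half of
 * CALLS + RET (assumption: call and return cost the same). */
static double vs_call(void)
{
    return per_interlocked() / (T_CALLS_RET / 2.0);
}

/* Print both ratios for inspection. */
static void print_ratios(void)
{
    printf("vs. DIVL3: %.2f\n", vs_divide());
    printf("vs. CALLS: %.2f\n", vs_call());
}
```

Under those assumptions the first ratio comes out a little above 1.5
and the second a little above 2, which matches the rough figures given
above.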

[begin text from an article in net.unix that I saved back in 1983]

The following VAX instruction timings were obtained from a former
DEC employee.  I cannot vouch for their accuracy and have no idea
how they were obtained.

  VAX-11/780 vs. VAX-11/750 vs. VAX-11/730 WITH FPA
  INSTRUCTION			      <EXECUTION TIME MICROSECS> <TIMES 780>
					  780	  750	  730	 750	 730

INTERLOCKED INSERT + REMOVE		 30.43	 26.43	 41.02	1.151	0.742

versus, e.g.,
MOVL Reg, Reg				  0.40	  0.93	  1.69	0.430	0.237
MOVL mem, Reg				  0.84	  1.67	  4.94	0.503	0.170
MOVL Reg, mem				  1.31	  2.28	  4.88	0.575	0.268
CMPL AND BLEQ				  1.16	  2.32	  4.26	0.500	0.272
CMPL mem, Reg AND BLEQ			  1.88	  3.24	  7.31	0.580	0.257
TSTL AND BLEQ				  1.00	  2.42	  4.25	0.413	0.235
BRW					  0.80	  2.01	  2.57	0.398	0.311
MULL2 Reg, Reg				  1.85	  5.68	 12.05	0.326	0.154
MULL2 mem, Reg				  2.50	  6.55	 15.14	0.382	0.165
MULL2 Reg, mem				  2.48	  6.41	 15.11	0.387	0.164
DIVL3 Reg, Reg, Reg			  9.64	  8.88	 16.15	1.086	0.597
CALLS #0, ROUTINE + RET			 14.75	 20.87	 36.61	0.707	0.403
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris