Xref: utzoo comp.sys.intel:1458 comp.arch:19004
Path: utzoo!attcan!uunet!cs.utexas.edu!sun-barr!newstop!sun!amdcad!mozart.amd.com!neutron!david
From: david@neutron.amd.com (David Witt)
Newsgroups: comp.sys.intel,comp.arch
Subject: Re: Intel bugs / bugged by Intel :-(
Message-ID: <1990Nov5.172110.15994@mozart.amd.com>
Date: 5 Nov 90 17:21:10 GMT
References: <JSP.90Oct25155458@glia.u.washington.edu> <35431@cup.portal.com> <73898@sgi.sgi.com>
Sender: usenet@mozart.amd.com (Usenet News)
Reply-To: david@neutron.AMD.COM (David Witt)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 56

In article <73898@sgi.sgi.com> vjs@rhyolite.wpd.sgi.com (Vernon Schryver) writes:
>Hah!   --expletives deleted--
>
>My personal experience with AMD in the last 18 months has been that if I
>find 1 bug and my AMD FAE and I are persistent about it, then AMD will
>eventually deign to confirm the problem exists and to reward me with one or
>two more bugs in an old bug list.  For one 29K characteristic (it won't be
>changed so it is not a bug), it took enormous effort for months from a lot
>of people to get AMD to admit that concern was warrented and to get the
>easy work-around.  I have had similar recent and ancient experiences with
>INTEL.  I don't recall any difference between the two companies in this
>regard.


For whatever it is worth, I was involved in the particular episode referred
to by the gentleman at SGI, so for all you microprocessor users out there,
here is another perspective.

The 29K characteristic I believe he was referring to was a non-deterministic
failure, i.e. on large programs a board that SGI was buying from a 3rd party
was crashing their system during a particular communication protocol.

When I got involved in it (which is typically when everyone else has looked
at it and given up), we eventually ran this down to the fact that the 3rd
party was reading a byte control field from an FDDI node controller into the
29K and floating the rest of that data bus, i.e. a word transfer with only
8 bits defined.

This was having some nasty downstream effects when the undriven values
on the data bus would decay and be read and manipulated inside the 29K,
resulting in non-deterministic failures.  When the outside source was
told to execute a byte load instead of a word load, the problem was
eliminated, although as you can imagine this took a fairly long time
to isolate.  The impression that got back to the end customer apparently
was that AMD was slow in identifying this erratum, and that our position
that it was not a problem in the chip (i.e. reading unstable data on word
transfers) was an attempt to pass the problem off on someone else.
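
A minimal sketch of the failure mode described above, in Python and purely
illustrative.  It assumes the control byte arrives on the low 8 bits of a
32-bit bus and models the 24 undriven bits as random values; the actual byte
lane, the helper names, and the control value 0x5A are all hypothetical and
not taken from the board in question.  It shows why a word load sees
different data from run to run while a byte load does not.

```python
import random

BYTE_MASK = 0xFF  # assumption: the defined byte sits on the low 8 bus bits


def bus_word(control_byte, rng):
    """Return the 32-bit value on the bus: 8 driven bits, 24 floating bits."""
    floating = rng.getrandbits(24) << 8  # undriven lines modeled as random
    return floating | (control_byte & BYTE_MASK)


def word_load(control_byte, rng):
    """Word load: all 32 bits, including the floating ones, reach the CPU."""
    return bus_word(control_byte, rng)


def byte_load(control_byte, rng):
    """Byte load: only the driven byte is kept; the rest is zero-filled."""
    return bus_word(control_byte, rng) & BYTE_MASK


rng = random.Random(1990)  # fixed seed so the sketch is repeatable
control = 0x5A             # hypothetical control-field value

word_results = {word_load(control, rng) for _ in range(1000)}
byte_results = {byte_load(control, rng) for _ in range(1000)}

print("distinct values seen by word load:", len(word_results))
print("distinct values seen by byte load:", len(byte_results))
```

Any software that compares or branches on the full register after the word
load is acting on bits nobody drove, which is consistent with failures that
come and go from run to run.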

Problems like this are often quite difficult to isolate, and quite often
the problem turns out to be the customer's use of the chip, which takes
just as long to isolate as the real errata that must be documented, given
a work-around, and fixed in the next revision.

Just a little sympathy sometimes, guys; we are all in this together.
>
>I think the problem is not a matter corporate nastiness, but of the
>familiar human reluctance to admit errors.  People who find problems with
>my code might say nasty things about my enthusiasm for agreeing with them.
>
>
>Vernon Schryver,    vjs@sgi.com

	David Witt		1-(512)-462-5846
        Advanced Processor Development
	Advanced Micro Devices	domainLand:  david@neutron.AMD.COM
	Austin, Texas