Path: utzoo!utgpu!water!watmath!clyde!rutgers!sdcsvax!ucbvax!VENUS.YCC.YALE.EDU!LEICHTER
From: LEICHTER@VENUS.YCC.YALE.EDU ("Jerry Leichter ", LEICHTER-JERRY@CS.YALE.EDU)
Newsgroups: comp.os.vms
Subject: re: 4.6 lat and decnet problems?
Message-ID: <8801202342.AA13631@ucbvax.Berkeley.EDU>
Date: 19 Jan 88 15:28:00 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The ARPA Internet
Lines: 43

I can't comment on the other problems you are seeing, but...

    Transfers of large files (eg: 4600-block save-set) over our async
    decnet lines fail with the messages: "RMS-F-BUG_DAP, Data Access
    Protocol error detected; DAP code = 00019008" and "RMS-E-CRC, network
    DAP level CRC check failed". The messages and recovery book says to
    SPR those errors. The transfer works fine when I force routing
    through a common node connected to us via a dmr32 synch line. The
    other systems involved were uVMS 4.3 and 4.5.

    Possible hint: the line is connected through a port on a DHU
    emulator, and thus goes through the resurrected 4.4 YFdriver.

BUG_DAP could be a genuine bug, but the occurrence of DAP-level CRC errors
opens up the possibility of hardware problems.

A little background: There are checksums computed on the data as it enters
each DDCMP link, and checked as it exits the link at the other end. There
are similar checksums on data as it is placed on and removed from the
Ethernet. A failure of a checksum at this level is invisible to higher
levels - the failed packet is simply sent again later. The result is that a
DDCMP or Ethernet link will appear, to higher levels of the protocols, as an
error-free path.

In theory, that's all there is to it - but DAP is extra careful: The sender
computes a checksum of all the data it sends (over a theoretically
error-free channel) and the receiver checks it. This "end-to-end" checksum
covers the ENTIRE DAP transaction, so a DAP level CRC check failure can only
occur as the connection is closing down.
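
The two layers of checking described above can be sketched roughly as
follows. This is only an illustration, not DECnet code: zlib.crc32 stands in
for whatever CRC the real protocols use, and link_send, DapStream, transfer
and the flaky_attempts parameter are all hypothetical names invented for the
example.

```python
import zlib


def link_send(payload, flaky_attempts=0):
    """Simulate one checksummed link (DDCMP or Ethernet) with retransmission.

    The first `flaky_attempts` transmissions are corrupted on the wire.
    Returns (delivered payload, number of transmissions needed).
    Assumes a non-empty payload.
    """
    attempts = 0
    while True:
        attempts += 1
        frame_crc = zlib.crc32(payload)   # computed as the data enters the link
        on_wire = payload
        if attempts <= flaky_attempts:
            # Line noise: flip every bit of the first byte.
            on_wire = bytes([payload[0] ^ 0xFF]) + payload[1:]
        if zlib.crc32(on_wire) == frame_crc:   # checked as the data leaves the link
            return on_wire, attempts
        # Bad frame is dropped and sent again; higher levels never see it.


class DapStream:
    """Running checksum over every record of one transaction (hypothetical)."""

    def __init__(self):
        self.crc = 0

    def update(self, record):
        # zlib.crc32 accepts the previous value, so the CRC accumulates
        # over the whole stream rather than per record.
        self.crc = zlib.crc32(record, self.crc)


def transfer(records, flaky_attempts=0):
    """Send all records over one link, then compare end-to-end CRCs at close."""
    sender, receiver = DapStream(), DapStream()
    transmissions = 0
    for record in records:
        sender.update(record)
        delivered, n = link_send(record, flaky_attempts)
        receiver.update(delivered)
        transmissions += n
    # The comparison can only happen here, once the transaction is complete.
    return sender.crc == receiver.crc, transmissions
```

With a noisy line, transfer([b"block one", b"block two"], flaky_attempts=2)
needs three transmissions per record, yet the end-to-end comparison at close
still succeeds: the link-level retries have hidden every error from the
higher level.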

So, why would a DAP connection over "error-free" channels sometimes find
errors? You have to examine very closely what the checksums are really
covering. Consider the case of the Ethernet checksum: Data is pulled from
memory and handed to the Ethernet controller. It computes a checksum, and
sends the packet. The receiving controller pulls the data off the Ether,
checks the checksum, and writes the data to memory. The checksum covers the
transfer of the data on the Ethernet - it CANNOT say anything about the
transfers between memory and the Ethernet controllers.

In practice, the most common bad link not covered by a checksum is the
memory-to-Unibus-to-DMC or -DMR path. Usually the cause is power supply
problems, especially a load that just exceeds the rating of the power supply
for the Unibus. Machines that do a lot of routing are often configured with
a Unibus containing little besides a couple of DMR's, so the problem may
never show up in any other way.

							-- Jerry