Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!brl-smoke!smoke!cck%cucca.columbia.arpa@BRL.ARPA
From: cck%cucca.columbia.arpa@BRL.ARPA (Charlie C. Kim)
Newsgroups: net.unix-wizards
Subject: MBUF problems
Message-ID: <994@brl-smoke.ARPA>
Date: Mon, 17-Feb-86 13:58:41 EST
Article-I.D.: brl-smok.994
Posted: Mon Feb 17 13:58:41 1986
Date-Received: Wed, 19-Feb-86 20:26:21 EST
Sender: news@brl-smoke.ARPA
Lines: 56

SYSTEM type:       VAX + DEQNA
OPERATING System:  4.2BSD and Ultrix 1.1
GENERAL area:      MBUF handling.

PROBLEM:
   panic: trap type 9 (Protection fault) in ip_output.

DIAGNOSIS:
   ip_output is called with a bogus mbuf by ip_forward.

ANALYSIS:
   This results because a basic assumption of the dtom macro is
   violated.  dtom assumes that mbufs are aligned on 128 byte
   boundaries and that all the data is contained in that 128 bytes.
   For most mbufs this is true, and it is valid to simply mask off
   the low order bits of the data pointer to recover the mbuf.
   Unfortunately, for mbufs whose data is in the page pool, the data
   is not in the same 128 bytes.  (As a matter of fact, the data
   pointer points to some virtual address behind the mbuf.)

   On a VAX, this would not be seen very often, since pages in the
   pool generally don't get used until the data size exceeds CLBYTES
   (CLSIZE*NBPG) (1024).  Another mitigating factor is the fact that
   the private page pool is only used "when copying data from a user
   process into the kernel, and when bringing data in at the hardware
   level".  In particular, the Ultrix 1.1 DEQNA driver uses the
   private page pool.  (The DEUNA and other devices may also be
   affected.)

CURE:
   I'm not sure there is an easy cure, but I've outlined some options
   below, in what I think is best-to-worst order.  Hopefully, this is
   fixed in 4.3BSD or Ultrix 1.2.

   1. Fix dtom.  This may not be easy.  Though we can check whether
      the data pointer is in the data pool or in the mbuf space, it
      may be difficult to trace back pointers in the data pool.
      Basically, the problem arises from the fact that copies are
      made by duplicating the page entries instead of copying the
      data.  To handle traceback, we would need some page table which
      told us which mbuf a page was associated with; if more than one
      mbuf could be associated with a data page, then things would be
      a real hassle.

   2. Drop usage of dtom where necessary.  This would require careful
      rewrites of portions of the code.  This could be done by
      dropping all usages of dtom or by tracing back where dtom could
      be used.  Some of this is easy, but some nontrivial rewriting
      would definitely be necessary.  For example, in in_input, the
      ip reassembly code would have to be reworked.

   3. Drop usage of the private page pool.  This is undesirable;
      though there should be no reason why it wouldn't work.

Charlie C. Kim
User Services
Center for Computing Activities
Columbia University
New York, NY 10025
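
For anyone who wants to see the dtom assumption from the ANALYSIS section concretely, here is a minimal user-space sketch in C. It is not the kernel source: the struct layout, the SK_MTOD/SK_DTOM macros, the two static pools, and all function names are hypothetical stand-ins chosen only to mimic the behavior described above (128-byte aligned mbufs, data found via an offset, and the owner recovered by masking off the low-order bits).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t, intptr_t */

#define MSIZE 128     /* mbuf size and alignment, as stated in the post */

/* Hypothetical, simplified mbuf. NOT the real 4.2BSD struct mbuf. */
struct sketch_mbuf {
    struct sketch_mbuf *m_next;
    intptr_t m_off;               /* offset from start of mbuf to its data */
    short m_len;
    short m_type;
    unsigned char m_dat[MSIZE - sizeof(struct sketch_mbuf *)
                        - sizeof(intptr_t) - 2 * sizeof(short)];
};
static_assert(sizeof(struct sketch_mbuf) == MSIZE, "mbuf must be MSIZE bytes");

/* Data pointer from mbuf: add the stored offset. */
#define SK_MTOD(m) ((unsigned char *)((uintptr_t)(m) + (uintptr_t)(m)->m_off))
/* Mbuf from data pointer: mask off the low-order bits (the assumption at issue). */
#define SK_DTOM(x) ((struct sketch_mbuf *)((uintptr_t)(x) & ~(uintptr_t)(MSIZE - 1)))

/* Stand-ins for the mbuf space and for one page of the private page pool. */
static _Alignas(MSIZE) struct sketch_mbuf mbuf_pool[2];
static _Alignas(1024) unsigned char page_pool[1024];

/* Ordinary mbuf: data lives inside the same 128 bytes. */
struct sketch_mbuf *make_small(void)
{
    struct sketch_mbuf *m = &mbuf_pool[0];
    m->m_off = (intptr_t)offsetof(struct sketch_mbuf, m_dat);
    return m;
}

/* Page-pool mbuf: the offset points outside the mbuf, into the page. */
struct sketch_mbuf *make_cluster(void)
{
    struct sketch_mbuf *m = &mbuf_pool[1];
    m->m_off = (intptr_t)((uintptr_t)page_pool - (uintptr_t)m);
    return m;
}

/* Returns 1 if masking the data pointer recovers the owning mbuf, else 0. */
int dtom_roundtrips(struct sketch_mbuf *m)
{
    return SK_DTOM(SK_MTOD(m)) == m;
}
```

In this sketch, dtom_roundtrips(make_small()) yields 1, while dtom_roundtrips(make_cluster()) yields 0: masking a pointer into the page just gives back an address inside the page, not the mbuf that refers to it. A caller that treats that result as an mbuf would be working with a bogus mbuf, much as the DIAGNOSIS describes.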