Path: utzoo!attcan!uunet!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!ames!dftsrv!mimsy!chris
From: chris@mimsy.umd.edu (Chris Torek)
Newsgroups: comp.unix.questions
Subject: Re: reference about mbufs
Message-ID: <25645@mimsy.umd.edu>
Date: 21 Jul 90 19:40:02 GMT
References: <1990Jul5.175406.22944@Neon.Stanford.EDU>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 152

In article <1990Jul5.175406.22944@Neon.Stanford.EDU>
david@Neon.Stanford.EDU (David M. Alexander) writes:
>Does anyone know a book or article that discusses the mbuf structure
>and the functions and macros to manipulate them?

Hmm... do you mean `4.1BSD BBNNET mbufs', `4.2BSD mbufs', `4.3BSD
mbufs', `4.3BSD-tahoe mbufs', or `4.3BSD-reno mbufs'?  They are all
different (and probably differ from various Ultrix mbufs and maybe also
4.1a, b, and c mbufs, but I never saw 4.1[abc]; now that SunOS has
STREAMS one would hope the kernel group settled on one kind of memory
allocator as well...).

4.3-tahoe mbufs are probably the easiest to explain:

    struct mbuf {
        struct  mbuf *m_next;       /* next buffer in chain */

This links together mbufs that make up one (packet/group/whatever) so
that the amount of data in a data-chunk can be bigger than the maximum
size of a single mbuf.

        u_long  m_off;              /* offset of data */

This gives the offset from the base of the mbuf (the address of the
entire `struct mbuf') to the data.  For `normal' mbufs the data are
somewhere in m_dat[].  For `big' mbufs (`mclusters') the data are in a
separate `page' (typically 1Kbyte, i.e., not necessarily a hardware
page) and the offset is large (>= sizeof(struct mbuf)).

        short   m_len;              /* amount of data in this mbuf */

Thus the length of a complete packet is the sum of the lengths of all
the mbufs found via m_next's.

        short   m_type;             /* mbuf type (0 == free) */

One of the magic type codes.

        u_char  m_dat[MLEN];        /* data storage */

Up to 112 bytes of data.

        struct  mbuf *m_act;        /* link in higher-level mbuf list */

Various uses.
Mainly for datagram protocols: several packets are linked together via
m_act pointers.  Conceptually, following m_next pointers `assembles'
each packet, while following m_act pointers `lists' each packet.  The
m_act pointers are set only in the `top' mbufs:

    -------- socket buffer: so->so_sb.sb_rcv --------
        | sb_mb
        v
    +-------+ m_act +-------+ m_act +-------+ m_act
    | pkt 1 |------>| pkt 2 |------>| pkt 3 |--->nil
    +-------+       +-------+       +-------+
        | m_next        | m_next        | m_next
        v               v               v
    +-------+       +-------+       +-------+
    |       |       |       |       |       |
    +-------+       +-------+       +-------+
        | m_next        | m_next        | m_next
    +-------+          nil          +-------+
    |       |                       |       |
    +-------+                       +-------+
        | m_next                        | m_next
       nil                             nil

    };

functions/macros:

    MGET(m, waitflag, type)
        sets `m' to point to a new mbuf of type `type'.  waitflag is
        either M_DONTWAIT (if cannot sleep; then m may be set to nil)
        or M_WAIT (if can sleep; then m will never be nil).

    M_CLALLOC(m, i)
        Gets `i' mclusters (i must be 1).  Never waits; sets m to nil
        if there are none.

    M_HASCL(m)
        True iff m is an mcluster rather than a regular (tiny) mbuf.

    MTOCL(m)
        Gets base of cluster page given an mcluster.

    MCLGET(m)
        Changes m from a regular mbuf to an mcluster, if there is
        space.  If not, leaves m a regular mbuf.  m->m_len is set to
        MCLBYTES on success, or MLEN on failure (so, e.g., `M_HASCL'
        will tell whether it succeeded).

    MCLFREE(m)
        Puts m on the mcluster free list.

    MFREE(m, n)
        Puts m on the free list; sets n to what m->m_next used to be.
        To free a chain you could use

            while (m) {
                MFREE(m, n);
                m = n;
            }

        Automatically knows when to use MCLFREE.

    struct mbuf *m_get(int waitflag, int type);
        Returns a new mbuf, exactly like MGET except incurring a
        function call and using less space.

    struct mbuf *m_getclr(int waitflag, int type);
        Returns a new mbuf like m_get, but zeroes out all the data.

    struct mbuf *m_free(struct mbuf *m);
        Puts m on the free list like MFREE; returns the old m->m_next.

    struct mbuf *m_more(int waitflag, int type);
        Internal use (for MGET).
    struct mbuf *m_copy(struct mbuf *m, int off, int len);
        Copies the data from the mbuf chain headed by `m' into new
        mbufs (so that it can be modified without affecting other
        users of the same data).  Skips the first `off' bytes of data;
        copies at most `len' bytes.  Thus, to copy no more than 32
        bytes from the chain headed by `m', after skipping over the
        first 4 bytes, use

            mcopy = m_copy(m, 4, 32);

        A `len' of M_COPYALL means `copy until end of chain'.

    struct mbuf *m_pullup(struct mbuf *m, int len);
        `Pulls' a minimum of `len' bytes of data into the first mbuf
        in the chain, possibly replacing the chain (as if via m_copy)
        in the process.  Used to force entire packet headers into a
        single mbuf.

    mtod(m, type)
        Gives (as type `type') the address of the first byte of data
        in m.  Used as, e.g.,

            m = m_pullup(m, sizeof(struct ip)); /* get entire IP header */
            struct ip *ip_header = mtod(m, struct ip *);

    dtom(pointer)
        Turns an arbitrary data pointer into the corresponding mbuf
        (via trickery).  dtom() might someday go away.

Something important not mentioned above: packets received from an
interface are put on the appropriate protocol's input queue with the
first mbuf containing a pointer to the `ifnet' structure as its first
item.  That is, after receiving an IP packet, an Ethernet driver puts
an mbuf chain onto `ipintrq' that looks like:

    offset 0:
        *mtod(m, struct ifnet **) points back to Ethernet I/F
    offset sizeof(struct ifnet *):
        IP header, followed by data

The `IF_DEQUEUEIF' macro handles this little idiosyncrasy.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu        Path: uunet!mimsy!chris
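
For anyone who wants to poke at the layout outside a kernel, here is a
minimal userland sketch of the structure described above.  It is not
the 4.3BSD-tahoe source: every `toy_' name is made up for illustration,
the struct is only modelled on the fields listed in the article, and
toy_mtod/toy_hascl merely follow the descriptions given for mtod and
M_HASCL (base address plus m_off; offset >= sizeof the mbuf).  It shows
the two walks the article describes: summing m_len along m_next to get
a packet length, and following m_act to count packets.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MLEN 112   /* in-line data bytes per mbuf, the figure quoted above */

/* Hypothetical userland model of the mbuf fields listed in the article. */
struct toy_mbuf {
    struct toy_mbuf *m_next;          /* next buffer in this packet's chain */
    unsigned long    m_off;           /* offset from the mbuf base to the data */
    short            m_len;           /* bytes of data in this mbuf */
    short            m_type;          /* mbuf type (0 == free) */
    unsigned char    m_dat[TOY_MLEN]; /* data storage */
    struct toy_mbuf *m_act;           /* next packet in a higher-level list */
};

/* Modelled on mtod(): mbuf base address plus m_off, cast to type t. */
#define toy_mtod(m, t)  ((t)((char *)(m) + (m)->m_off))

/* Modelled on M_HASCL(): an offset past the mbuf itself means a cluster. */
#define toy_hascl(m)    ((m)->m_off >= sizeof(struct toy_mbuf))

/* Set up m as a small mbuf whose data begins at m_dat[0]. */
static void toy_init(struct toy_mbuf *m, short len, short type)
{
    assert(len >= 0 && len <= TOY_MLEN);
    m->m_next = NULL;
    m->m_act = NULL;
    m->m_off = offsetof(struct toy_mbuf, m_dat);
    m->m_len = len;
    m->m_type = type;
}

/* Length of one packet: the sum of m_len over the m_next chain. */
static long toy_pktlen(const struct toy_mbuf *m)
{
    long total = 0;

    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/* Number of packets queued: follow m_act from the first `top' mbuf. */
static int toy_npkts(const struct toy_mbuf *top)
{
    int n = 0;

    for (; top != NULL; top = top->m_act)
        n++;
    return n;
}
```

Linking three mbufs of 100, 50 and 10 bytes via m_next and handing the
first to toy_pktlen gives 160; chaining three such `top' mbufs via
m_act, as in the socket buffer picture, makes toy_npkts return 3.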