Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!usc!ucsd!pacbell.com!pacbell!att!dptg!lzga!bogatko
From: bogatko@lzga.ATT.COM (George Bogatko)
Newsgroups: comp.unix.wizards
Subject: Re: Help needed with System V message queues
Summary: how they work (long)
Keywords: SYSV messagequeue
Message-ID: <1990@lzga.ATT.COM>
Date: 8 Aug 90 16:36:56 GMT
References: <1169@compel.UUCP>
Organization: AT&T BL Middletown/Lincroft NJ USA
Lines: 579

HI:

A while ago, I got sick of wondering what the tunables for messages
meant. The result was a writeup which was begun, interrupted, and
never finished. This is what remains of it. It is 500+ lines long, so
you may want to skip all this if message queues leave you cold.

After it, in a second posting, is a 'pic' file of how messages are
stored in memory.

This may answer some questions about message queues, and how they
work. If I see enough interest, I may finish the writeup.

************

OVERVIEW

In /etc/master.d there is a file called msg that contains the tunable
parameters for the device driver that handles the UNIX(TM) message
queue system. This file contains lines similar to the following:

    MSGMAP = 100
    MSGMAX = 2048
    MSGMNB = 4096
    MSGMNI = 50
    MSGSSZ = 8
    MSGTQL = 40
    MSGSEG = 1024

The meaning of these parameters, as stated in the manual "Operations
and Administration Series: Performance Management" under the chapter
"Tunable Parameters Definitions", is:

MSGMAP
    specifies the size of the memory control map used to manage
    message segments. If this value is insufficient to handle the
    message type facilities, a warning message is sent to the console.

MSGMAX
    specifies in bytes the maximum size of a message sent. When
    receiving a message, a value larger than this parameter can be
    used to ensure that the whole message is received and not
    truncated.

MSGMNB
    specifies the maximum length, in bytes, of a message queue. The
    owner of a facility can lower this value, but only the superuser
    can raise it.
MSGSEG
    specifies the number of message segments in the system.
    MSGSEG * MSGSSZ should be less than 131,072 bytes (128 kilobytes).

MSGSSZ
    specifies the size in bytes of a message segment as stored in
    memory. Each message is stored in a contiguous message segment.
    The larger the segments are, the greater the chance of having
    wasted memory at the end of each message. MSGSSZ * MSGSEG should
    be less than 131,072 bytes (128 kilobytes).

MSGTQL
    specifies the number of message queue headers on all message
    queues system-wide, and thus, the number of outstanding messages.

These values are stored in a structure called "struct msginfo" which
looks like this:

    struct msginfo {
        int msgmap,
            msgmax,
            msgmnb,
            msgmni,
            msgssz,
            msgtql;
        ushort msgseg;
    };

This structure is used when the MSG driver is initialized. Notice that
the values are stored as INTS. This bites you in the butt later on
when the value of 'msgmnb' is put into 'msg_qbytes'.

All the structures described here are found in
'/usr/include/sys/msg.h'

There are four data structures used in the message queue universe.

struct msgbuf

This is actually a template used by the user to hold the message; it
is assumed (incorrectly) that the user knows enough to re-write this
template to suit their needs. The structure given in the header file
msg.h is never used. The only thing that must be here is the member
"long mtype", which must always be the first member. The template
looks like this:

    struct msgbuf {
        long mtype;     /* message type */
        char mtext[1];  /* message text */
    };

Notice that the size of mtext is one (1). This is not enough space to
hold anything useful. Newcomers to UNIX(TM) messages almost always
bark their shins on this one.

struct msg

This is the structure that points to the address of the actual message
in the message pool.
It looks like this:

    struct msg {
        struct msg *msg_next;
        long        msg_type;
        short       msg_ts;
        short       msg_spot;
    };

The message queue driver keeps these structures, called message
headers, in an array whose size is determined by the tunable parameter
MSGTQL. Each outstanding message, i.e. a message that has been sent
but not yet received, has one of these headers associated with it.
Thus the number of outstanding messages handled by the driver is
determined by the setting of MSGTQL. The members of the structure are
used as:

msg_next
    Even though these headers are in an array, they are handled like a
    linked list. This member points to the next header, which is
    located somewhere in the array.

msg_type
    This corresponds to the long mtype member of the msgbuf structure
    shown above.

msg_ts
    This is the precise length in bytes of the message that is stored
    in the message pool.

msg_spot
    This is the location in the message pool of the message. Notice
    that it is not a pointer. It is really an offset.

struct msqid_ds

This is the structure that keeps vital statistics about a particular
message queue that is being serviced by the driver. It looks like
this:

    struct msqid_ds {
        struct ipc_perm msg_perm;
        struct msg     *msg_first;
        struct msg     *msg_last;
        ushort          msg_cbytes;
        ushort          msg_qnum;
        ushort          msg_qbytes;
        ushort          msg_lspid;
        ushort          msg_lrpid;
        time_t          msg_stime;
        time_t          msg_rtime;
        time_t          msg_ctime;
    };

The driver keeps these structures in an array, whose size is
determined by the tunable parameter MSGMNI. Thus the maximum number of
message queues handled by the driver is determined by the setting of
MSGMNI. The members of the structure are used as:

msg_perm
    This is a structure located in ipc.h that contains various
    permissions and ids.
    It looks like this:

        struct ipc_perm {
            ushort uid;   /* owner's user id */
            ushort gid;   /* owner's group id */
            ushort cuid;  /* creator's user id */
            ushort cgid;  /* creator's group id */
            ushort mode;  /* access modes */
            ushort seq;   /* slot usage sequence number */
            key_t  key;   /* key */
        };

    This structure is used by all the drivers in the IPC system. The
    meaning of these variables is clear from their names and the
    comments, except for seq. This variable holds a sequence number
    that is used to determine the msqid that is returned from the
    msgget() call. By constantly incrementing this number, one can be
    sure that the same message queue header will not have the same
    msqid returned when it is re-allocated.

msg_first
    This is a pointer to the first struct msg member in the linked
    list of struct msg message headers.

msg_last
    This is a pointer to the last struct msg member in the linked list
    of struct msg message headers.

msg_cbytes
    This is the total number of bytes currently on the queue. It
    represents the accumulated total of all the values of the struct
    msg member msg_ts in the linked list of outstanding message
    headers for that particular queue.

msg_qnum
    This is how many messages are outstanding on the queue, and thus
    how many struct msg message headers are linked to this queue
    header.

msg_qbytes
    This is an upper limit of how many bytes can be outstanding on the
    queue. This is an arbitrary value, which is set from the value of
    the tunable parameter MSGMNB. This means that raising or lowering
    this value does not alter how many total messages you can have
    handled by the driver. That is determined by the size of the
    message pool. This value can be altered by either the
    owner/creator of the message queue, or the super-user, without
    having to change the value of MSGMNB.

msg_lspid
    The process id of the last process to send a message.

msg_lrpid
    The process id of the last process to receive a message.

msg_stime
    The last time a message was put on this queue.
msg_rtime
    The last time a message was taken off this queue.

msg_ctime
    The last time anything at all happened regarding this header.

The message pool

There is no formal name for the pool in the msg.h structure. The
message pool is an amorphous blob of memory. It is obtained by a call
to the kernel function kseg() which returns the base address of a
segment of kernel memory. This address is cast to type paddr_t
(physical address type) which on the 3B2 line is a long. It is treated
as a contiguous array of single bytes. Think of it as a char array.

The size of this blob is determined by multiplying the values in the
two tunable parameters MSGSEG and MSGSSZ. The result of the
multiplication is then rounded up to the nearest page size. This size
is passed as a parameter to kseg(). The argument to kseg() is a
request for pages of memory, in the range 1 - 64. 64 pages (128K) is
the upper limit of memory that will be returned by kseg(). This is why
the tunable parameters guide mentioned above says: "MSGSSZ * MSGSEG
should be less than 131,072 bytes (128 kilobytes)."

SENDING A MESSAGE

Before diving into how the driver handles messages, it might be best
to present a general picture of how outstanding messages are stored.

Recall from part 1 that there are four data structures involved in the
process: struct msgbuf, struct msg, struct msqid_ds, and the memory
pool.

Briefly, the msqid_ds structure points to the first instance of a
message header, which is a msg structure. Each message header points
to the next message header in the linked list of outstanding messages.
Each message header contains the offset in the memory pool where the
actual message is being stored.

The memory pool is logically divided into segments (of size MSGSSZ),
and total number of these segments is of size MSGSEG. When a message
is actually stored, it will occupy as much space as necessary, rounded
up to the nearest segment size.
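To make the rounding concrete, here is a minimal sketch in C. The
helper names segments_needed() and wasted_bytes() are made up for
illustration and are not taken from the driver; the arithmetic just
follows the description above.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not taken from the driver. */

/* Number of MSGSSZ-byte segments a message of len bytes occupies. */
static unsigned segments_needed(unsigned len, unsigned msgssz)
{
    assert(msgssz > 0);                     /* a zero segment size is meaningless */
    return (len + msgssz - 1) / msgssz;     /* round up to whole segments */
}

/* Bytes left unused at the end of the last segment. */
static unsigned wasted_bytes(unsigned len, unsigned msgssz)
{
    return segments_needed(len, msgssz) * msgssz - len;
}
```

With MSGSSZ set to 8 as in the sample tunables, a 28 byte message
takes 4 segments (32 bytes) and leaves 4 bytes unused.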
From this it can be seen that unless the message size aligns with the
MSGSSZ segment size, there will be some wasted bytes associated with
each stored message.

The enclosed diagram "Mapping a Message Queue ID to a 28 Byte Message
in Kernel Memory" displays the association of all these data
structures in the job of holding a 28 byte message in memory.

**** SEE FOLLOWING POSTING FOR PIC FILE ****

3.1 Conversion of a user supplied message queue ID (msqid) to a
    pointer to a struct msqid_ds queue header

This is a fundamental algorithm in the whole process, and is called
from many places in the driver:

    algorithm 'msgconv'
    convert msqid to msqid_ds structure
    {
        1. pointer = address of array, offset by msqid modulo MSGMNI
        2. lock the reference to the structure
        3. if( (the structure is not in use) ||
               (the sequence number doesn't equal the msqid
                divided by MSGMNI) )
               return EINVAL
        4. return the pointer found in step 1.
    }

step 1. convert msqid to pointer

Remember that the msqid_ds structures are held in an array of size
MSGMNI. Assuming that the call msgget() works correctly (it does), the
msqid that is returned from that call will always be the result of an
incrementing sequence number (remember struct ipc_perm.ushort seq?)
times the value of MSGMNI, plus the offset into the msqid_ds array of
the assigned message header. Thus if MSGMNI is 100, the sequence
number 3, and the offset 30, the message queue id returned by msgget()
would be 330. The line that converts the msqid back to the offset is
simply:

    qp = &msgque[id % msginfo.msgmni];

Which is to say that the offset into the array (here msgque) is msqid
modulo MSGMNI. In our example, this would be 30, which is indeed the
offset into the array.

step 2. Lock the reference to the structure

A parallel char array, of size MSGMNI, is kept. Once the offset is
found, it is used to find the associated lock value in this lock
array. As long as this value is 1, the process sleeps (lines 1 and 2
following).
    1    while (*lockp)
    2        sleep(lockp, PMSG);
    3    *lockp = 1;

When another process, which is using this message header, is done, it
sets the value of this lock to 0, and issues a wakeup(). Our process
then wakes up and now finds the value to be 0. It then stops going to
sleep, and locks the value (line 3). This is what allows message
queues to act "atomically".

step 3. check the msqid value for validity

The value of qp->msg_perm.seq is checked against the value of msqid
divided by MSGMNI and if they don't match, then errno is set to EINVAL
and the system call returns with an error (-1). In our example, if the
msqid is 330, then 330/100 (in integer arithmetic) yields 3, which is
indeed the sequence number. Thus the msqid is successfully converted
to a valid and active message header.

step 4. return the queue pointer

3.2 Sending a message

Sending a message consists of receiving a buffer from the user, and
putting it into the message pool, with proper labeling so that a
receive request can copy that message out of the message pool.

    algorithm 'msgsnd'
    send a message
    {
        1.  convert msqid to msqid_ds pointer (algorithm msgconv)
        2.  if(access denied by incorrect permissions)
                return EACCES
        3.  if(byte count <= 0 || byte count > MSGMAX)
                return EINVAL
        4.  copy the message type from the user area
                return EFAULT on error
        5.  if(message type <= 0)
                return EINVAL
    GETRES:
        6.  if(queue has been removed or changed)
                return EIDRM
        7.  if( (total bytes in queue > MSGMNB) ||
                (no free msg headers available [MSGTQL]) ) {
        8.      if(IPC_NOWAIT set)
                    return EAGAIN
        9.      sleep
        10.     if(sleep was interrupted)
                    return EINTR
        11.     goto GETRES
            }
        12. call 'malloc()' to find free slot in message pool.
        13. if(no free space in message pool) {
                if(IPC_NOWAIT set)
                    return EAGAIN
                sleep
                if(sleep was interrupted)
                    return EINTR
                goto GETRES
            }
        14. assuming all is OK, copy from user to message pool.
        15. if(system error during copy) {
                call 'mfree' to mark slot as free
                return EFAULT
            }
        16. update 'msqid_ds' header
        17. initialize 'msg' header
        18. link 'msg' header into chain of related 'msg' headers.
        19. return 0
    }

*****************

At this point I was interrupted by real work, and never returned to
the writeup. The "Bach Book" has a good writeup on this stuff.

A few points however:

MSGMAX is not MSGMNB. MSGMAX is the largest message you can send. You
can't find out the value of MSGMAX from the msqid_ds structure.

MSGMNB can be found out from "msg_qbytes". Recall from the above that
you can reset this to a higher value if you are super-user, and always
to a lower value if you are the owner. What is important to note
however is that this number has nothing to do with the capacity of the
driver. It is just a number that is compared against.

MSGMNB is stored in "struct msginfo" as an INT, but when the driver is
initialized, it is transferred to "struct msqid_ds" in member
"msg_qbytes" which is a "ushort". Thus while some documentation may
say that you can have a high message queue maximum, you can only have
the upper limits of a ushort (65535). If you set MSGMNB to a higher
value than this, it will wrap, and you will wind up with a lower
value.

If you want to increase the size of the message pool, increase MSGSEG.
NEVER INCREASE MSGSSZ. Recall from above:

    "The memory pool is logically divided into segments (of size
    MSGSSZ), and total number of these segments is of size MSGSEG.
    When a message is actually stored, it will occupy as much space as
    necessary, rounded up to the nearest segment size. From this it
    can be seen that unless the message size aligns with the MSGSSZ
    segment size, there will be some wasted bytes associated with each
    stored message."

This means that if MSGSSZ is 50 bytes, and you have a 10 byte message,
you will occupy one slot, and have 40 bytes of wasted storage hanging
around. But if you keep the value of 8, then you will occupy 2 slots
and have 6 bytes of wasted storage hanging around. See the following
posting for a pic file of how this storage works.

In the msgsnd description above is:

        12. call 'malloc()' to find free slot in message pool.
        13. if(no free space in message pool) {
                if(IPC_NOWAIT set)
                    return EAGAIN
                sleep
                if(sleep was interrupted)
                    return EINTR
                goto GETRES
            }

Notice that there is no escape from this if the size of the message
you are trying to send is greater than the total amount of memory in
the message pool. Thus you can hang forever waiting for enough room to
become available. This will be allowed if MSGMNB is set to be greater
than ( MSGSSZ * MSGSEG * (sizeof page on your machine (2048 on 3B's) ) ).
Be careful when setting the tunables!!!

When sending messages, the third parameter must be the size of the
actual message, not the size of the 'struct msgbuf', which will be
sizeof(long) bytes bigger. If you don't pay attention to this, you
will get message trashing. I can't recall now how, when, or why this
happens, but I guarantee that it will be mysterious, and usually
fatal.

    struct msgbuf {
        long mtype;     /* message type */
        char mtext[1];  /* message text */
    };

So the safe way to use 'msgsnd' is:

    typedef struct {
        long mtype;
        struct {
            char buf[10];
            int xxx;
            double yyy;
            /* etc... */
        } mtext;
    } MSG;

    MSG message;

    1. msgsnd( msqid, &message, sizeof(message.mtext), 0);

    OR

    2. msgsnd( msqid, &message, sizeof(message)-sizeof(long), 0);

I prefer #1.

I also prefer to send messages with the fourth parameter as 0. This
will make the call block if there is no room at the time. It is more
normal for the message to block in a heavily loaded system than not,
so you will probably NOT want to die if you can't send the message
because there's no room. If you are worried about deadlock, put in an
alarm call so you can time out.

When receiving messages, check for EINTR if you get a -1. You will
usually want to just wrap around and try again if you get an interrupt
(usually from an alarm call, or some other friendly signal).

Hope this (long-winded) yakking helps somebody.

GB
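P.S. The send and receive advice above might look something like the
following in C. This is only a sketch: the MSG layout and the helper
names send_msg() and recv_msg() are made up for illustration, while
msgsnd() and msgrcv() are the standard calls declared in <sys/msg.h>.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Hypothetical message layout: mtype first, then the payload. */
typedef struct {
    long mtype;
    struct {
        char buf[10];
        int  xxx;
    } mtext;
} MSG;

/* Blocking send (flags 0). The size passed is the payload only. */
static int send_msg(int msqid, const MSG *m)
{
    assert(m->mtype > 0);   /* a type <= 0 would be rejected with EINVAL */
    return msgsnd(msqid, m, sizeof(m->mtext), 0);
}

/* Blocking receive that wraps around and retries on EINTR. */
static int recv_msg(int msqid, MSG *m, long type)
{
    for (;;) {
        if (msgrcv(msqid, m, sizeof(m->mtext), type, 0) != -1)
            return 0;
        if (errno != EINTR)
            return -1;      /* a real error; errno says which */
        /* interrupted by a signal: try again */
    }
}
```

Note that send_msg() does not retry on EINTR, so an alarm() set before
the call can still be used to time out a blocked send.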