Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10 5/3/83; site utcsrgv.UUCP
Path: utzoo!utcsrgv!thomson
From: thomson@utcsrgv.UUCP (Brian Thomson)
Newsgroups: net.bugs.4bsd
Subject: 4.2BSD non-blocking sockets and selects
Message-ID: <2980@utcsrgv.UUCP>
Date: Tue, 20-Dec-83 18:59:28 EST
Article-I.D.: utcsrgv.2980
Posted: Tue Dec 20 18:59:28 1983
Date-Received: Tue, 20-Dec-83 19:35:29 EST
Organization: CSRG, University of Toronto
Lines: 115

Index:  sys/uipc_socket.c h/socketvar.h 4.2BSD

Description:
        If you do a select() for writing on a non-blocking SOCK_STREAM
        socket, and there is some send queue buffer space available,
        it will tell you the socket can be written.  But sosend()
        insists that all writes to non-blocking sockets be atomic, and
        will return EWOULDBLOCK if there is not enough buffer space
        for the entire write to go in one shot.
        This behaviour is OK for non-stream sockets, but streams should
        allow partial writes.  A couple of distributed utilities agree
        with me ...

Repeat-by:
        Both rlogind(1) and telnetd(1) are prepared for partial
        socket writes.  Try this:

        % rlogin localhost
        < message of the day >
        % cat /usr/dict/words
        ~^Z             (i.e. suspend the rlogin locally)
        Stopped
        % jobs
        [1]     Stopped         rlogin localhost
        %

        An iostat at this point will show (unless you happen to exactly
        fill the send queue) that your system is being eaten alive
        by rlogind.

Fix:
        Allow partial writes to non-blocking sockets unless the
        underlying protocol is atomic.  This is consistent with the
        behaviour of non-blocking ttys, which are a good model for
        stream-oriented sockets.
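
        For illustration, here is a minimal sketch of the kind of
        user-level loop that depends on partial writes: wait in
        select() until the socket is writable, write what fits, and
        keep the remainder for the next pass.  It assumes a modern
        POSIX environment rather than the 4.2BSD interfaces of the
        time, and the names set_nonblocking() and
        write_all_nonblocking() are hypothetical, not taken from
        rlogind or telnetd.  With the behaviour described above, a
        loop of this shape would spin: select() reports the socket
        writable, write() returns EWOULDBLOCK, and nothing is sent.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: send len bytes on a non-blocking stream socket,
 * accepting partial writes.  Returns 0 on success, -1 on error with
 * errno set by the failing call.
 */
static int write_all_nonblocking(int fd, const char *buf, size_t len)
{
    size_t off = 0;

    assert(buf != NULL || len == 0);
    while (off < len) {
        fd_set wfds;
        ssize_t n;

        /* Sleep until the kernel says there is send buffer space. */
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        n = write(fd, buf + off, len - off);
        if (n > 0) {
            /* Partial write: advance past what was accepted. */
            off += (size_t)n;
        } else if (n < 0 && errno != EWOULDBLOCK &&
                   errno != EAGAIN && errno != EINTR) {
            return -1;
        }
        /* On EWOULDBLOCK, go back to select() and try again. */
    }
    return 0;
}
```

        A caller would mark the socket non-blocking once with
        set_nonblocking(fd) and then hand each outgoing buffer to
        write_all_nonblocking(fd, buf, len).  The loop only makes
        progress cheaply if a "writable" report from select() means
        that write() will accept at least some bytes.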

        In file /sys/h/socketvar.h, change:

#define sosendallatonce(so) \
    (((so)->so_state & SS_NBIO) || ((so)->so_proto->pr_flags & PR_ATOMIC))

        to

#define sosendallatonce(so) \
    ((so)->so_proto->pr_flags & PR_ATOMIC)

        In file /sys/sys/uipc_socket.c, routine sosend(), diff -c shows:

***************
*** 281,286
        register int space;
        int len, error = 0, s, dontroute;
        struct sockbuf sendtempbuf;

        if (sosendallatonce(so) && uio->uio_resid > so->so_snd.sb_hiwat)
                return (EMSGSIZE);
--- 287,293 -----
        register int space;
        int len, error = 0, s, dontroute;
        struct sockbuf sendtempbuf;
+       int sentsome = 0;

        if (sosendallatonce(so) && uio->uio_resid > so->so_snd.sb_hiwat)
                return (EMSGSIZE);
***************
*** 324,329
                                goto release;
                        }
                        mp = &top;
                }
                if (uio->uio_resid == 0) {
                        splx(s);
--- 331,337 -----
                                goto release;
                        }
                        mp = &top;
+                       sentsome = 1;
                }
                if (uio->uio_resid == 0) {
                        splx(s);
***************
*** 336,342
                if (space <= 0 ||
                    sosendallatonce(so) && space < uio->uio_resid) {
                        if (so->so_state & SS_NBIO)
!                               snderr(EWOULDBLOCK);
                        sbunlock(&so->so_snd);
                        sbwait(&so->so_snd);
                        splx(s);
--- 344,353 -----
                if (space <= 0 ||
                    sosendallatonce(so) && space < uio->uio_resid) {
                        if (so->so_state & SS_NBIO)
!                               if(sentsome)
!                               { splx(s); goto release; }
!                               else
!                                       snderr(EWOULDBLOCK);
                        sbunlock(&so->so_snd);
                        sbwait(&so->so_snd);
                        splx(s);

Reservation:
        You should probably HOLD OFF installing this change until it gets
        batted about the net a bit.  The original behaviour appears to
        have been quite deliberate, and although I do think it's wrong,
        I'd like to give someone in the know a chance to explain the
        unobvious reason that it was right in the first place!
-- 
                Brian Thomson,      CSRG Univ. of Toronto
                {linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!thomson