Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!munnari.oz.au!goanna!ok
From: ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe)
Newsgroups: comp.arch
Subject: Re: End-of-buffer interrupt instruction (long)
Message-ID: <3747@goanna.cs.rmit.oz.au>
Date: 13 Sep 90 04:48:35 GMT
References: <2516@l.cc.purdue.edu> <6838.26e7f109@vax1.tcd.ie> <2123@key.COM>
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 62

In article <2123@key.COM>, sjc@key.COM (Steve Correll) writes:
[about Herman Rubin's proposed support for reading from a buffer]
> I decided to perform a couple of crude experiments. First, to figure out how
> long it takes to visit the Unix kernel, activate a user signal handler, and
> return:

Why should Herman Rubin's proposed operation involve the UNIX kernel
at all?  He never asked for that!  Imagine a scheme where we have

    struct BufHdr {
        unsigned char *next;
        unsigned int count;
        void (*refill)(struct BufHdr *);
        ...
    };

and the operation is

    unsigned char fetch(struct BufHdr *b)
    {
        while (b->count == 0) (b->refill)(b);
        b->count--;
        return *(b->next++);
    }

Now, if this is done in hardware:
 -- fetching the character and testing the count can proceed in parallel.
 -- decrementing the count and incrementing the pointer can proceed in
    parallel with the *next* instruction in the rest of the program.
So the time that the program would spend waiting for the FETCH instruction
would be the time it would spend for two memory references,
    FETCH r0, r1
on a hypothetical extended NS32532 would take the same time as
    MOVB 0(0(sp)), r1
Yes, more work would be done, but it could be done in parallel.

When the instruction had to trap, it would *not* have to involve the
kernel in signal handling, it would just do an ordinary procedure call.
This is a pretty substantial speedup for this operation.

The Xerox Lisp machines had micro-coded "get byte from buffer" and
"put byte into buffer" operations.
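
(An aside, to make the "ordinary procedure call" point concrete.  What
follows is only a minimal sketch of the software form of fetch() from
the scheme above, not anybody's actual implementation.  The names
StringSource, string_refill and string_source_init are hypothetical and
appear nowhere in the proposal; they wrap BufHdr from the outside so as
not to guess at its elided members.  The 4-byte chunk size is likewise
an arbitrary assumption, picked so that refills happen often.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three members named in the post; the elided ones are left out. */
struct BufHdr {
    unsigned char *next;
    unsigned int count;
    void (*refill)(struct BufHdr *);
};

/* HYPOTHETICAL wrapper: a buffer that is refilled from an in-memory string.
   hdr must be the first member so a BufHdr pointer converts back to it. */
struct StringSource {
    struct BufHdr hdr;
    const unsigned char *src;   /* bytes not yet copied into chunk */
    size_t remaining;           /* how many such bytes are left */
    unsigned int refills;       /* number of refill calls, for illustration */
    unsigned char chunk[4];     /* deliberately tiny buffer (assumption) */
};

/* HYPOTHETICAL refill routine: an ordinary function, reached through an
   ordinary indirect call.  No kernel entry, no signal handler. */
void string_refill(struct BufHdr *b)
{
    struct StringSource *s = (struct StringSource *)b;
    size_t n = s->remaining < sizeof s->chunk ? s->remaining : sizeof s->chunk;

    /* fetch() as given has no end-of-input case, so the caller must not
       ask for more bytes than the source holds. */
    assert(n > 0);

    memcpy(s->chunk, s->src, n);
    s->src += n;
    s->remaining -= n;
    s->refills++;
    b->next = s->chunk;
    b->count = (unsigned int)n;
}

/* The operation exactly as written in the post. */
unsigned char fetch(struct BufHdr *b)
{
    while (b->count == 0) (b->refill)(b);
    b->count--;
    return *(b->next++);
}

/* HYPOTHETICAL setup helper: start with an empty buffer so that the
   first fetch() triggers a refill. */
void string_source_init(struct StringSource *s, const char *text)
{
    s->hdr.next = s->chunk;
    s->hdr.count = 0;
    s->hdr.refill = string_refill;
    s->src = (const unsigned char *)text;
    s->remaining = strlen(text);
    s->refills = 0;
}
```

Calling string_source_init(&s, "abcdefghij") and then fetch(&s.hdr) ten
times hands back the ten bytes in order and goes through string_refill
three times.  Every other call takes only the fast path, which is the
part the proposed instruction would overlap with the memory references.
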
Those microcoded operations paid off handsomely, on that particular
architecture, with that particular application mix.

With the advent of 16-bit character sets, and in particular of 16-bit
character sets encoded in 8-bit streams (see the introduction in the
SVID, or the relevant manual pages on a Sony NEWS machine), the cost
of fetching a byte becomes less significant, as one then has to check
whether this byte needs to be combined with the next.  At the moment,
there are several ways of doing this coding in actual use, so there
isn't one right one that could be built into an instruction.

My own belief is that a getc() instruction is not warranted.
But it deserves a fairer evaluation than it got.
-- 
Heuer's Law:  Any feature is a bug unless it can be turned off.