Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP Posting-Version: version B 2.10.1 6/24/83; site fortune.UUCP Path: utzoo!watmath!clyde!burl!ulysses!mhuxl!ihnp4!fortune!rpw3 From: rpw3@fortune.UUCP Newsgroups: net.arch Subject: Re: byte alignment - (nf) Message-ID: <3232@fortune.UUCP> Date: Fri, 4-May-84 07:02:20 EDT Article-I.D.: fortune.3232 Posted: Fri May 4 07:02:20 1984 Date-Received: Sat, 5-May-84 01:33:08 EDT Sender: notes@fortune.UUCP Organization: Fortune Systems, Redwood City, CA Lines: 57 #R:wateng:-96500:fortune:16500013:000:2796 fortune!rpw3 May 4 01:35:00 1984 Speaking of "Stretch" and "bytes", don't forget the venerable PDP-10 and DECsystem-20 have the same sort of "bytes" -- 1 to 36 bits wide, specified in the "byte pointer". (I suspect the idea was copied from the IBM predecessors, since a PDP-10 looks a lot like a cleaned-up IBM 7094.) The nice thing about PDP-10/20 byte pointers is that they can directly implement character set mapping via "byte strips". If you partition the ASCII set into (say) 3 sets (for example, letters/numbers, delimiters, and "ignore"), you can build a byte pointer that points to a two-bit wide strip in a 128-word array (or 129 or 256 or 257, depending on taste). This is a generalization of the trick used in the 'stdio' array "ctype", but "ctype" uses only single-bit strips. (See CACM, long ago, Jim Gimpell (sp?) of Bell Labs, "Minimization of Spatially-Multiplexed Character Sets", which talked about the bit-strips [NOT byte-strips!] used in the SNOBOL-IV lexical analyzer.) To make it clearer (?), suppose the character array is called CTYPE (why not?), and the set we are interested is called LETDEL (6-char limit, on the -10), and it occupies bits 31 and 32 (the -10 is a "big Endian" machine, so bit 0 is MSB and bit 35 is LSB), and suppose we always have the character to be tested in accumulater (register) "C" (programmer-defined symbol), then the code to do a three-way branch based of which subset the character belongs to is the following two instructions (I have used UNIX-like syntax for non-"10" readers): +-- | ldb t1,letdel ; get two-bit field into register t1 | jrst @.+1(t1) ; dispatch via jump-table | .word is_ign ; C is in the "ignore" set, go to is_ign | .word is_let ; C is in the "letters & numbers" set | .word is_del ; C is in the "delimeters" set | .word panic ; can never happen if we built "ctype" right! | |letdel: point 2,ctype(C),32 ; same as 32<<30 + 2<<24 + C<<18 + ctype +-- Using such byte strips (which are only marginally less efficient even on hardware without variable-sized bytes), VERY FAST lexical analyzers can be built, fast enough that character-at-a-time interpretation of interactive languages (such as FOCAL or *shudder* BASIC) is not unreasonable. In a port of FOCAL to the PDP-10 which used this, the parsing was NOT the critical time item (symbol table search was). This is a case in which efficient HARDWARE architecture on an early machine can suggest efficient SOFTWARE on later machines, even though they may not have the special hardware of the earlier machine. For example, the "bit-blt" hardware of the Xerox "Alto" and the software of the Apple "Mac". Rob Warnock UUCP: {ihnp4,ucbvax!amd70,hpda,harpo,sri-unix,allegra}!fortune!rpw3 DDD: (415)595-8444 USPS: Fortune Systems Corp, 101 Twin Dolphin Drive, Redwood City, CA 94065