Path: utzoo!attcan!uunet!mcvax!ukc!stl!stc!idec!camcon!anc
From: anc@camcon.uucp (Adrian Cockcroft)
Newsgroups: comp.arch
Subject: Re: RISC bashing at USENIX (really RISCs as X servers)
Summary: Transputer makes a good X server?
Message-ID: <1681@gofast.camcon.uucp>
Date: 19 Jul 88 17:10:28 GMT
References: <6965@ico.ISC.COM> <936@garth.UUCP> <202@baka.stan.UUCP>
Organization: Cambridge Consultants Ltd., Cambridge, UK
Lines: 67

In article <202@baka.stan.UUCP>, landru@stan.UUCP (Mike Rosenlof) writes:
>
> When I first brought up X on our color sun 4/260....
> ... I was amazed that the X server performance for simple things
> like scrolling and moving windows around was no better.  This was just how
...
> The loop which does most of the work for a bit blt looks like this for the
> common copy case:
>
>     register long count;
>     register long *src, *dst;
>
>     while( --count )
>     {
>       *dst++ = *src++;
>     }
......
> according to the 68020 users manual, this loop takes 10 clocks in the
> best case and 15 clocks in its cache case.  With a 40 nsec clock, this
> is 400 and 600 nsec per loop.

Are you using the DBRA instruction for this? Has anyone ever seen a
compiler generate a DBRA? Maybe Sun's bcopy library routine is in
assembler and uses it.

> the sun SPARC compiler after optimizing, produces:
....
> which takes 9 clocks, and with a 60 nsec clock, this is 540 nsec.
>
> My point is that with a reduced instruction set, you're very likely to
> find some applications that are slowed down by this reduction.  In this
> case, I find that the sun 4/260 makes a very nice compile or compute
> server, but it's not a very impressive X server.

The Inmos Transputer has a RISC core with microcode added to speed up
things that compilers can use and to put operating system primitives
in microcode. One of its useful extras is a block move instruction that
moves words as fast as memory bandwidth will allow.
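
In portable C terms, the contrast is between a word-at-a-time loop like the
one quoted and a single block-copy call. The following is only a sketch:
wcopy_loop and wcopy_block are hypothetical names chosen for illustration,
and memcpy merely stands in for a hardware block move (it is not what any
particular compiler is known to emit).

```c
#include <stddef.h>
#include <string.h>

/* Word-at-a-time copy in the style of the quoted loop.
   Hypothetical name; copies exactly `count` longs. */
static void wcopy_loop(const long *src, long *dst, size_t count)
{
    while (count-- > 0)
        *dst++ = *src++;
}

/* The same job as one block copy. memcpy is an assumed portable
   stand-in for a block move; note that it takes a length in bytes,
   so the word count is scaled by the word size. */
static void wcopy_block(const long *src, long *dst, size_t count)
{
    memcpy(dst, src, count * sizeof *src);
}
```

Both routines assume that src and dst do not overlap. The Transputer
sequence for the block-copy case is:
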

        ldl src         ;load local onto register stack
        ldl dst
        ldc count       ;load constant
        move            ;blast those RAM chips

The move will take 100 ns per word for on-chip src and dst or 300 ns per
word for off-chip src and dst. The compiler I have (Pentasoft C) can be
told to watch out for strcpy(s,"string constant"), where it knows the
length of src, and it also uses move for bcopy and structure assignment.
A 'wcopy' routine or macro would be needed to get the above code:

#define wcopy(src,dst,count) __ABCregs(count,dst,src);asm(" move")

would do the trick with Pentasoft C.

For bitblt the T800 also has a two-dimensional block move instruction.

Inmos's attitude is that the RISC core made enough space on the chip for
RAM and interprocessor links, but as the chip shrinks they are adding more
microcode space and taking common code sequences into microcode for better
performance on certain applications.

If this is the hardest work for an X server then Transputers should be
pretty good. X is currently being ported to the Transputer by a team
at the University of Kent. The Atari Abaq (T800 based) will have X as
standard but it probably uses its superfast blitter chip rather than
the T800.
-- 
  |   Adrian Cockcroft anc@camcon.uucp ..!uunet!mcvax!ukc!camcon!anc
-[T]- Cambridge Consultants Ltd, Science Park, Cambridge CB4 4DW,
  |   England, UK        (0223) 358855
      (You are in a maze of twisty little C004's, all alike...)