Path: utzoo!attcan!uunet!lll-winken!csd4.milw.wisc.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!BRL.MIL!mike
From: mike@BRL.MIL (Mike Muuss)
Newsgroups: comp.sys.sgi
Subject: Re:  lrectwrite & gsync?
Message-ID: <8903032345.aa14962@SEM.BRL.MIL>
Date: 4 Mar 89 04:45:03 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 96

Mark -

Thanks for your detailed and informative note.  From what you say,
the 60% SYS time must be DMA setup, and the 40% IDLE time must
be the actual DMA transmission time.  I was seeing 1000
scanlines/second.  If that translates to 1000 syscalls and interrupts per
second, then I can understand the significant overhead that I was
encountering.
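
As a back-of-the-envelope check of those figures, here is a small sketch.
It assumes one system call and one DMA transfer per scanline, and that the
60%/40% split applies evenly to every call.  The function names are made up
for illustration, and the 512-line window mentioned afterwards is only an
example size, not a measurement.

```c
#include <assert.h>

/* Wall-clock time per lrectwrite() call, in microseconds, at a given rate. */
long per_call_usec( long calls_per_sec )
{
	assert( calls_per_sec > 0 );
	return 1000000L / calls_per_sec;
}

/* Part of the per-call time that corresponds to a given percentage. */
long share_usec( long per_call, int percent )
{
	return per_call * percent / 100;
}

/* Time to repaint nlines scanlines, one call per scanline, in milliseconds. */
long update_msec( int nlines, long calls_per_sec )
{
	return (long)nlines * per_call_usec( calls_per_sec ) / 1000L;
}
```

At 1000 calls/second each call costs about 1000 usec, of which roughly
600 usec would be setup and 400 usec transfer, so a 512-line window would
take on the order of half a second to repaint.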

I guess I would like the opportunity to vary the pipe_write / DMA
crossover point in my application, to see if I can produce faster
screen updates.  The evaluation SGI used to set the threshold may not
have taken the system overhead fully into account.

THE BIG PICTURE

Let me also take this opportunity to tell you what I need to do;
perhaps you can suggest some different strategy that may achieve higher
performance.  I have a shared memory segment that is organized as
1024 scanlines of 1280 pixels of 4 bytes each (SGI AlphaBGR format
for lrectwrite).  The arrangement of this data must be fixed,
regardless of what sub-rectangle of it is presently of interest.
If it would help any, I can change the internal organization any
way I like, subject to the previous constraint.
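
To make that layout concrete, here is a minimal sketch of the addressing
arithmetic.  The field order inside struct sgi_pixel is an assumption based
on the AlphaBGR name (one byte per channel), and MEM_WIDTH, MEM_HEIGHT,
pixel_offset() and segment_bytes() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: one byte per channel, in AlphaBGR order. */
struct sgi_pixel {
	unsigned char alpha, blue, green, red;
};

#define MEM_WIDTH	1280	/* pixels per scanline in the segment */
#define MEM_HEIGHT	1024	/* scanlines in the segment */

/* Byte offset of pixel (x, y) from the start of the segment. */
size_t pixel_offset( int x, int y )
{
	assert( x >= 0 && x < MEM_WIDTH && y >= 0 && y < MEM_HEIGHT );
	return ((size_t)y * MEM_WIDTH + (size_t)x) * sizeof(struct sgi_pixel);
}

/* Total size of the segment, in bytes. */
size_t segment_bytes( void )
{
	return (size_t)MEM_WIDTH * MEM_HEIGHT * sizeof(struct sgi_pixel);
}
```

Under these assumptions each scanline occupies 5120 bytes and the whole
segment is 5,242,880 bytes (5 Mbytes).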

When the application is using the full screen, then this entire array is
written with a single call to lrectwrite(), with delightfully good
performance.  When the application is using a smaller window, it
presently drops back to a loop which calls lrectwrite() once per
scanline.  Here is the actual code fragment:

	/* Simplest case, nothing fancy */
	y = ybase;
	if( !sw_zoom && !sw_cmap )  {
		if( ifp->if_width == SGI(ifp)->mi_memwidth )  {
			/* This one is very fast */
			lrectwrite(
				SGI(ifp)->mi_xoff+0,
				SGI(ifp)->mi_yoff+y,
				SGI(ifp)->mi_xoff+0+ifp->if_width-1,
				SGI(ifp)->mi_yoff+y+nlines-1,
				&ifp->if_mem[(y*SGI(ifp)->mi_memwidth)*
				    sizeof(struct sgi_pixel)] );
			return;
		}
		for( n=nlines; n>0; n--, y++ )  {
			lrectwrite(
				SGI(ifp)->mi_xoff+0,
				SGI(ifp)->mi_yoff+y,
				SGI(ifp)->mi_xoff+0+ifp->if_width-1,
				SGI(ifp)->mi_yoff+y,
				&ifp->if_mem[(y*SGI(ifp)->mi_memwidth)*
				    sizeof(struct sgi_pixel)] );
			/*  XXX big performance hit here.
			 *  GTX is limited to about 1000 lrectwrites/sec,
			 *  due to some library synchronization mechanism
			 *  that burns 60% of the CPU in sys-time. ?!?!
			 */
		}
		return;
	}

So, what I really want to do is write a RECTANGLE from my buffer
to a RECTANGLE on the screen, more in the style of rectcopy().
Does the 4D architecture offer me a way of doing this?
I can imagine several possibilities:

1)  A subroutine, perhaps:
	lrectwriterect( x1,y1, x2, y2, pixel_p, mem_width, mem_skip )
which would use mem_width pixels, then skip mem_skip pixels, and repeat.
This would be perfect.

2)  A subroutine modeled on the Berkeley writev() call that would take
an array of structures roughly like this (any reasonable layout is fine
with me):
	struct fast_pixel_cmds {
		int		xscr_base;
		int		yscr_base;
		struct sgi_pixel *pixel_p;
		int		count;
	} array[MAX_CMDS];

	fast_pixel_write_v( &array[0], cmd_count );
		
3)  A "vector" version of lrectwrite() that looked something like this:
	struct lrectwrite_vector {
		int		xscr_base, yscr_base;
		int		xscr_max, yscr_max;
		struct sgi_pixel *pixel_p;
	} array[MAX_CMDS];

	lrectwrite_v( &array[0], cmd_count );
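
To pin down the memory walk intended in possibility (1), here is a rough
sketch of the same "use mem_width pixels, skip mem_skip pixels, repeat"
pattern as a plain memory-to-memory gather.  pack_rect() is a hypothetical
helper, not an SGI routine, and the struct sgi_pixel field order is assumed.
It only illustrates the access pattern; whether copying into a contiguous
staging buffer and then issuing one lrectwrite() would actually be faster
than the per-scanline loop is untested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: one byte per channel, in AlphaBGR order. */
struct sgi_pixel {
	unsigned char alpha, blue, green, red;
};

/*
 * Hypothetical helper.  Gather nrows rows of mem_width pixels each
 * from a strided source into a contiguous destination.  After each
 * row, mem_skip source pixels are skipped, so the source stride is
 * mem_width + mem_skip pixels.  Returns the number of pixels copied.
 */
long
pack_rect( struct sgi_pixel *dst, const struct sgi_pixel *src,
	int nrows, int mem_width, int mem_skip )
{
	size_t	stride = (size_t)mem_width + (size_t)mem_skip;
	int	row;

	assert( nrows >= 0 && mem_width >= 0 && mem_skip >= 0 );
	for( row = 0; row < nrows; row++ )  {
		memcpy( dst + (size_t)row * mem_width,
			src + (size_t)row * stride,
			(size_t)mem_width * sizeof(struct sgi_pixel) );
	}
	return (long)nrows * mem_width;
}
```

For a window of if_width pixels inside a buffer mi_memwidth pixels wide,
mem_width would be if_width and mem_skip would be mi_memwidth - if_width.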

Any suggestion at all that you might have will be greatly appreciated!
	Thanks,
	 -Mike