Path: utzoo!attcan!uunet!wuarchive!cs.utexas.edu!utastro!bigtex!james
From: james@bigtex.cactus.org (James Van Artsdalen)
Newsgroups: comp.sys.ibm.pc.misc
Subject: Re: Timing the CPU and bus size
Keywords: 386 386sx, 8088 and 8086 etc
Message-ID: <47719@bigtex.cactus.org>
Date: 24 Sep 90 17:54:06 GMT
References: <1990Sep20.231706.27009@Pacesetter.COM>
Reply-To: james@bigtex.cactus.org (James Van Artsdalen)
Organization: Institute of Applied Cosmology, Austin TX
Lines: 726

In <1990Sep20.231706.27009@Pacesetter.COM>, torkil@Pacesetter.COM wrote:

> Everything can be timed, including CPU speed, memory speed and prefetch
> queue.

Well, as a matter of theory, yes, but I haven't seen it done in practice
yet. The Dell 325D, for example, has a cache with a write buffer, and
page-mode DRAM that is interleaved and refreshed. All of these are hard
to account for. A 486 is harder still because it has several write
buffers. Other effects: cache line size, cache line fill order,
interleave bit, page size...

The measurable parameters are: cache hit, cache miss with page hit,
cache miss with page miss, page size, interleave bit, RAS precharge on
the "other" interleaved SIMM on a page miss, refresh pulse width,
refresh frequency, and a bunch of other things I'm sure - and we
haven't covered any cycle that misses the system board RAM.

> The trick is to do incremental measurements. Suppose you have written
> the routine that measures the system timer's low bits (in Turbo C it is
> outportb(0x43,0x00); low byte = inportb(0x40); high byte = inportb(0x40); -
> but you have to write it in assembler to really know what you get).
> Each bit increment corresponds to about 1 microsecond (actually 1/1.19318)
> but don't be surprised if you never see any odd bit count. It counts
> down, by the way.

If you don't see an odd bit count, there is a bug - you're probably
seeing two microseconds per tick or worse. It is very hard to write a
microsecond timer routine that in fact works.

Even when you have a working timer routine, you have to account for
external variables such as refresh. It is very important that such a
routine be validated, or else you'll get another magazine-quality
benchmark that can't return the same answer twice.

Below is a routine I've derived experimentally. It returns a 32-bit
value. It does not work near midnight. There's also the test program.
Don't even think of modifying this without running the test program for
a few hours. There are several different tests: the desired one is
selected by choosing which #elif to enable (yes, I know it's gross, but
the file grew that way). All of the assembly routines after gettime()
are just test helpers.

On my Dell 386/16, this routine is consistent within 1 microtick after
allowing for IRQ 0. You probably won't get quite this good a result,
since the 386/16 is unique in having true zero-wait-state RAM (no DRAM
in the system). If you use it, you have to allow for IRQ 0 and
midnight, and make sure that the results are repeatable (nontrivial
with CGA).

        page    59,130
        .386p

DEBUG    equ    1

PIC0     equ    20h             ;8259 Interrupt controller 0
PIC0MASK equ    21h             ;8259 Interrupt controller 0 mask
TIMER    equ    40h             ;8254 timer counter 0 address
MAGIC    equ    80h             ;Magic stone - used for 1us delays
PIC1     equ    0a0h            ;8259 Interrupt controller 1
PIC1MASK equ    0a1h            ;8259 Interrupt controller 1 mask

WAFORIO macro
        out     MAGIC,al
        endm

A_BLOCK segment use16 at 0a000h
        public  _vga
_vga    equ     $
A_BLOCK ends

B_BLOCK segment use16 at 0b000h
        public  _monochrome, _cga
_monochrome db  80 * 25 * 2 dup (?)
        org     8000h
_cga    db      80 * 25 * 2 dup (?)
B_BLOCK ends

ROMDAT  segment use16 at 40h
        org     06ch
        public  _bios_timer, _bios_timer_low, _bios_timer_high
        public  _bios_timer_overflow
_bios_timer equ $
_bios_timer_low dw ?
_bios_timer_high dw ?
_bios_timer_overflow db ?
ROMDAT  ends

_TEXT   SEGMENT use16 WORD PUBLIC 'CODE'
_TEXT   ENDS
_DATA   SEGMENT use16 WORD PUBLIC 'DATA'
_DATA   ENDS
CONST   SEGMENT use16 WORD PUBLIC 'CONST'
CONST   ENDS
_BSS    SEGMENT use16 WORD PUBLIC 'BSS'
        extrn   _pending_int:word
        extrn   _first_read:word
        extrn   _second_read:word
        extrn   _first_timer:byte
        extrn   _second_timer:byte
_BSS    ENDS

DGROUP  GROUP   CONST, _BSS, _DATA
        ASSUME  CS: _TEXT, DS: DGROUP, SS: DGROUP

_TEXT   segment

;------------------------------------------------------------------------------
; Strategy:
;
; Read the 8254 timer/counter. Read the 17th bit also. Concatenate
; with the BIOS timer tick count. Since the 8254 counts down, but the
; BIOS counts up, invert all 17 bits from the 8254 so that it counts
; up too.
;
; There are two special cases. The first is when the timer ticked over
; "recently". The interrupt to update the BIOS count may or may not
; have occurred. So if the timer wrapped recently, check to see if
; there is a pending interrupt. If so, the BIOS count was not updated,
; so update it "manually". If the 8254 didn't tick over "recently",
; don't update the BIOS counter.
;
; The other special case is when the 8254 returns exactly 0. It is
; apparently not possible to determine if the interrupt has or hasn't
; occurred, or if the 17th bit is correct. So, the 8254 must be read
; a second time (we know that this second read will not return 0,
; obviously, and so will not be the same special case again). The
; second 8254 read returns a "wrapped" or "correct" value. If the
; "wrapped" value has the same BIOS count as the "0" read did, then
; the "0" read didn't get the right 17th bit.
;
; This code has a couple of special attributes. First, the routine
; always takes exactly the same amount of time to run. There is no
; variability in calling gettime(): it is guaranteed to be constant
; time. This is important if a benchmark calls it often. Second, the
; same read from the 8254 is returned each time.
; The second read decides how to update the first read, but the 8254
; value read the second time does not replace the first value (the
; second will be several counts later than the first). This limits or
; eliminates any variability in terms of when within gettime() the 8254
; is read: it is guaranteed that if gettime() is called N times, all N
; calls will measure the same duration, within two microseconds
; (subject to external interference).
;
; When using or modifying this, remember that accuracy here isn't the
; last word. Memory refresh, DMA cycles & BIOS timer INT overhead are
; substantial compared to the accuracy of gettime(). gettime() does
; not enable interrupts if they were previously disabled, so an
; application may disable interrupts (and take care of the BIOS timer
; itself).
;
; Handle the BIOS midnight timer count reset!!!

        public  _gettime
_gettime proc   near
        pushf
        push    bx
        mov     al,0c2h         ; read-back: latch count and status
        cli                     ; of counter 0
        out     43h,al
        WAFORIO
        in      al,TIMER        ; OUT pin status
ifdef DEBUG
        mov     _first_timer,al
endif
        add     al,al           ; Save OUT in carry flag
        WAFORIO
        in      al,TIMER        ; low order timer count
        mov     ah,al
        WAFORIO
        in      al,TIMER        ; high order timer count
        xchg    ah,al
        not     ax
ifdef DEBUG
        mov     _first_read,ax
endif
        mov     dx,ax           ; Save "raw" count read (inverted)
        cmc                     ; Put !OUT in AX:15
        rcr     ax,1
        mov     bx,ax           ; Save the "official" count
        cmp     dx,-1           ; CL == 1 iff the timer is ticking
        sete    cl              ; over right now
        rol     bx,cl           ; If ticking over, will need BX
                                ; rotated later
        WAFORIO                 ; These seem important
        WAFORIO
        WAFORIO
        WAFORIO
        mov     al,0c2h         ; read-back: latch count and status
        out     43h,al          ; of counter 0
        WAFORIO
        in      al,TIMER        ; OUT pin status
ifdef DEBUG
        mov     _second_timer,al
endif
        add     al,al           ; Save OUT in carry flag
        WAFORIO
        in      al,TIMER        ; low order timer count
ifdef DEBUG
        mov     ah,al
endif
        WAFORIO
        in      al,TIMER        ; high order timer count
ifdef DEBUG
        xchg    ah,al
        not     ax
        mov     _second_read,ax
endif
        cmc
        rcr     bx,cl           ; Put carry flag (second !OUT) into
                                ; BX:15. BX already shifted in
                                ; this case.
        shl     cl,4            ; 16 (if timer ticked over) or 0
ifdef DEBUG
        mov     byte ptr _pending_int+1,cl
endif
        shld    dx,bx,1         ; If the "raw" value read was exactly
        add     bx,bx
        shl     bx,cl           ; ffffh, then clear BX:[0-14]
        shrd    bx,dx,1         ; OK after here
        mov     ax,bx           ; If BX == ffffh or BX < 3fffh
        inc     ax              ; then if there is a pending interrupt
        and     ax,7fffh        ; add it to what the BIOS count is
        cmp     ax,4000h        ; since BIOS will update its count
        setae   cl              ; as soon as we do the POPF
        shl     cl,4            ; 16 (BX out of range) or 0
        mov     al,0ah          ; OCW3: select the Interrupt Request
        out     PIC0,al         ; Register for reading
        in      al,PIC0
        and     ax,1            ; See if IRQ 0 is requesting
        mov     byte ptr _pending_int,al
        shl     ax,cl           ; 1 (if BX in range) or 0
        mov     dx,ax
        mov     cx,es
        mov     ax,ROMDAT
        mov     es,ax
        assume  es:ROMDAT
        add     dx,_bios_timer_low
        mov     es,cx
        assume  es:nothing
        mov     ax,bx
        pop     bx
        popf                    ;Potentially enable interrupts
        ret
_gettime endp

;------------------------------------------------------------------------------

        public  _poke_bytes
_poke_bytes proc near
        push    bp
        mov     bp,sp
        push    di
        push    es
        mov     dx,10[bp]       ;Count
        mov     al,12[bp]       ;Byte to write
poke_bytes_loop:
        les     di,4[bp]
        mov     cx,8[bp]
        rep     stosb
        dec     dx
        jnz     poke_bytes_loop
        pop     es
        pop     di
        pop     bp
        ret
_poke_bytes endp

        public  _poke_words
_poke_words proc near
        push    bp
        mov     bp,sp
        push    di
        push    es
        mov     dx,10[bp]       ;Count
        mov     ax,12[bp]       ;Word to write
poke_words_loop:
        les     di,4[bp]
        mov     cx,8[bp]
        rep     stosw
        dec     dx
        jnz     poke_words_loop
        pop     es
        pop     di
        pop     bp
        ret
_poke_words endp

        public  _peek_bytes
_peek_bytes proc near
        push    bp
        mov     bp,sp
        push    di
        push    es
        mov     dx,10[bp]       ;Count
        mov     al,12[bp]       ;Byte to scan for
peek_bytes_loop:
        les     di,4[bp]
        mov     cx,8[bp]
        repne   scasb
        je      peek_bytes_failed
        dec     dx
        jnz     peek_bytes_loop
        mov     ax,0            ;No error
peek_bytes_exit:
        pop     es
        pop     di
        pop     bp
        ret
peek_bytes_failed:
        mov     ax,1
        jmp     peek_bytes_exit
_peek_bytes endp

        public  _peek_words
_peek_words proc near
        push    bp
        mov     bp,sp
        push    di
        push    es
        mov     dx,10[bp]       ;Count
        mov     ax,12[bp]       ;Word to scan for
peek_words_loop:
        les     di,4[bp]
        mov     cx,8[bp]
        repne   scasw
        je      peek_words_failed
        dec     dx
        jnz     peek_words_loop
        mov     ax,0            ;No error
peek_words_exit:
        pop     es
        pop     di
        pop     bp
        ret
peek_words_failed:
        mov     ax,1
        jmp     peek_words_exit
_peek_words endp

        public  _blit_bytes
_blit_bytes proc near
        push    bp
        mov     bp,sp
        push    si
        push    di
        push    ds
        push    es
        mov     dx,10[bp]       ;Count
        mov     al,12[bp]       ;Byte to write
blit_bytes_loop:
        les     di,4[bp]
        lds     si,4[bp]
        add     si,160
        mov     cx,8[bp]
        rep     movsb
        dec     dx
        jnz     blit_bytes_loop
        pop     es
        pop     ds
        pop     di
        pop     si
        pop     bp
        ret
_blit_bytes endp

        public  _blit_words
_blit_words proc near
        push    bp
        mov     bp,sp
        push    si
        push    di
        push    ds
        push    es
        mov     dx,10[bp]       ;Count
        mov     al,12[bp]       ;Byte to write
blit_words_loop:
        les     di,4[bp]
        lds     si,4[bp]
        add     si,160
        mov     cx,8[bp]
        rep     movsw
        dec     dx
        jnz     blit_words_loop
        pop     es
        pop     ds
        pop     di
        pop     si
        pop     bp
        ret
_blit_words endp

        public  _get_irq_mask
_get_irq_mask proc near
        in      al,PIC1MASK
        mov     ah,al
        WAFORIO
        in      al,PIC0MASK
        ret
_get_irq_mask endp

        public  _set_irq_mask
_set_irq_mask proc near
        push    bp
        mov     bp,sp
        mov     ax,4[bp]
        out     PIC0MASK,al
        mov     al,ah
        WAFORIO
        out     PIC1MASK,al
        pop     bp
        ret
_set_irq_mask endp

        public  _get_vga_mode
_get_vga_mode proc near
        push    bp
        mov     ah,0fh
        int     10h
        pop     bp
        mov     ah,0
        ret
_get_vga_mode endp

        public  _set_vga_mode
_set_vga_mode proc near
        push    bp
        mov     bp,sp
        mov     ax,4[bp]        ;Get new mode
        pusha
        int     10h
        popa
        pop     bp
        ret
_set_vga_mode endp

        public  _get_cursor
_get_cursor proc near
        push    bp
        mov     ah,3
        int     10h
        pop     bp
        mov     ax,cx
        ret
_get_cursor endp

        public  _set_cursor
_set_cursor proc near
        push    bp
        mov     bp,sp
        mov     cx,4[bp]
        mov     ah,1
        pusha
        int     10h
        popa
        pop     bp
        ret
_set_cursor endp

_TEXT   ends
        end

========================================

#include <stdio.h>
#include <stdlib.h>

extern volatile unsigned long far bios_timer;
extern volatile unsigned int far bios_timer_low;
extern volatile unsigned int far bios_timer_high;
extern unsigned char far cga[2][80][25];
extern unsigned char far monochrome[2][80][25];
extern unsigned char far vga[256][256];
extern unsigned long gettime(void);
extern void poke_bytes(void far *, int, int);
extern unsigned int get_irq_mask(void);
extern void set_irq_mask(unsigned int);

int pending_int = 0;
char first_timer, second_timer;
int first_read, second_read;

main(argc, argv, envp)
int argc;
char **argv;
char **envp;
{
    char buf[256];
    int n;
    unsigned int e, f, g, h, i, j;
    unsigned long start, end;
    unsigned long a, b, c, d;
    unsigned int irq_mask;

#if 1
    /* Make sure that the timer never returns an unreasonable value.
     * Do this by sampling it twice in quick succession, and making sure
     * that the returned value is not too much larger than the previous
     * value.
     */
    int a1read, a2read, b1read, b2read;
    int a1timer, a2timer, b1timer, b2timer;
    int aint, bint;

    b = gettime();
    bint = pending_int;
    b1timer = (int) first_timer & 0xff;
    b1read = first_read;
    b2timer = (int) second_timer & 0xff;
    b2read = second_read;
    while (1) {
        a = gettime();
        aint = pending_int;
        a1timer = (int) first_timer & 0xff;
        a1read = first_read;
        a2timer = (int) second_timer & 0xff;
        a2read = second_read;
        if (a - b > 128) {
            printf("N %8lx, diff %ld\n", gettime(), a - b);
            printf("a %8lx int %4x, 1read %4x, 1t %2x, 2read %4x, 2t %2x\n",
                   b, bint, b1read, b1timer, b2read, b2timer);
            printf("a %8lx int %4x, 1read %4x, 1t %2x, 2read %4x, 2t %2x\n\n",
                   a, aint, a1read, a1timer, a2read, a2timer);
            b = gettime();
            bint = pending_int;
            b1timer = (int) first_timer & 0xff;
            b1read = first_read;
            b2timer = (int) second_timer & 0xff;
            b2read = second_read;
        } else {
            b = a;
            bint = aint;
            b1timer = a1timer;
            b1read = a1read;
            b2timer = a2timer;
            b2read = a2read;
        }
    }
#elif 0
    /* Make sure that gettime() is returning the low order bit clear
     * some of the time. Might not happen if the 8254 is read wrong.
     */
    while (1) {
        a = gettime();
        if (!(a & 1)) {
            printf("a %lx\n", a);
            exit(1);
        } /* endif */
    } /* endwhile */
#elif 0
    /* Test CGA screen access time. This won't give reproducible results
     * until the test is sync'd with the vertical refresh, and even then
     * the timer tick needs to be sync'd too.
     */
    irq_mask = get_irq_mask();
    set_irq_mask(0xfffe);

    /* sync with timer tick */
    for (e = bios_timer_low; e == bios_timer_low; )
        ;

    start = gettime();
    poke_bytes(cga, 256, 1000);
    end = gettime();

    printf("Start %lx\n", start);
    printf("Stop %lx\n", end);
    printf("Took %lu microticks.\n", end - start);

    set_irq_mask(irq_mask);
    exit(0);
#elif 0
    /* Another test routine to make sure that gettime() never returns an
     * unreasonable value. Print what gettime() got if there is a
     * problem.
     */
    while (1) {
        int a1read, a2read, b1read, b2read;
        int a1timer, a2timer, b1timer, b2timer;
        int aint, bint;

        a = gettime();
        aint = pending_int;
        a1timer = (int) first_timer & 0xff;
        a1read = first_read;
        a2timer = (int) second_timer & 0xff;
        a2read = second_read;
        b = gettime();
        bint = pending_int;
        b1timer = (int) first_timer & 0xff;
        b1read = first_read;
        b2timer = (int) second_timer & 0xff;
        b2read = second_read;
        if (b - a > 600
#if 0
            || ((a1read == 0xffff && a1timer != a2timer)
                || (b1read == 0xffff && b1timer != b2timer))
#endif
           ) {
            printf("N %8lx, diff %ld\n", gettime(), b - a);
            printf("a %8lx int %4x, 1read %4x, 1t %2x, 2read %4x, 2t %2x\n",
                   a, aint, a1read, a1timer, a2read, a2timer);
            printf("a %8lx int %4x, 1read %4x, 1t %2x, 2read %4x, 2t %2x\n\n",
                   b, bint, b1read, b1timer, b2read, b2timer);
            exit(1);
        }
    }
#elif 0
    /* Get a bunch of results and then print them. This is a good way
     * of seeing that the results are reproducible. It also shows why
     * you have to account for IRQ 0 timer ticks.
     */
    while (1) {
        unsigned long a[100];
        unsigned long diff;

        for (n = 0; n < 100; n++)
            a[n] = gettime();
        diff = a[1] - a[0];
        for (n = 1; n < 99; n++) {
            if (a[n+1] - a[n] > diff + 1)
                printf("a %lx, b %lx, diff %ld\n",
                       a[n], a[n+1], a[n+1] - a[n]);
        }
    }
#else
    /* Get some samples. Only keep a sample if it contains a "tick over"
     * event, where the 8254 rolled over. First wait for the high order
     * part of the 8254 to be FF: that means that a tickover will happen
     * "soon", and we're likely to capture it.
     * This loop is used so you can visually see that the gettime()
     * results are monotonic and regular across tickovers.
     */
    while (1) {
        unsigned long a[100];

        while ((gettime() & 0xff00) != 0xff00)
            ;
        for (n = 99; n >= 0; n--)
            a[n] = gettime();
        if ((a[0] & 0xffff0000) == (a[99] & 0xffff0000))
            continue;
        for (n = 99; n >= 0; n--)
            printf("%lx\n", a[n]);
        printf("\n");
    }
#endif
    return 0;
}

--
James R. Van Artsdalen    james@bigtex.cactus.org    "Live Free or Die"
Dell Computer Co    9505 Arboretum Blvd Austin TX 78759    512-338-8789