Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!ames!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Instruction (dis)continuation (
Message-ID: <27633@winchester.mips.COM>
Date: 15 Sep 89 19:38:23 GMT
References: <2353@oakhill.UUCP> <261500010@S34.Prime.COM> <34701@apple.Apple.COM> <642@unicads.UUCP> <1516@atanasoff.cs.iastate.edu> <31316@ucbvax.BERKELEY.EDU>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 79

In article <31316@ucbvax.BERKELEY.EDU> melvin@ucbarpa.Berkeley.EDU.UUCP (Steve Melvin) writes:
....
>> Examples using the VAX instruction set (write operands are rightmost):
>>
>>   MOVW  IO_DEV_CSR,R0   ; no problem: no page faults in I/O space
>>                         ; (even if MOVW was a restarted instr)
...
>Also consider that this is a simple example, in a more heavily pipelined
>machine, with perhaps even out-of-order prefetching of operands, it gets
>even harder to guarantee that these reads don't occur, it basically means
>that address translation for all reads must occur in order with a microtrap
>mechanism to back out when an I/O address is encountered.  Since the person
>writing the device driver or other code that touches I/O registers generally
>knows which variables map to I/O space, why not just have them use a
>different instruction?  Then, the microarchitecture can much more cleanly
>enter and exit this synchronization point.

Note that this whole issue is not (just) a hardware issue, it's a:
        hardware instruction-level
        hardware micro-architecture
        language definition
        compiler technology
        and operating system
issue; and it's IMPORTANT to understand how these all fit together.
For example:
1) Some people like to write their device drivers in a language higher
   than assembler.  Hence they do not directly choose instructions, and
   if the code generator needs to do something different for
   memory-mapped I/O, it needs to know that.
2) Even on a simple load/store RISC machine, a global optimizer can
   surprise you by rearranging things; the continuation issue, and the
   dealing-with-optimizer issue may not look practically different from
   the system programmer's view, i.e., they could be surprised either way.
3) Most languages don't even have methods for telling an optimizer to
   be careful.  C's volatile is a useful exception.
4) Systems and chips are different.  One may well build a system by
   choosing/designing I/O controllers that have "good" properties.
   On the other hand, chips expected to be used in many different ways
   need to survive all kinds of odd behaviors.

A classic reference here is by Tom Lyon & Joe Skudlarek of Sun:
"All the Chips That Fit", UNIX Review 4, 2 (Feb 1989), 29-34.
(Earlier version in Summer 1985 USENIX).
This is subtitled: "Semiconductor manufacturers continue to heap feature
upon feature, so mama, don't let your babies grow up to be system
software engineers."

-------
Attributes that make life simpler in machines that use memory-mapped I/O:
1) Load/store architecture, specifically, no more than one load or store
   per instruction, required to be on naturally-aligned boundaries,
   hence fast pipeline with no surprises.  Include all of the 8-16-32 bit
   accesses as normal instructions, else some devices that must be dealt
   with can give surprises.  For example, it is not good enough to do
   load-words, and then extract bytes, as you can cause problems with
   some device registers by issuing extra accesses.
2) If you use global-optimizing compilers, you need (in C) volatile,
   or some equivalent elsewhere.  This has to work "right", where "right"
   turns out to be: after optimization, the exact same number of loads
   and stores to volatile variables must occur, in exactly the same
   order, as before such optimization.  Anything less than that leads
   to crazed systems programmers.
3) Be careful of buffering.
   For example, some MIPS-based systems use a 4-deep write-buffer that
   provides read-around, i.e., reads have priority over writes, and
   hence, you can end up doing a write to a control register, and
   possibly then reading the associated status register while the write
   is still pending.  (We use a kernel function wbflush() that waits
   until the write buffer is empty.)  This is OK and works; however,
   some of the newer systems use write-flushing, i.e., a read stalls
   until all of the writes are done, and this is clearly easier to use,
   although there is little difference in performance (stalls are
   stalls, no matter what).  In particular, it almost seems like
   uncached references in hardware are like volatile in software:
   a good default is to stall and make sure the state is clean.
--
-john mashey    DISCLAIMER:
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
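
A rough C sketch of attributes 2) and 3) above, showing a control-register
write followed by a status-register read.  Everything named here is an
assumption made up for illustration: the dev_regs layout, the DEV_CTL_START
and DEV_STAT_BUSY bits, and dev_start() are hypothetical, and this wbflush()
is only a stand-in that counts calls, not the kernel routine mentioned above.
The point is only that the volatile qualifier obliges the compiler to emit
exactly one store and one load, in that order, and that the flush sits
between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout.  A real driver would overlay this on the
   controller's uncached address; here it can live in ordinary memory. */
struct dev_regs {
    volatile uint32_t control;   /* written to start an operation */
    volatile uint32_t status;    /* read to see what the device did */
};

#define DEV_CTL_START  0x1u      /* assumed bit: start the operation */
#define DEV_STAT_BUSY  0x1u      /* assumed bit: device is busy */

/* Stand-in for a write-buffer flush.  On hardware with a read-around
   write buffer this would wait until the buffer is empty; this version
   only records that it was called. */
static unsigned wbflush_calls;

static void wbflush(void)
{
    wbflush_calls++;
}

/* Start the device, then return the status seen after the write. */
static uint32_t dev_start(struct dev_regs *dev)
{
    assert(dev != NULL);
    dev->control = DEV_CTL_START;  /* volatile store: may not be dropped or moved */
    wbflush();                     /* do not let the read pass the pending write */
    return dev->status;            /* volatile load: exactly one, after the store */
}
```

On a system whose hardware does write-flushing on reads, the wbflush() call
would be unnecessary, but the volatile qualifiers would still be needed to
keep the optimizer from merging or reordering the two accesses.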