Path: utzoo!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!aplcen!uakari.primate.wisc.edu!brutus.cs.uiuc.edu!lll-winken!sun-barr!newstop!sun!opus!gingell
From: gingell%opus@Sun.COM (Rob Gingell)
Newsgroups: comp.arch
Subject: Re: mmap() vs. read() (Was: Re: the Multics from the black lagoon :-))
Message-ID: <131754@sun.Eng.Sun.COM>
Date: 13 Feb 90 18:00:34 GMT
References: <8859@portia.Stanford.EDU> <20571@watdragon.waterloo.edu> <1990Feb12.053616.11455@Solbourne.COM> <3556@rti.UUCP> <10468@alice.UUCP> <131682@sun.Eng.Sun.COM> <1990Feb13.003010.23356@utzoo.uucp>
Sender: news@sun.Eng.Sun.COM
Reply-To: gingell@sun.UUCP (Rob Gingell)
Organization: Sun Microsystems, Mountain View
Lines: 88

In article <1990Feb13.003010.23356@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <131682@sun.Eng.Sun.COM> gingell@sun.UUCP (Rob Gingell) writes:
>>... At the very least, the read() version will be slower
>>than the mmap() version by the amount of time required to effect the
>>copies from kernel to program buffers...
>
>Assuming your MMU can do copy-on-write, why copy?  See below...
>>... read() gives you a copy of the file data
>>at the time that the call is executed.  That copy is immutable save any
>>action performed by your program.  If read() were implemented *as* mmap(),
>>then while it is possible to deal with side effects introduced in *your*
>>machine, it is not, in general, possible to deal with side effects introduced
>>in other machines...
>
>Why are other machines relevant?  Can they reach into your machine and
>mess with memory?  Or are you assuming an implementation where other
>machines are not told "I'm using this data, and want to be told before
>anyone else starts to change it"?

I'm assuming an environment in which it is not possible to tell such things.
It is, of course, possible to assume other environments.
TOPS-20, for instance, assumed that all machines that shared mapping to a
file were TOPS-20, and therefore had all the TOPS-20 semantics about
sharing, copy-on-write, etc.  TOPS-20 had other convenient properties such
as a page-based file system that also helped simplify the problem and
provide a reasonably powerful and "pure" environment.  However, such
assumptions made it impossible for other systems to share access to the
files involved -- which, in my experience anyway, was a real and important
shortcoming.

>Clearly, implementing read with copy-on-write mapping requires a proper
>implementation of copy-on-write, in which *any* attempt to mess with the
>data triggers a copy operation.

Yes.  It also assumes those few cases in which the data is so conveniently
aligned as to make this optimization possible.  One has to question whether
the complexity is worth cobbling up something so semantically simple as the
read() call.  And, it also excludes every system which is not capable of
cooperating with the somewhat rigorous semantic requirements, including
PC's, most IBM operating systems running on any hardware, in short, the
vast majority of hardware in the world.

>If some defective file system, call it
>NFS (name picked purely at random :-)), is not capable of supporting
>copy-on-write, then you can't do this optimization when the file is on
>the other end of NFS.

Gee, you still got any ax left after all these years of grinding? :-)

The argument has nothing to do with NFS.  It has to do with heterogeneity.
It may be that accommodating heterogeneous hardware and software
environments is a complexity that some can live without.  Others have not
been so fortunate, and must instead deal with a melting pot reality.  You
can certainly design everything around simplifying assumptions, but at some
point you have to wonder whether or not the assumption set has excluded
most of the real world.
As it happens, the viewpoint I'm describing predates personal familiarity
with the NFS, and was derived from experiences in trying to deal with
mixing the relative "purity" of the TOPS-20 assumption set with real
installations where TOPS-20 (or any system) exclusivity was not possible.
And, independent of any particular notion of exclusivity, it's also a
recognition that transparent cache coherence is not always possible given
heterogeneous hardware and interconnects.  In the general environment it
may not even be desirable due to the cost of maintaining the coherence.
While this is most evident today in systems involving networks such as CI
busses, Ethernet, FDDI, etc., it seems increasingly likely that we will
have to build architectures around a variety of assumptions involving
weakly-ordered operations as we deal with more and more independent caches
and buffers.

It is these experiences that have driven much of the semantic definition
and less so the existence of NFS.  That the NFS also dealt with
heterogeneity and the mutual assumption sets mesh well is either the result
of serendipity or congruence.

>If, on the other hand, you have a real file system,
>it's not a problem.

This of course assumes we have a definition of a "real" filesystem.  I
guess NFS isn't "real" because it isn't "UNIX", and you're assuming that
"real" == "UNIX".  And NFS isn't a UNIX file system over networks, it's a
network file system of which UNIX can be a client.  Yeah, it looks a lot
like UNIX, but that's its heritage -- most people going to design something
new generally carry most of their experience into it.  But just as UNIX can
be an NFS client, so can MVS, PC-DOS, VMS, and a variety of other non-UNIX
systems.  Even assuming NFS (or whatever) supported the "copy-on-write"
semantics required, such systems would still be incapable of responding to
them.
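
For readers who want to see the read()/mmap() distinction discussed above on
a single machine, here is a minimal sketch in Python.  It is only an
illustration, not anything from the systems named in this thread: the helper
name snapshot_vs_mapping is hypothetical, and it assumes a POSIX-like local
file system where a shared mapping and write() go through the same page
cache.  It takes a read() copy of a small file, maps the same file, then
lets a second descriptor overwrite part of it.

```python
import mmap
import os
import tempfile


def snapshot_vs_mapping():
    """Return (read_copy, mapped_view) as seen after the file is modified.

    Hypothetical helper for illustration; assumes a POSIX-like system
    (os.pread/os.pwrite are not available on Windows).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"old data")

        # read(): the kernel copies the bytes into a buffer we own.
        read_copy = os.pread(fd, 8, 0)

        # mmap(): a read-only shared mapping of the same file (length 0
        # means "map the whole file").
        view = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        try:
            # Someone else changes the file through a separate descriptor.
            writer = os.open(path, os.O_WRONLY)
            try:
                os.pwrite(writer, b"NEW", 0)
            finally:
                os.close(writer)

            # Copy out what the mapping shows after the change.
            mapped_view = bytes(view[:8])
        finally:
            view.close()
    finally:
        os.close(fd)
        os.unlink(path)
    return read_copy, mapped_view


if __name__ == "__main__":
    read_copy, mapped_view = snapshot_vs_mapping()
    print("read() copy :", read_copy)
    print("mmap() view :", mapped_view)
```

The read() copy is unaffected by the later write, since it lives in the
program's own buffer.  On a typical local file system the mapping shows the
new bytes, which is exactly the side effect a read()-as-mmap() implementation
would have to hide; whether and when such changes become visible across
machines depends on the file system in use.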