Path: utzoo!utgpu!news-server.csri.toronto.edu!dgp.toronto.edu!jonah
Newsgroups: comp.arch
From: jonah@dgp.toronto.edu (Jeff Lee)
Subject: Extremely Large Filesystems [was: Re: Extremely Fast Filesystems]
Message-ID: <1990Aug8.195229.23544@jarvis.csri.toronto.edu>
References: <5539@darkstar.ucsc.edu> <13285@yunexus.YorkU.CA>
 <30728@super.ORG> <13667@cbmvax.commodore.com> <30979@super.ORG>
Date: 8 Aug 90 23:52:29 GMT
Lines: 42

rminnich@super.ORG (Ronald G Minnich) writes:
> [...] More important issue: suppose I find that there
>is a 6 Gb file at NCAR which shows a really neat ocean model. It is there,
>my workstation is here, so what do i do? Nowadays you do the easy thing:
>ftp it over the net. YYYYUUUUCCCCKKKK. No, wait, i forgot: buy plane
>tickets to Colorado. Now that is fun, but you have just left your
>entire environment behind in (my case) Bowie, Md. That is no good either:
>now i have to ftp my environment to Colorado!
>What I *want* to do is say: "when this
>program runs, please associate this 6Gb chunk of its address space with
>that file over there on NCAR". Problem solved. [...]

Given the current data+program+interface modularization, there are at
least four options:

1) hire a station-wagon full of mag-tapes (or send a DAT by over-night
   courier) [6GB would saturate a T1 line (1.5Mbit/sec) for 9.1 hours.]
2) split between the data and program (e.g. with distributed shared
   memory)
3) split between the program and interface (e.g. with Plan 9, X, or
   NeWS)
4) plan a quick holiday in Colorado (with Plan 9, your environment
   follows you automatically)

It depends on where the data-flow volumes are and where the cost breaks
fall. If you plan to use all of the data more than once, grabbing your
own copy is not a bad idea. If you are going to analyse some or all of
the data set and plot summary results, do it remotely and ship the plot
back (batch or real-time).
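
[For anyone who wants to check the T1 figure in option 1: the 9.1 hours
works out for the 6GB file under discussion if you count 1 GB as 1024 MB
and 1 MB as 10^6 bytes, and ignore protocol overhead. Those unit choices
are my assumption about how the number was reached, and the names in the
sketch below are made up for illustration.]

```python
# Back-of-envelope transfer time for a bulk copy over a fixed-rate link.
# Assumptions (not stated in the post): 1 GB = 1024 MB, 1 MB = 10**6 bytes,
# 8 bits per byte, no protocol overhead, link dedicated to the copy.

def transfer_hours(size_bytes, link_bits_per_sec):
    """Return the hours needed to push size_bytes through the link."""
    return size_bytes * 8 / link_bits_per_sec / 3600

T1_NOMINAL = 1.5e6            # bits/sec, the rounded T1 rate used in the post
SIX_GB = 6 * 1024 * 10**6     # bytes, under the mixed-unit assumption above

if __name__ == "__main__":
    # Prints "6GB over T1: 9.1 hours"
    print("6GB over T1: %.1f hours" % transfer_hours(SIX_GB, T1_NOMINAL))
    # For comparison, 6 *mega*bytes would take only about half a minute.
    secs = transfer_hours(6 * 10**6, T1_NOMINAL) * 3600
    print("6MB over T1: %.0f seconds" % secs)
```

[Real throughput on a shared link would be lower, so treat this as a
floor on the copy time, not a prediction.]
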
Only if you are planning to randomly access a *small* portion of this
database does mapping all 6GB into your address space make sense. [And
if you are randomly accessing a small part, then a remote file system
might work almost as well as mapping it into memory.]

I will agree though that most present operating systems will choke on
the idea of a single 6GB random-access file -- or a 6GB virtual memory
image.

j.
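
[P.S. A rough sketch of the "map it all, touch a little" case, for the
curious. It maps a whole file read-only and copies out one small slice,
so only the pages behind that slice need to be brought in. This is an
illustration under loud assumptions: a small local temporary file stands
in for the remote data set, read_slice and make_demo_file are
hypothetical helper names, and nothing here does the remote mapping
itself, which would need a network file system or distributed shared
memory underneath.]

```python
import mmap
import os
import tempfile

def read_slice(path, offset, length):
    """Map the whole file read-only and copy out one small slice."""
    with open(path, "rb") as f:
        # Length 0 means "map the entire file"; pages are faulted in on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]

def make_demo_file(size, marker_offset, marker):
    """Create a mostly-empty stand-in file with a marker at one offset."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.truncate(size)          # extend the file to size bytes of zeros
        f.seek(marker_offset)
        f.write(marker)
    return path

if __name__ == "__main__":
    size = 16 * 1024 * 1024       # 16 MB stand-in, not 6GB
    where = 12 * 1024 * 1024
    path = make_demo_file(size, where, b"ocean")
    try:
        print(read_slice(path, where, 5))   # prints b'ocean'
    finally:
        os.remove(path)
```

[The same access pattern with seek-and-read on a remote file system
moves about the same number of bytes, which is the point of the
bracketed remark above.]
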