Newsgroups: comp.arch
Path: utzoo!utgpu!watserv1!maytag!watdragon!watsol.waterloo.edu!tbray
From: tbray@watsol.waterloo.edu (Tim Bray)
Subject: Big files, and lots of 'em: 32 bits is not enough
Message-ID: <1990Aug8.222644.23683@watdragon.waterloo.edu>
Sender: daemon@watdragon.waterloo.edu (Owner of Many System Processes)
Organization: University of Waterloo
References: <5539@darkstar.ucsc.edu> <13285@yunexus.YorkU.CA> <30728@super.ORG> <13667@cbmvax.commodore.com> <40644@mips.mips.COM>
Date: Wed, 8 Aug 90 22:26:44 GMT
Lines: 42

mash@mips.COM (John Mashey) writes:
>jesup@cbmvax (Randell Jesup) writes:
>>Few machines
>>(percentage-wise) even have 4 GB of storage, let alone files larger than 4GB
>>(I've never even seen a file larger than 100MB, even on mainframes).

>However, I'd STRONGLY disagree with the idea that 64-bit machines will
>remain confined to the super- & minisuper world for 10-20 more years.

>So, here's a thought to stimulate discussion:
>	What applications (outside the scientific / MCAD ones that
>	can obviously consume the space) would benefit from 64-bit
>	machines?

An example: text database.  In a textbase, you must have addressability
to the byte, not to the record.  Also, it is very very convenient to
regard all the text in your universe as being in one linear address
space.  32 bits worth of text is not very much text in real-world terms.
Here is some 'ls' output from a directory containing the electronic
Oxford English Dictionary, Second Edition, and some supporting files.

-r--r-----  1 tbray    572728830 Sep  7  1989 oed-2e
-r--r-----  1 tbray    179728816 Sep  7  1989 oed-2e.struct
-r--r-----  1 tbray    475589360 Sep  8  1989 oed-2e.tree

About 30 bits worth right there.  But I want a database with the OED and
the complete Shakespeare and Chemical Abstracts and the complete Library
of Congress Catalogue and a couple decades' worth of AP wire service;
that's almost enough text to be really useful.

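For anyone who wants to check the arithmetic, here is a minimal sketch
in Python.  It takes the three file sizes from the 'ls' output and works
out how many bits a byte offset needs for each file on its own, and for
all three laid end to end in one linear address space.  The helper name
bits_to_address is made up for illustration, and laying the files end to
end with no padding is an assumption.

```python
# File sizes in bytes, copied from the 'ls' listing.
sizes = {
    "oed-2e": 572728830,
    "oed-2e.struct": 179728816,
    "oed-2e.tree": 475589360,
}


def bits_to_address(nbytes):
    """Smallest number of bits able to hold every byte offset 0..nbytes-1.

    Hypothetical helper for illustration only.
    """
    return max(nbytes - 1, 0).bit_length()


# Assumption: the files are concatenated with no padding between them.
total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name:15s} {size:>12d} bytes  {bits_to_address(size)} bits")

print(f"{'total':15s} {total:>12d} bytes  {bits_to_address(total)} bits")
print(f"fraction of a 32-bit space used: {total / 2**32:.3f}")
```

By this count the three files together need 31-bit byte offsets and
already fill more than a quarter of a 32-bit address space.
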
But seriously folks, there's lots of insurance companies and research
institutions and government departments with *lots* more than 4 GB
sitting around...

And I think it's a *bad* idea, as some have proposed, to create a new
datatype for file offsets as opposed to addresses as opposed to
integers.  As Henry Spencer and others have repeatedly pointed out, the
VAX made us all sloppy by allowing us to interchange pointers, integers,
and offsets promiscuously.  But too late, we're stuck with it; there's
not enough programmer-years in the lifetime of the universe to fix all
the useful software that does this.  And y'know, in my heart of hearts,
I'm not sure it's a bad thing; it certainly allows the use of some
extremely elegant and rigorously simple programming paradigms.

Cheers, Tim Bray, Open Text Systems, Waterloo, Ont.