Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!lll-tis!ames!sdcsvax!rutgers!gatech!uflorida!codas!burl!clyde!watmath!watcgl!watsol!tbray
From: tbray@watsol.waterloo.edu (Tim Bray)
Newsgroups: comp.arch
Subject: More than 32 bits needed where?
Message-ID: <3104@watcgl.waterloo.edu>
Date: 30 Jan 88 18:38:59 GMT
References: <235@unicom.UUCP> <28200089@ccvaxa>
Sender: daemon@watcgl.waterloo.edu
Reply-To: tbray@watsol.waterloo.edu (Tim Bray)
Organization: New Oxford English Dictionary Project, U. of Waterloo, Ontario
Lines: 19
Keywords: integer range, 32 bits, 64 bits
Summary: Big Text Databases

In article <28200089@ccvaxa> aglew@ccvaxa.UUCP writes:
>> I really don't think the real world really needs anything more
>>expansive than a 32 bit processor to get most jobs done.
>I'm sure that most people wouldn't need this, but some might - and I'd
>like to get a feel for the size of such a niche, if it exists.

Here at the New Oxford English Dictionary Project, we are in the business
of software for large, structured, full-text databases. This involves
keeping a lot of pointers into the text. With the OED, we are fortunate in
that the text is `only' about 500 Mb in size. However, 32 bits only allows
you to address a 4 Gb database at the character level. In terms of text
databases, 4 Gb is big but not that big.

There are fantastic performance advantages to be gained by having your
database pointers be atomic, integer-like objects, so that you can do very
fast comparison, interpolation searching, Patricia trees, and the like.

So here's one application for 64-bit ints. Let's see, 64 bits gives about
(4 * (10**9)) ** 2, about 1.6 * 10**19 characters, should be enough to get
us to 2000 with luck...

Tim Bray, New OED Project, U of Waterloo
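
To make the arithmetic and the interpolation-search point concrete, here is a rough sketch in Python. It is not taken from the project's software: the function names and the sample offsets are made up for illustration, and a real text database would keep its pointers in fixed-width machine integers rather than Python ints. Note that the exact 64-bit figure is 2**64, about 1.8 * 10**19; the 1.6 * 10**19 estimate comes from rounding 2**32 down to 4 * 10**9 before squaring.

```python
def addressable_chars(bits):
    """Number of distinct character offsets an unsigned pointer of this width can hold."""
    return 2 ** bits


def interpolation_search(offsets, target):
    """Return the index of target in the sorted list offsets, or -1 if absent.

    Illustrative only: the offsets are plain integers, so each probe
    position is computed with ordinary arithmetic and comparisons.
    """
    lo, hi = 0, len(offsets) - 1
    while lo <= hi and offsets[lo] <= target <= offsets[hi]:
        if offsets[hi] == offsets[lo]:
            # All remaining keys are equal; avoid dividing by zero.
            return lo if offsets[lo] == target else -1
        # Guess the position by assuming keys are spread roughly evenly.
        pos = lo + (target - offsets[lo]) * (hi - lo) // (offsets[hi] - offsets[lo])
        if offsets[pos] == target:
            return pos
        if offsets[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


if __name__ == "__main__":
    print(addressable_chars(32))   # 4294967296, a bit over 4 * 10**9
    print(addressable_chars(64))   # 18446744073709551616, about 1.8 * 10**19
    # Hypothetical text offsets, some of them beyond the 32-bit limit.
    offsets = [10, 5_000_000_000, 9_000_000_000, 12_000_000_000]
    print(interpolation_search(offsets, 9_000_000_000))  # 2
```

The search only works this simply because every pointer is a single integer: the probe position is one subtraction, one multiply and one divide. Pointers split into a (segment, offset) pair would need a custom comparison and a custom notion of distance at every step.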