Path: utzoo!attcan!uunet!samsung!zaphod.mps.ohio-state.edu!uwm.edu!lll-winken!sun-barr!newstop!sun!khb
From: khb@chiba.Eng.Sun.COM (Keith Bierman - SPD Advanced Languages)
Newsgroups: comp.arch
Subject: Re: Extremely Fast Filesystems
Message-ID: <KHB.90Aug7132932@chiba.Eng.Sun.COM>
Date: 7 Aug 90 20:29:32 GMT
References: <13285@yunexus.YorkU.CA> <30728@super.ORG> <13667@cbmvax.commodore.com> <1990Aug7.190719.7907@caen.engin.umich.edu>
Sender: news@sun.Eng.Sun.COM
Organization: Sun MegaSystems
Lines: 55
In-reply-to: pha@caen.engin.umich.edu's message of 7 Aug 90 19:07:19 GMT


In article <1990Aug7.190719.7907@caen.engin.umich.edu> pha@caen.engin.umich.edu (Paul H. Anderson) writes:

...

   Populations Studies Center, for example, would like nothing better than to
   quickly analyze 5 gigabyte datasets (hence my earlier request for large
   RAM systems).  Furthermore, many such datasets exist.  The 1990 census
   is just one 5 gigabyte file - there are similar files for the last
   100 years or more.  Likewise for China, Russia, Europe, and more.

   Analyzing these things quickly is not currently very easy, but that
   doesn't mean that people don't want to do it.
...

Humm. In estimation problems there are lots of ways to skin cats.
Problems which have huge datasets but "small" models (few unknowns to
estimate) do not require huge "core" storage.

In the satellite tracking biz, some experiments (like GPS baselines)
go on for years, and TB of data could be necessary if one formed the
obvious normal-equations matrix

	A^T A

and proceeded to use elimination from there.
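For contrast, here is a minimal Python sketch of the "obvious" route. It assumes the whole dataset is held in memory as a list of rows, forms A^T A and A^T b, and then solves by Gaussian elimination. The function names and the toy data are hypothetical.

```python
def normal_equations(rows, rhs):
    """Form A^T A and A^T b; every row of A must be held in memory at once."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(n)]
    return ata, atb


def gauss_solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))  # pivot row
        a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n + 1):
                a[i][j] -= f * a[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


# Hypothetical example: fit y = 3 + 2t with the full dataset in memory.
rows = [[1.0, float(t)] for t in range(10)]
rhs = [3.0 + 2.0 * t for t in range(10)]
x = gauss_solve(*normal_equations(rows, rhs))
```

The point is only that `rows` must hold every observation before the solve can start.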

Back when I did that sort of work, we employed Square-Root Information
Filters, and/or UDU**T decomposition techniques. If, for the sake of
argument, your model has 70 independent variables, the bulk of the
"core" needed is

	70*71/2 = 2485 words of storage

_independent_ of the size of the dataset. Of course, one also gets
estimates in "real time" (viz., as fast as the data become available).
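Here is a minimal Python sketch of the square-root information idea. It rests on a few assumptions. Observations arrive one row at a time as (a, y) with unit weights. Each row is rotated into an upper-triangular factor R and vector z (so that R x = z) with Givens rotations, and then discarded. The class and method names are hypothetical. This is plain Givens-based QR updating, not Bierman's published SRIF code, and it shows neither the UDU**T variant nor any time (process-noise) update.

```python
import math


class SequentialLSQ:
    """Sequential least squares in square-root information form (sketch).

    Keeps an upper-triangular factor R and vector z with R x = z. Each
    observation (a, y) is rotated into [R | z] with Givens rotations and
    then dropped, so storage depends on n, not on the number of rows.
    """

    def __init__(self, n):
        self.n = n
        self.R = [[0.0] * n for _ in range(n)]  # full n x n here for clarity
        self.z = [0.0] * n
        self.rss = 0.0  # accumulated least-squares residual sum of squares

    def add(self, a, y):
        n = self.n
        row = [float(v) for v in a] + [float(y)]  # working copy of [a | y]
        for k in range(n):
            if row[k] == 0.0:
                continue
            r = math.hypot(self.R[k][k], row[k])
            c, s = self.R[k][k] / r, row[k] / r
            # Rotate row k of [R | z] against the observation row; this
            # zeroes row[k] while keeping R upper triangular.
            for j in range(k, n):
                rkj = self.R[k][j]
                self.R[k][j] = c * rkj + s * row[j]
                row[j] = -s * rkj + c * row[j]
            zk = self.z[k]
            self.z[k] = c * zk + s * row[n]
            row[n] = -s * zk + c * row[n]
        self.rss += row[n] ** 2  # leftover entry adds to the residual norm

    def solve(self):
        """Back-substitute R x = z (R must be nonsingular)."""
        n = self.n
        x = [0.0] * n
        for i in reversed(range(n)):
            t = self.z[i] - sum(self.R[i][j] * x[j] for j in range(i + 1, n))
            x[i] = t / self.R[i][i]
        return x


# Hypothetical example: stream 10,000 rows of y = 3 + 2t, one at a time.
est = SequentialLSQ(2)
for t in range(10000):
    est.add([1.0, float(t)], 3.0 + 2.0 * t)
x = est.solve()  # close to [3.0, 2.0]
```

For clarity R is stored here as a full n x n list. Keeping only its upper triangle in packed form is what brings the count down to n(n+1)/2 words. An estimate is available after any row by calling `solve()`.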

The "naive" approach would require that the entire dataset fit in
"core". 

I am sure that there are many problems which require really huge
memories ... but I am certain that use of appropriate algorithms can
limit the number of such "hogs" considerably.
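Rough numbers make the contrast concrete. The minimal Python sketch below assumes one word per stored value and a hypothetical count of 10**9 observations (the post gives no such figure). Holding the raw data grows with the number of observations, while a packed triangular factor for 70 unknowns does not.

```python
def naive_words(m, n):
    """Words to hold all m observations: an m x n matrix A plus the m-vector b."""
    return m * (n + 1)


def packed_triangular_words(n):
    """Words for an n x n upper-triangular factor stored in packed form."""
    return n * (n + 1) // 2


n = 70        # independent variables, as in the post
m = 10**9     # hypothetical number of observations
print(naive_words(m, n))            # 71000000000
print(packed_triangular_words(n))   # 2485
```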

Those interested in SRIF and UD techniques might wish to peruse

	G. J. Bierman, Factorization Methods for Discrete Sequential
	Estimation, Academic Press, 1977.  ISBN 0 12 097350 2


--
----------------------------------------------------------------
Keith H. Bierman    kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043