Path: utzoo!attcan!uunet!dev!dgis!jkrueger
From: jkrueger@dgis.dtic.dla.mil (Jon)
Newsgroups: comp.databases
Subject: Re: Relational Database, with a Graphical type field
Message-ID: <922@dgis.dtic.dla.mil>
Date: 9 Jul 90 20:04:13 GMT
References: <6207@tekgen.BV.TEK.COM> <2895@tellab5.tellabs.com> <913@dgis.dtic.dla.mil> <2952@tellab5.tellabs.com>
Organization: Defense Technical Information Center (DTIC), Alexandria VA
Lines: 105

segel@tellabs.com (Mike Segel) writes:

>Jon, you are missing the point. By keeping the Blob as part of the tuple,
>you have now a tuple of 2Meg (+- rest of tuple) in width.

No. It appears that way to queries that don't project fewer columns,
though. The old virtual/transparent distinction. How the engine
manages resources like disk storage isn't visible to folks that send
queries to the engine. Nor how the engine selects or sorts on the
large columns -- could be that it trims trailing whitedots, uses G3
compression, uses sparse matrix algorithms, etc. And of course
avoiding exhaustive scan of large columns isn't any different in
principle from avoiding exhaustive scans of many rows. It is harder
to implement, however; witness that no commercial product of which I'm
aware provides lazy fetching of columns.

>As well as the fact that not all tuples will have a blob attached, but will
>have to have space allocated for a blob.

No. Look at how VM algorithms work. Empty cols can cost small fixed
allocations. Instances of BLOBs can cost in proportion to their
contents. This even presumes that tables and cols still appear to be
fixed length, which isn't a hard requirement; they could expand
arbitrarily to fit their contents, too. But even without that they
can appear as fixed length while being implemented in cheaper ways.
Trailing whitespace compression for text cols works this way now.

>All of these problems are reduced when you just have a pointer as part of
>the tuple.
>The pointer can point to the Blob storage area of a raw disk
>(Informix Online), or to a file or directory in Unix.

If the pointer appears different from ordinary objects to the user, you
lose the simplicity and safety of the data model. If not, why call it
a pointer? Also, it's unlikely the overhead of the UNIX filesystem is
going to be your bottleneck. Work smarter! Don't expect your image
database to get its best performance from the highest bandwidth; some
engines will spend processor cycles to avoid some of those bit copies.

>The point is, they (SG) and Informix are providing the ability of ADT's
>by allowing for Blobs.

But ADT's have nothing to do with BLOBs. Nothing. Consider bignums,
arbitrary precision floats, etc. ADT's have everything to do with
defining operations on objects of a type, and with preventing access
to and manipulation of objects of that type via other means than the
defined operations.

>I think back that the discussion evolved from
>trying to allow for ADT's like graphical images, sounds, text, or various
>other fields. This can all be accomplished within the relational model.

Correct. But these are just the sexy data (lit. and fig., in some
cases :-) Consider what it would mean to scientific and engineering
folk to have a database with a numeric type that doesn't overflow.

>...Sunview on a Sun workstation and a CD Rom device, and the other on a Mac.
>running Wingz. Now, both show you
>a raster of the house, and different views. How is it stored? How can you
>take a raster/gif/ picture which is required for two or three different
>machines, and store it in the DB? You could create an ADT for each
>raster/image format, but that means storing the photograph in the DB
>several times. Or you could separate the header information from the blob,
>then have the front end application, based on the machine, reassemble the
>image in the correct format and the header information.

You're confusing two, no three issues here.
One is remote data access, one is defining families of image data
types, and one is database design.

In times of old this was known as the incompatible subroutine library
problem -- different floating point formats at different precisions,
and overlapping sets of routines for each format, but worse yet
specific to the language, operating system, and processor you call
them from. Calling sequence, don't you know. It took twenty years to
standardize on IEEE floats, write mostly portable libraries, and
arrange for common (or at least sane) linkage schemes.

It is now possible to write programs that use floats and expect them
to behave the same way across a wide variety of machines. It is
*still* not possible to share binary floats between programs running
on different machines in a net -- without conversion, which is the
point. That's a separate problem.

And yet a third problem is designing good databases to get shared.

In the example you cite, the right solution is to design an image type
appropriate for the pictures of houses you have, define this type to
your database engine, populate the database, provide common access to
your network of (different) machines, and decide on a common model for
painting the pictures on different display hardware. Notice how
little of this is a database problem. Also notice that we still want
the engine to understand the image data type, not each front end.

>My point is that the DB backend should be storing the data
>in its simplest components rather than trying to handle data in its more
>complex forms.

But *floats* are complex. Very. Interpretation of bitmapped images
should learn from this. We have and can examine engines that share
floats among different hardware architectures. They do *not* do this
by keeping the engine ignorant of what the bits in a float mean. They
do not do this by letting each front end decide what floats mean to
it. Access to had better not equal subversion of.
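
To make the float analogy concrete, here is a minimal sketch in
Python. It is not taken from any real engine; it only illustrates the
idea that the engine, not the front end, owns the meaning of the bits.
The assumptions are mine: the canonical stored form is a big-endian
IEEE 754 double, and the client names and the functions store,
fetch_for and client_decode are all hypothetical.

```python
import struct

# Assumption: the engine's canonical stored form is an IEEE 754
# double in big-endian byte order.
CANONICAL = ">d"

# Hypothetical client layouts, keyed by made-up client names.
CLIENT_LAYOUT = {
    "big_endian_client": ">d",
    "little_endian_client": "<d",
}

def store(value: float) -> bytes:
    """Engine side: encode a float into the canonical stored form."""
    return struct.pack(CANONICAL, value)

def fetch_for(client: str, stored: bytes) -> bytes:
    """Engine side: decode the stored bits, then re-encode for one client.

    The conversion happens here because the engine knows what the
    bits mean; the front end never reinterprets raw storage.
    """
    (value,) = struct.unpack(CANONICAL, stored)
    return struct.pack(CLIENT_LAYOUT[client], value)

def client_decode(client: str, wire: bytes) -> float:
    """Client side: read the value in that client's own layout."""
    (value,) = struct.unpack(CLIENT_LAYOUT[client], wire)
    return value

stored = store(0.1)
for name in CLIENT_LAYOUT:
    # Every client gets the same value back, whatever its byte order.
    assert client_decode(name, fetch_for(name, stored)) == 0.1
```

An image type would presumably work the same way, with a richer
conversion step: one stored representation, and the engine rendering
it into whatever each display needs.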

-- Jon
--
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Drop in next time you're in the tri-planet area!