Path: utzoo!attcan!uunet!dev!dgis!jkrueger
From: jkrueger@dgis.dtic.dla.mil (Jon)
Newsgroups: comp.databases
Subject: Re: Relational Database, with a Graphical type field
Message-ID: <922@dgis.dtic.dla.mil>
Date: 9 Jul 90 20:04:13 GMT
References: <6207@tekgen.BV.TEK.COM> <2895@tellab5.tellabs.com> <913@dgis.dtic.dla.mil> <2952@tellab5.tellabs.com>
Organization: Defense Technical Information Center (DTIC), Alexandria VA
Lines: 105

segel@tellabs.com (Mike Segel) writes:


>Jon, you are missing the point. By keeping the Blob as part of the tuple,
>you have now a tuple of 2Meg (+- rest of tuple) in width.

No.  It appears that way to queries that don't project fewer columns,
though.  The old virtual/transparent distinction.  How the engine
manages resources like disk storage isn't visible to folks that
send queries to the engine.  Nor how the engine selects or sorts
on the large columns -- could be that it trims trailing whitedots,
uses G3 compression, uses sparse matrix algorithms, etc.  And of
course avoiding exhaustive scan of large columns isn't any different
in principle from avoiding exhaustive scans of many rows.  It is
harder to implement, however; witness that no commercial product
of which I'm aware provides lazy fetching of columns.
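
To make the lazy-fetch idea concrete, here is a minimal sketch in Python.
It is not how any particular engine does it; LazyRow, project, and the
loader callables are hypothetical names invented for illustration.  The
row looks complete to the caller, but a large column is only read when
a query actually projects it.

```python
class LazyRow:
    """Row whose large columns are loaded only when first read (illustrative)."""

    def __init__(self, small_values, blob_loaders):
        self._small = dict(small_values)    # cheap columns, held inline
        self._loaders = dict(blob_loaders)  # column name -> zero-argument loader
        self._cache = {}                    # large columns fetched so far
        self.fetches = 0                    # number of loader calls, for demonstration

    def __getitem__(self, column):
        if column in self._small:
            return self._small[column]
        if column not in self._cache:
            # First touch of a large column: pay the fetch cost now, once.
            self._cache[column] = self._loaders[column]()
            self.fetches += 1
        return self._cache[column]

    def project(self, columns):
        """Return only the requested columns, as a query projection would."""
        return tuple(self[c] for c in columns)


# Assumed example data: a 2 MB "photo" column stands in for the BLOB.
row = LazyRow({"id": 7, "address": "12 Elm St"},
              {"photo": lambda: b"\x00" * (2 * 1024 * 1024)})
```

A query that projects only id and address never triggers the loader, so
the 2Meg is never touched on its behalf.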

>As well as the fact that not all tuples will have a blob attached, but will
>have to have space allocated for a blob. 

No.  Look at how VM algorithms work.  Empty cols can cost small fixed
allocations.  Instances of BLOBs can cost in proportion to their
contents.  This even presumes that tables and cols still appear to be
fixed length, which isn't a hard requirement; they could expand
arbitrarily to fit their contents, too.  But even without that they
can appear as fixed length while being implemented in cheaper ways.
Trailing whitespace compression for text cols works this way now.
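
A rough sketch of the trailing whitespace case, in Python.  The names
WIDTH, store, and fetch are made up for the example, and a real engine
would do this at the page level rather than on strings.  The column
appears to be fixed length, yet an empty value costs next to nothing.

```python
WIDTH = 40  # declared column width; an assumed value for the example


def store(value, width=WIDTH):
    """Return what would be written to disk: the value minus trailing blanks."""
    if len(value) > width:
        raise ValueError("value wider than declared column")
    return value.rstrip(" ")


def fetch(stored, width=WIDTH):
    """Return the fixed-width view the query sees, padded back out with blanks."""
    return stored.ljust(width)
```

Storage cost is in proportion to contents; the apparent width is always
the declared one.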

>All of these problems are reduced when you just have a pointer as part of
>the tuple. The pointer can point to the Blob storage area of a raw disk
>(Informix Online), or to a file or directory in Unix. 

If the pointer appears different from ordinary objects to the user, you
lose the simplicity and safety of the data model.  If not, why call it
a pointer?  Also it's unlikely the overhead of the UNIX filesystem is
going to be your bottleneck.  Work smarter!  Don't expect your image
database to get its best performance from the highest bandwidth; some
engines will spend processor cycles to avoid some of those bit copies.

>The point is, they (SG) and Informix are providing the ability of ADT's
>by allowing for Blobs.

But ADT's have nothing to do with BLOBs.  Nothing.  Consider bignums,
arbitrary precision floats, etc.  ADT's have everything to do with
defining operations on objects of that type, and preventing access to
and manipulation of objects of that type via other means than the
defined operations.
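
For instance, a bignum as an ADT might look like the following Python
sketch.  BigNat and its limb representation are hypothetical, chosen
only to illustrate the point, and Python can only discourage access to
the private list by convention where an engine could forbid it.  What
makes it an ADT is the set of defined operations, not the size of the
value.

```python
class BigNat:
    """Non-negative integer ADT; the limb list is a private representation."""

    _BASE = 10_000  # each limb holds four decimal digits (arbitrary choice)

    def __init__(self, value=0):
        if value < 0:
            raise ValueError("BigNat holds non-negative integers only")
        self._limbs = []  # little-endian limbs, each in range(_BASE)
        while value:
            value, limb = divmod(value, self._BASE)
            self._limbs.append(limb)

    def __add__(self, other):
        """Schoolbook addition with carry; the result never overflows."""
        result = BigNat()
        carry = 0
        for i in range(max(len(self._limbs), len(other._limbs))):
            a = self._limbs[i] if i < len(self._limbs) else 0
            b = other._limbs[i] if i < len(other._limbs) else 0
            carry, limb = divmod(a + b + carry, self._BASE)
            result._limbs.append(limb)
        if carry:
            result._limbs.append(carry)
        return result

    def __eq__(self, other):
        return isinstance(other, BigNat) and self._limbs == other._limbs

    def __int__(self):
        return sum(limb * self._BASE ** i for i, limb in enumerate(self._limbs))
```

Callers add and compare BigNats; they never see limbs, so the
representation can change without breaking anyone.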

>I think back that the discussion evolved from 
>trying to allow for ADT's like graphical images, sounds, text, or various
>other fields. This can all be accomplished withing the relational model.

Correct.  But these are just the sexy data (lit. and fig., in some
cases :-)  Consider what it would mean to scientific and engineering
folk to have a database with a numeric type that doesn't overflow.

>...Sunview on a Sun workstation and a CD Rom device, and the other on a Mac.
>running Wingz. Now, both show you
>a raster of the house, and different views. How is it stored? How can you
>take a raster/gif/ picture which is required for two or three diferent 
>machines, and store it in the DB? You could create an ADT for each 
>raster/image format, but that means storing the photograph in the DB
>several times. Or you could separate the header information from the blob,
>then have the front end application, based on the machine, reasemble the
>image in the correct format and the header information. 

You're confusing two, no three issues here.  One is remote data access,
one is defining families of image data types, and one is database
design.  In times of old this was known as the incompatible subroutine
library problem -- different floating point formats at different
precisions, and overlapping sets of routines for each format, but worse
yet specific to the language, operating system, and processor you call
them from.  Calling sequence, don't you know.  It took twenty years to
standardize on IEEE floats, write mostly portable libraries, and
arrange for common (or at least sane) linkage schemes.  It is now
possible to write programs that use floats and expect them to behave
the same way across a wide variety of machines.  It is *still* not
possible to share binary floats between programs running on different
machines in a net -- without conversion, which is the point.  That's a
separate problem.  And yet a third problem is designing good databases
to get shared.

In the example you cite, the right solution is to design an image type
appropriate for the pictures of houses you have, define this type to
your database engine, populate the database, provide common access to
your network of (different) machines, and decide on a common model
for painting the pictures on different display hardware.  Notice how
little of this is a database problem.  Also notice that we still want
the engine to understand the image data type, not each front end.

>My point is that the DB backend should be storing the data 
>in its simplest components rather than trying to handle data in its more
>complex forms.

But *floats* are complex.  Very.  Interpretation of bitmapped images
should learn from this.  We have and can examine engines that share
floats among different hardware architectures.  They do *not* do this
by keeping the engine ignorant of what the bits in a float mean.  They
do not do this by letting each front end decide what floats mean to
it.  Access to had better not equal subversion of.
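
As a small illustration of the float case, here is a Python sketch of
one canonical interchange form: big-endian IEEE 754 doubles via the
standard struct module.  The function names are mine, and this is one
possible convention, not a claim about what any given engine does.  The
point is that one party defines what the eight bytes mean, and every
front end converts to and from that definition.

```python
import struct


def encode_float(x):
    """Encode a float as 8 bytes: big-endian IEEE 754 binary64."""
    return struct.pack(">d", x)


def decode_float(wire):
    """Decode 8 bytes in the canonical form back to a native float."""
    (x,) = struct.unpack(">d", wire)
    return x
```

An image type would want the same treatment: one agreed meaning for the
bits, with conversion at the edges.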

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Drop in next time you're in the tri-planet area!