Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!elroy.jpl.nasa.gov!turnkey!orchard.la.locus.com!prodnet.la.locus.com!jfr
From: jfr@locus.com (Jon Rosen)
Newsgroups: comp.databases
Subject: Re: how to implement cursor based UPDATE/DELETE
Message-ID: <24334@oolong.la.locus.com>
Date: 13 May 91 19:09:09 GMT
References: <22626@shlump.lkg.dec.com> <12260@mentor.cc.purdue.edu>
Organization: Locus Computing Corp, Los Angeles
Lines: 72

In article <12260@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam) writes:
>In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com 
>(Kyle Saunders) writes:
>>	I was wondering how you would implement cursor based UPDATE's and 
>>DELETEs. I have given some thought to this and cannot see how the RDBMS can
>>properly update the correct row given the fact that the table to be updated
>>may not have _any_ indexes, and not all fields are guarenteed to be included
>>in the query.  This means there is no way to uniquely determine which row in
>>the original base table is being updated.
><some stuff deleted...>
>....  The work-around was to recognize that (at least under INFORMIX),
>there is a field called ROWID for each row.  I can select it, then use it to
>give me an "absolute pointer" to a record.
><code deleted...>
>Of course, one should be absolutely paranoid about ever keeping a ROWID around
>for any longer than this sort of use.  Systems need to be able to physically
>move rows to other locations as the database is updated.
 
Kyle questions how cursor-based UPDATEs and DELETEs can be implemented.  
If this refers to how an RDBMS implements it, the answer is related
to Tom's response... I.e., internally, the RDBMS must keep some kind of
a relative row id that it uses to link the cursor-retrieved row back to
the actual data row.  In fact, this implementation is quite common.  It
even leads to those databases such as Informix and Oracle actually
materializing the ROWID into a user-accessible column which can be 
used to do direct-access updates against rows without using RDBMS cursors
and the UPDATE/DELETE WHERE CURRENT OF statement.                  
 
Also, it is not as bad as Tom thinks to keep using this ROWID.  Usually,
the ROWID is a logical construction and has nothing (or little) to do
with the physical location/inode/record number of the actual row.  So
moving the row will generally not change its ROWID.  I have seen 
an application in Oracle that even uses ROWID to build links between
tables (UGGGH) in order to improve performance.  This requires that the
ROWID be persistent.  There can be difficulties if rows are deleted
and the associated ROWID references in other linked tables are not
(a "referential integrity" issue) since ROWIDs can be reissued once
the original row is no longer available.  A better solution would be
a fine granularity TIMESTAMP... In fact, Postgres uses this solution
to individualize rows and to allow for update/delete with full history.
All versions of rows are kept (until explicitly purged) and you can
browse through outdated versions of rows at will.  Of course, the
default is to look at the current version of a row.

Unfortunately, the ROWID solution begs several questions.  First of all, a
materialized ROWID is by no means universal.  Most RDBMS (including
DB2, OS/2 EE, RDB, Ingres, etc.) do not have such an abstraction.
In addition, neither ROWID nor updatable cursors deal with data
integrity across COMMITs.  When data is being updated on-line, it is
not advisable to hold a cursor open across the workstation or terminal
i/o (otherwise locks are kept for long periods of time).  Thus, if you
use a cursor to select some rows for display, you still should COMMIT
prior to displaying them.  Now, your updatable cursor is no longer
updatable.  In fact, you will either have to reread the rows with a
new updatable cursor or use a direct UPDATE.  Since you have no lock 
on the data while you are using it on-screen, someone else can come
along, read and modify the row without you knowing it.  
 
Once again, either a very fine granularity TIMESTAMP or an update/change
count column is needed to ensure that your update does not screw up
some other data that has been changed since your read.  Alternatively,
you have to do an UPDATE with a WHERE clause on every column of the
TABLE giving the original value in order to make sure that the row has
not been changed (bad idea from a performance standpoint).  
 
As for the duplicate row issue, don't you know that relational data
bases aren't supposed to have duplicate rows??? :-)
 
Jon Rosen