Path: utzoo!lsuc!spectrix!yunexus!geac!john
From: john@geac.UUCP (John Henshaw)
Newsgroups: comp.databases
Subject: Re: Distributed Database Locking
Summary: general comments
Message-ID: <2465@geac.UUCP>
Date: 18 Mar 88 19:29:24 GMT
References: <207@tijc02.UUCP>
Organization: GEAC Computers, Toronto, CANADA
Lines: 130
Posted: Fri Mar 18 14:29:24 1988


In article <207@tijc02.UUCP>, pjs269@tijc02.UUCP (Paul Schmidt) writes:
> First I feel there are two methods for locking distributed databases.
> [much text deleted]

I'll confine most of my comments to LOCKING strategies. There are of course,
several non-locking approaches, but I assume that you are focusing on
locking only. Some comments further down reference other methods...

The preemptive locking strategy (PRE) has the benefit that deadlock
cannot occur. However, we rejected it as a viable method on "practical
grounds". In particular, PRE requires that the programmer know *all* the
resources that will be required *prior* to the initiation of the
transaction. This is rarely the case. As far as I know, all non-PRE
locking schemes require deadlock detection, which is a pain, but a
necessary one.

PRE will lock resources at the earliest point in time (transaction
initiation) and hold them for the duration of the transaction. This
means that the "growing phase" of the 2-phase locking activity happens
very quickly, and the transaction can proceed without ever being
blocked. This suggests that PRE may hold transaction locks for a SHORTER
time than other schemes, not longer. I suspect that the real answer to
this issue could be resolved via simulation and that the major elements
affecting this would be the probability of conflict between
transactions, and the level of concurrency (number of transactions
executing at once).

You write:
>                                                                    By
> locking resources when they are needed, one can avoid unnecessary locks
> but a deadlock detection scheme would have to be used.

Could you please explain what is meant by an "unnecessary lock"? I
imagine that you really meant to address the *timely* initiation of
locks so that objects (resources) are locked for as short a period as
possible. Otherwise, I will disagree: "If ya gotta lock it, ya gotta
lock it...", hence all locks *are* necessary.

You write:
> Locking of all resources at the beginning of the transaction has the
> following advantages:  ease of implementation and deadlock prevention.
> This will be easier to implement since a deadlock detection algorithm
> and transaction rollback scheme does not have to be implemented.

PRE, while deadlock free, does not free the implementor from transaction
rollback. While it is clear that deadlock requires transaction rollback,
there are other reasons why one may roll back a transaction. For
example, a end-user may deliberately choose to ABORT a transaction, or the
application software may decide, due to various conditions (data
dependencies, data values, etc.) to ABORT. Either case requires rollback.

You write:
> Locking of resources as needed has the following advantages:  Resources
> are locked only when being used, and the total transaction does not need to
> be known at the start of the transaction.   Since resources are only locked
> when they are being used, processing time will not be lost when another
> process is waiting on that resource.

This description is somewhat inexact. Resources, once locked, are locked
until Commit time of the transaction. Having locked a resource/object,
other processes will be blocked accessing that object. (This assumes
that the object has been either written, or is assumed to be updatable
at the time of lock initiation.) You point out a major advantage of
standard 2 phase locking -  that the programmer need not know which
objects/resources will be locked at the Start Transaction juncture. It
might be useful in your analysis to distinguish between the types of
2 phase locking that you have in mind. For example, when a lock is
granted on an object which is read, do you assume that you are going to
update it? or do you convert read locks to write locks at update time?

A reasonably good paper involving this issue is: "The Performance of
Concurrency Control Algorithms for Database Management Systems" by
Michael Carey, and Michael Stonebreaker, 1984, VLDB Publications
(Proceedings of the Tenth International Conference on Very Large
Databases). It introduces a useful simulation framework that you may
find helpful in your efforts. 

You ask:
> Questions:
> 
> Are there other methods?

Sure: tree-locking, timestamp, optimistic, and multiversion are all
written up. I suggest that you check out the library, looking at the
last few years of: "Transactions on Database Systems". If you can get
hold of "Distributed Systems: Volume 2", edited by Wesley Chu, ISBN
0-89006-213-7, chapter 2 talks about concurrency control, and other
chapters address several case studies. Author names that may be useful
are: Bernstein, Carey, Chu, Eager, Ellis, Goodman, Gray, Kung, Moss, 
Obermarck, Sevcik, Stonebreaker, Robertson, and Weiderhold, to name only
a few!

> What are the distributed deadlock detection schemes?

The two I know about are "wait-for-strings", and "wait-for-graphs". Most
systems that do deadlock detection, don't check for deadlock every time
a lock is placed. Instead, a transaction, when blocked, times out, and
then the check is made. In the distributed case, the lifetime of the
average transaction may be quite long, and communication delays compound
the difficulties that this approach already has. This approach of
waiting is done in order to minimize the cost of checking graphs for
cycles - usually a CPU-intensive process.

> How can one tell when one method is better than the other?

Good question. I think the answer to this question depends a lot on your
criteria - something that differs for each situation. Reliability,
robustness, efficiency, correctness, etc., etc., etc.... are all in the
list somewhere. Judging DISTRIBUTED locking schemes is like judging
CENTRALIZED locking schemes, except a few of the base rules/assumptions
are changed.

> How do crash recovery and deadlock detection rollback relate?  Are most of
> the recovery problems already solved by crash recovery?
> 

Judging from your question, you are more interested in Distributed
Transaction Management - not just "distributed locking". The "short"
answer to this question is "yes", but the implementation details are
where the real heart of the matter lies.


Good luck with your work Paul.

-john-
-- 
John Henshaw,			(mnetor, yunexus, utgpu !geac!john)
Geac Computers Ltd.		If we don't pay for education now, are we
Markham, Ontario, Canada, eh?	going to be able to pay for ignorance later?