Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!ames!sdcsvax!ucbvax!dewey.soe.berkeley.edu!oster
From: oster@dewey.soe.berkeley.edu (David Phillip Oster)
Newsgroups: comp.sys.mac
Subject: Re: How to write TEXT editors (part 1 of 2)
Message-ID: <20876@ucbvax.BERKELEY.EDU>
Date: Sun, 20-Sep-87 15:39:53 EDT
Article-I.D.: ucbvax.20876
Posted: Sun Sep 20 15:39:53 1987
Date-Received: Sun, 20-Sep-87 22:58:50 EDT
References: <3875@cisunx.UUCP> <20831@ucbvax.BERKELEY.EDU> <20860@ucbvax.BERKELEY.EDU>
Sender: usenet@ucbvax.BERKELEY.EDU
Reply-To: oster@dewey.soe.berkeley.edu.UUCP (David Phillip Oster)
Organization: School of Education, UC-Berkeley
Lines: 110
Keywords: System 4.1 and newer text edit

In article <20860@ucbvax.BERKELEY.EDU> korn@cory.Berkeley.EDU.UUCP (Peter "Arrgh" Korn) writes:
>> (m) Suppose the user opens a document in your program, then deletes the data
>> file in multi-finder, or renames it, or moves it to another folder. Apple
>> user interface guideline people want me to open the file, and not close it
>> again until the user closes or quits.
>> I prefer to handle this problem by reading into memory the entire document,
>> including the finder info and the creation and modification times, and any
>> resources, then closing the file.  Let the user do horrible things to the 
>> copy on disk. When the user saves, if my code can't find the original file,
>> or if the modification date doesn't match, it puts up an sfPutFile dialog
>> with an explanatory dialog below it on the screen:
>> "Something changed this file since the last time it was saved. I suggest
>> you "Save As" this file with a different name." 

>> The advantage of this scheme is there is no limit to the number of open
>> documents you can work with (as opposed to the small number of
>> simultaneously open files) and the program doesn't have to worry about
>> the user renaming an open file, or changing its folder.


>The idea is a neat one, but it turns out there is a situation in which it's
>very *dangerous*.  Imagine if you will a network.  It has many machines on
>it, most of which are workstations, and some of which are servers.  Imagine
>that two workstations are editing the same file on a server.  One read in
>the file at 1:00pm, the other at 1:18pm.  The first one finishes its changes,
>and writes it back at 1:20pm.  The other finishes at 1:31pm, and writes its
>changes _on_top_of_the_first_set_.  You immediately tell me "Peter, this
>can't happen, David's algorithm takes care of that."  Well, in theory it
>does.  But in practice, it can fail, because the two Macs in question
>(the workstations) may have a slightly different idea of what time it is...
>The mac that gets the file at 1:18pm may be 3 minutes fast, and think that
>it's really 1:21pm.  When that mac goes to write the file back, it notices
>that the last save to the file happened 'before' this mac got it, so
>it 'knows' that it's ok to save on top of it.    

No, it knows that the file has changed since it read it (the file's
modification time got newer). Nice try, but an overwrite still can't
happen unless the new save time matches the recorded one _exactly_.
You check for an _exact_ match on the modification time, not just a
less-than or a greater-than. But Peter is right, there is still a
chance of a problem.
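
To make the difference concrete, here is a minimal sketch of the two
comparisons applied to Peter's timeline. The function names, and the
noon stamp assumed to be on the file when the second Mac read it, are
made up for illustration; times are minutes since midnight.

```python
def minutes(hour, minute):
    """Convert a 24-hour clock time to minutes since midnight."""
    return hour * 60 + minute


def unsafe_is_unchanged(file_mtime, local_read_time):
    """Flawed check: trust the file if its stamp is not later than our read."""
    return file_mtime <= local_read_time


def exact_is_unchanged(file_mtime, mtime_seen_at_open):
    """Exact-match check: the stamp must equal the one recorded at open."""
    return file_mtime == mtime_seen_at_open


# Assumption: the file carried a 12:00 stamp when the second Mac read it.
mtime_seen_at_open = minutes(12, 0)
# The second Mac read at 1:18pm true time, but its clock is 3 minutes fast.
second_mac_read_time = minutes(13, 21)
# The first Mac saves at 1:20pm, which becomes the file's new stamp.
mtime_after_first_save = minutes(13, 20)

# The ordering comparison is fooled by the clock skew.
print(unsafe_is_unchanged(mtime_after_first_save, second_mac_read_time))  # True
# The exact match is not: the stamp differs from the one seen at open.
print(exact_is_unchanged(mtime_after_first_save, mtime_seen_at_open))  # False
```

The exact match never consults the second Mac's own clock, so skew
between workstations cannot fool it.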

>A slightly different scheme that works on the same principle might increment
>some number, an 'in use' number.  Each time an application edited a copy
>of that file, it would increment the 'in use' number.  Each time it closed
>the file, it would decrement that number.  If it saw that the number was > 0
>at save time, it would give the user the dialog, and let the user decide.
>This method also has its problems.  It's more complex, and it requires
>more writes to the file (even if I 'save as', I still have to decrement
>the 'in use' number).  And the file might be replaced with one that has
>a lower (or higher) 'in use' number than it should have, wreaking havoc.

Not a good scheme, because it requires that every application write
into every file it accesses, and most accesses are read accesses. This
scheme is almost right, though.

Peter, is it really true that if I write a file on a remote machine
the write happens with _my_ clock, and not the remote machine's clock?
After all, I'm just sending a message to the remote machine to do the
write. It is the one that is actually issuing the read and write
system calls that change the data on its disk. Why should it use my
clock? Since you have access to a net, could you please do the
experiment and report back?

>Perhaps the safest way of dealing with it all is to just set the busy
>bit.  That way nobody can mess with the file behind your back (without
>having to go through an override that requires they think about what they
>are doing).

Setting the busy bit (I think) also counts as a write, so it fails
for the same reason given above.  I believe there is no reason to
restrict multiple reads; it is only multiple writes that are the
problem. Obviously, my model is text editors rather than databases. If
I were thinking about databases, I'd need a more sophisticated
concurrency control system, but a text editor rewrites the entire file
on each save, whereas a database program writes pieces of its files
(while those files may still be actively read by other programs).

Let's combine all of the above into a scheme that should work. Suppose
we have a "saveCount" number in the file, that counts the number of
times the file has been saved. Using the file goes as follows:

1.) The user selects "open". The software reads the entire file, both
forks, and the finder information into memory and closes the file.

2.) Some time later, the user selects "save". The software:
2.a) opens the file read/write.  If the open fails (because the file
is no longer there, or because it is already in use by another writer)
the software puts up its Save As dialog, with the explanatory message
"This file has been changed by another program since the last time you
saved it."
2.b) If the open succeeds, it reads the save count to see whether it
matches what the application thinks it should be. If it doesn't
match, it goes into the above Save As mode.
2.c) Otherwise, it writes the file with an incremented saveCount and
closes it.

This scheme should be robust, since it doesn't depend on any clock
being accurate across the net.  It does depend on the fact that
AppleShare allows only one process to have a file open for read/write
at a time. Therefore, if I get the "write" file descriptor, I have a
guarantee that no one else has it.
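
The steps above can be sketched roughly as follows. This is only an
illustration, not Mac Toolbox code: the 8-byte header holding the save
count, the function names, and the NeedSaveAs exception are all
hypothetical. It also assumes the file server refuses a second
read/write open; a plain open() on a local disk does not enforce that,
so the guarantee in step 2.a is not modeled here.

```python
import struct

# Hypothetical layout: an 8-byte big-endian save count, then the text.
HEADER = struct.Struct(">Q")


class NeedSaveAs(Exception):
    """Raised when the caller should fall back to a Save As dialog."""


def create_document(path, text):
    """Write a brand-new document whose save count starts at zero."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(0) + text.encode("utf-8"))


def open_document(path):
    """Step 1: read the whole file into memory, then close it."""
    with open(path, "rb") as f:
        data = f.read()
    (save_count,) = HEADER.unpack_from(data)
    return save_count, data[HEADER.size:].decode("utf-8")


def save_document(path, expected_count, text):
    """Step 2: reopen, compare save counts, rewrite. Returns the new count."""
    try:
        # 2.a: assumed to fail if the file is gone or another writer has it.
        f = open(path, "r+b")
    except OSError as err:
        raise NeedSaveAs("file is missing or in use") from err
    with f:
        header = f.read(HEADER.size)
        # 2.b: someone else saved since we read the file.
        if len(header) < HEADER.size or HEADER.unpack(header)[0] != expected_count:
            raise NeedSaveAs("file was changed by another program")
        # 2.c: rewrite the entire file with an incremented save count.
        new_count = expected_count + 1
        f.seek(0)
        f.write(HEADER.pack(new_count) + text.encode("utf-8"))
        f.truncate()
    return new_count
```

If two editors both call open_document() on the same file and then
both try to save, the first save_document() call succeeds and bumps the
count; the second sees a count it did not expect and raises NeedSaveAs,
which is where the Save As dialog would go.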

The original scheme, given in "How to write a TEXT editor, (part 1)"
is still my favorite, and this modification should only be used if it
is necessary. Peter, I hope you will report back soon whether it is
necessary or not.

--- David Phillip Oster            --My Good News: "I'm a perfectionist."
Arpa: oster@dewey.soe.berkeley.edu --My Bad News: "I don't charge by the hour."
Uucp: {uwvax,decvax,ihnp4}!ucbvax!oster%dewey.soe.berkeley.edu