Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!sun-barr!cs.utexas.edu!csd4.milw.wisc.edu!uxc.cso.uiuc.edu!uxc.cso.uiuc.edu!m.cs.uiuc.edu!render
From: render@m.cs.uiuc.edu
Newsgroups: comp.software-eng
Subject: Re: Source Code Control
Message-ID: <39400028@m.cs.uiuc.edu>
Date: 18 Jun 89 21:29:00 GMT
References: <133@tirnan.UUCP>
Lines: 91
Nf-ID: #R:tirnan.UUCP:133:m.cs.uiuc.edu:39400028:000:5605
Nf-From: m.cs.uiuc.edu!render    Jun 18 16:29:00 1989


Sorry if it seems I'm talking too much, but no one else from the SCM
research people seems to be jumping in.  Onward ...

Written  9:35 am  Jun 16, 1989 by runyan@hpirs.HP.COM:
    [Several good points about the problems of distributed SCM and getting
    developers to accept SCM restrictions.]

From what I've read, Shape does not specifically address the problems of
distributed SCM except to say that you can use it with NFS but will run
into some troubles.  Just to make things clear, I am in no way connected
with the Shape people and have only just gotten their software.  I haven't
even tried it out much, so my knowledge of it is limited to reading one
of their papers and looking at the release docs a bit.  

To me it seems that the problems of distributed SCM are similar to the 
problems of distributed databases, i.e. distributed data and concurrent
updates.  I know of non-DB SCM systems which attempt to solve this, among
them Apollo's DSEE (which unfortunately requires Apollo hardware and OS 
support to work) and DVSS, a distributed version server for CAD applications.
The groups that build their SCM systems around a database (representing 
development modules as data items and putting version control into the DBMS)
can presumably use distributed DB techniques to solve the problem.  
Surprisingly, this does not seem to be a popular research topic, but it
may just be that I haven't seen the right papers.

Of the approaches to distribution that I've read about, the popular one is 
the idea of putting all your controlled modules in one place (a server) 
and accessing them only using controlled checkin/checkout commands.  Depending 
on the method/system you use, you can exclusively lock modules for update, 
presumably preventing overlapping modifications.  The problem is that this
write-locking hoses you if you want multiple mods done simultaneously by 
different developers.  It hoses you even further if the system prevents any 
access to a module undergoing update.  My opinion is that the system should 
allow the users to specify the locking strategy to use, either on a
per-checkout basis or as a system-wide policy.  RCS gives you some flexibility, the
prototype SCM system I built gives you a little more, and I don't doubt other 
systems do as well.
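
To make the locking point concrete, here is a minimal sketch in Python.
It is not how RCS or any system named above works; the names Repository,
LockPolicy and CheckoutError are made up for illustration.  It assumes
that reads are never blocked, that the policy can be given per checkout
and otherwise falls back to a repository-wide default, and that merging
concurrent changes is left to the users.

```python
from enum import Enum


class LockPolicy(Enum):
    EXCLUSIVE = "exclusive"    # one writer per module at a time
    CONCURRENT = "concurrent"  # several writers allowed, merging left to users


class CheckoutError(Exception):
    """Raised when a write checkout conflicts with an existing lock."""


class Repository:
    """Hypothetical server-side bookkeeping for checkin/checkout."""

    def __init__(self, default_policy=LockPolicy.EXCLUSIVE):
        self.default_policy = default_policy  # system-wide policy
        self.writers = {}       # module -> set of users with a write checkout
        self.exclusive = set()  # modules currently locked exclusively

    def checkout(self, module, user, for_update=False, policy=None):
        if not for_update:
            return "read"  # a module under update stays readable
        policy = policy or self.default_policy  # per-checkout override
        holders = self.writers.setdefault(module, set())
        if holders and module in self.exclusive:
            raise CheckoutError(f"{module} is exclusively locked")
        if holders and policy is LockPolicy.EXCLUSIVE:
            raise CheckoutError(f"{module} already has writers")
        holders.add(user)
        if policy is LockPolicy.EXCLUSIVE:
            self.exclusive.add(module)
        return "write"

    def checkin(self, module, user):
        holders = self.writers.get(module, set())
        holders.discard(user)
        if not holders:
            self.exclusive.discard(module)  # last writer releases the lock
```

With the default policy a second write checkout of the same module is
refused, while two checkouts made with policy=LockPolicy.CONCURRENT both
succeed.  Whether the default should be exclusive is a per-shop choice.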

Another problem with distribution is speed of access.  This can be solved
by caching versions and by establishing "domains of control" that localize
the modules and versions needed by a particular group on a particular 
machine.  The identification of the modules then becomes a function of 
the work breakdown for a project; when a project is laid out, the modules
a particular group will be responsible for can be located on their machine(s).
This would not only speed up access, it would also encourage encapsulation
of the code they develop.
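
A rough sketch of the caching and "domains of control" idea, again in
Python and again with invented names (VersionServer, Client, the
domains table).  It assumes that a checked-in version never changes
afterwards, which is what makes caching it safe, and it counts remote
fetches only to show the effect of the cache.

```python
class VersionServer:
    """Hypothetical machine that owns the modules of one group."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # (module, version) -> text
        self.fetches = 0  # number of remote requests served

    def put(self, module, version, text):
        self.store[(module, version)] = text

    def fetch(self, module, version):
        self.fetches += 1
        return self.store[(module, version)]


class Client:
    """Looks up a module's home server, then caches what it fetched."""

    def __init__(self, domains):
        self.domains = domains  # module -> VersionServer (the work breakdown)
        self.cache = {}         # (module, version) -> text

    def checkout(self, module, version):
        key = (module, version)
        if key not in self.cache:
            server = self.domains[module]  # domain of control for this module
            self.cache[key] = server.fetch(module, version)
        return self.cache[key]


# Example layout: the parser group and the UI group each keep their own modules.
parser_host = VersionServer("parser-host")
ui_host = VersionServer("ui-host")
parser_host.put("parse.c", "1.2", "int parse(void);")
ui_host.put("menu.c", "1.1", "void menu(void);")

client = Client({"parse.c": parser_host, "menu.c": ui_host})
client.checkout("parse.c", "1.2")
client.checkout("parse.c", "1.2")  # served from the cache, no second fetch
```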

The problems you mention with branching versions are somewhat new to me.
I don't know of any current SCM systems that don't allow branching versions,
unless they have been developed by shops where such things are no-nos.
If there is a degradation in performance in working on branches, it may
be due to the storage of the branching versions.  Most source code versions
are stored as deltas, lists of the textual differences between one version
and its successor or predecessor.  To check out a version, the deltas must
be applied to some "base" version that is stored in its entirety.  In 
RCS, this version is the most recent "trunk" version, the one that is
on the main line of descent from the initial version of the module.  
To check out a branch version, RCS must first apply all the differences
between the base and the version at which the branch begins and then 
apply the deltas along the branch until the desired one is recreated.
This can take time, and no one has really devised an improvement on
this scheme.  Off the top of my head, I guess you could store
heavily used versions in their entirety regardless of whether they
are on a branch or not.  This would take up more space but would reduce
the time it takes for checkin/checkout.  From a research point of view,
I think someone should put together an SCM system and test it in the
field using different storage schemes, mixing deltas, full copies and
compressed copies to see the time/space behavior and the user response.
You may even be able to do such a study based purely on update frequency
and the growth characteristics of module version trees in a particular
shop.  This would show you things like how often revisions are generated,
how many branches are made, and how often old versions are accessed.  
I know of a little work like this done years ago with SCCS, but nothing
recently.
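
The checkout cost described above can be sketched as follows.  This is
an illustration of the scheme only, not RCS's actual file format or
algorithm: the deltas here are built with Python's difflib rather than
RCS's own edit scripts, and the Archive class and its method names are
invented.  It stores the newest trunk version in full, reverse deltas
back along the trunk, and forward deltas along one branch, and it counts
how many deltas each checkout has to apply.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Return the edits that turn the line list `old` into `new`."""
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(lines, delta):
    """Apply edits produced by make_delta to `lines`."""
    out, pos = [], 0
    for i1, i2, replacement in delta:
        out.extend(lines[pos:i1])  # copy the unchanged run
        out.extend(replacement)    # substitute the changed run
        pos = i2
    out.extend(lines[pos:])
    return out


class Archive:
    """Hypothetical archive: full head, reverse trunk deltas, one branch."""

    def __init__(self, trunk, branch_point, branch):
        self.head = trunk[-1]  # only this version is stored in full
        # reverse[k] turns trunk version k+1 back into trunk version k
        self.reverse = [make_delta(trunk[k + 1], trunk[k])
                        for k in range(len(trunk) - 1)]
        self.branch_point = branch_point
        self.forward = []      # forward deltas along the branch
        prev = trunk[branch_point]
        for version in branch:
            self.forward.append(make_delta(prev, version))
            prev = version

    def checkout_trunk(self, k):
        lines, applied = self.head, 0
        for delta in reversed(self.reverse[k:]):  # walk back from the head
            lines = apply_delta(lines, delta)
            applied += 1
        return lines, applied

    def checkout_branch(self, n):
        # First recreate the branch point, then walk forward along the branch.
        lines, applied = self.checkout_trunk(self.branch_point)
        for delta in self.forward[:n + 1]:
            lines = apply_delta(lines, delta)
            applied += 1
        return lines, applied


trunk = [["a", "b"], ["a", "b", "c"], ["a", "x", "c"], ["a", "x", "c", "d"]]
branch = [["a", "b", "c", "fix"], ["a", "b2", "c", "fix"]]  # forks from trunk[1]
archive = Archive(trunk, branch_point=1, branch=branch)
text, applied = archive.checkout_branch(1)
```

In this toy archive the head costs no deltas, the oldest trunk version
costs three, and the tip of the branch costs four (two back along the
trunk, two forward along the branch).  Keeping a full copy of a heavily
used branch version would bring its cost down to zero at the price of
the extra space, which is the trade-off a field study would measure.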

Despite the fact that good SCM could greatly benefit both developers and
managers, selling SCM and SCM systems to people seems to be a major chore.
Part of the problem is that it imposes constraints on developers.  I don't
know if you can get around this any more than you can get around the fact
that using a strongly-typed programming language will prevent you from 
doing some things.  Fortunately, if the system is well designed, the things
that it prevents you from doing are things that you shouldn't be doing 
anyway.  For example, one programmer should not be modifying the same version 
as another without knowing what the other is doing.  Unfortunately, there are 
still a lot of SCM things we don't know how to support cleanly and painlessly.  
Those of us doing SCM research are trying to remedy this, but it's slow going 
and not outrageously popular.

Hal Render
render@cs.uiuc.edu