Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!sun-barr!cs.utexas.edu!csd4.milw.wisc.edu!uxc.cso.uiuc.edu!uxc.cso.uiuc.edu!m.cs.uiuc.edu!render
From: render@m.cs.uiuc.edu
Newsgroups: comp.software-eng
Subject: Re: Source Code Control
Message-ID: <39400028@m.cs.uiuc.edu>
Date: 18 Jun 89 21:29:00 GMT
References: <133@tirnan.UUCP>
Lines: 91
Nf-ID: #R:tirnan.UUCP:133:m.cs.uiuc.edu:39400028:000:5605
Nf-From: m.cs.uiuc.edu!render    Jun 18 16:29:00 1989

Sorry if it seems I'm talking too much, but no one else from the SCM research community seems to be jumping in. Onward ...

Written 9:35 am Jun 16, 1989 by runyan@hpirs.HP.COM:
[Several good points about the problems of distributed SCM and getting developers to accept SCM restrictions.]

From what I've read, Shape does not specifically address the problems of distributed SCM, except to say that you can use it with NFS but will run into some trouble. Just to make things clear, I am in no way connected with the Shape people and have only just gotten their software. I haven't even tried it out much, so my knowledge of it is limited to reading one of their papers and looking at the release docs a bit.

To me it seems that the problems of distributed SCM are similar to the problems of distributed databases, i.e., distributed data and concurrent updates. I know of non-DB SCM systems that attempt to solve this, among them Apollo's DSEE (which unfortunately requires Apollo hardware and OS support to work) and DVSS, a distributed version server for CAD applications. The groups that build their SCM systems around a database (representing development modules as data items and putting version control into the DBMS) can presumably use distributed DB techniques to solve the problem. Surprisingly, this has not seemed to be a popular research topic, but it may just be that I haven't seen the right papers.

Of the approaches to distribution that I've read about, the most popular is to put all your controlled modules in one place (a server) and access them only through controlled checkin/checkout commands. Depending on the method or system you use, you can lock modules exclusively for update, presumably preventing overlapping modifications. The problem is that this write-locking hoses you if you want several modifications made simultaneously by different developers. It hoses you even further if the system prevents any access to a module undergoing update. My opinion is that the system should let users specify the locking strategy, either for an individual checkout or as a system-wide policy. RCS gives you some flexibility, the prototype SCM system I built gives you a little more, and I don't doubt other systems do as well.

Another problem with distribution is speed of access. This can be addressed by caching versions and by establishing "domains of control" that keep the modules and versions a particular group needs on that group's machine(s). Deciding which modules belong to which domain then becomes a function of the project's work breakdown: when a project is laid out, the modules a group will be responsible for can be placed on its machines. This would not only speed up access but also encourage encapsulation of the code each group develops.

The problems you mention with branching versions are somewhat new to me. I don't know of any current SCM systems that don't allow branching versions, unless they were developed by shops where such things are no-nos. If performance degrades when you work on branches, it may be due to how the branch versions are stored. Most source code versions are stored as deltas, lists of the textual differences between one version and its successor or predecessor. To check out a version, the deltas must be applied to some "base" version that is stored in its entirety.
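
A minimal sketch of delta storage, assuming line-based deltas computed with Python's difflib and forward deltas chained from the oldest version (an assumption for illustration; it is not the layout of RCS, SCCS, or any other particular system). The names make_delta, apply_delta and checkout are hypothetical.

```python
import difflib
from typing import List, Tuple

# Hypothetical delta format: (start, end, replacement) means
# "replace lines old[start:end] with the lines in replacement".
Delta = List[Tuple[int, int, List[str]]]


def make_delta(old: List[str], new: List[str]) -> Delta:
    """Keep only the line ranges that differ between two versions."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(old: List[str], delta: Delta) -> List[str]:
    """Rebuild the next version by splicing each edit into the old text."""
    result: List[str] = []
    pos = 0
    for start, end, replacement in delta:   # edits arrive in line order
        result.extend(old[pos:start])        # copy the unchanged lines
        result.extend(replacement)           # add the new or changed lines
        pos = end                            # skip the lines being replaced
    result.extend(old[pos:])
    return result


def checkout(base: List[str], deltas: List[Delta], n: int) -> List[str]:
    """Return version n (0 = the base) by applying the first n deltas in turn.

    Assumes forward deltas chained from a fully stored base version.
    """
    if not 0 <= n <= len(deltas):
        raise ValueError(f"no version {n}")
    text = list(base)
    for delta in deltas[:n]:
        text = apply_delta(text, delta)
    return text


if __name__ == "__main__":
    v1 = ["a", "b", "c"]
    v2 = ["a", "B", "c", "d"]
    v3 = ["a", "B", "d"]
    chain = [make_delta(v1, v2), make_delta(v2, v3)]
    print(checkout(v1, chain, 2))  # ['a', 'B', 'd']
```

Checking out version n here applies n deltas, so retrieval cost grows with the distance from the base. Which version is kept whole as the base is a design choice that differs between systems.
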
In RCS, the base version is the most recent "trunk" version, the one on the main line of descent from the initial version of the module. To check out a branch version, RCS must first apply all the deltas between the base and the version at which the branch begins, and then apply the deltas along the branch until the desired version is recreated. This can take time, and no one has really devised an improvement on this scheme.

Off the top of my head, I guess you could store heavily used versions in their entirety, whether or not they are on a branch. This would take up more space but would reduce checkin/checkout time.

From a research point of view, I think someone should put together an SCM system and test it in the field with different storage schemes, mixing deltas, full copies and compressed copies, to see the time/space behavior and the user response. You may even be able to do such a study based purely on the update frequency and the growth characteristics of module version trees in a particular shop. This would show you things like how often revisions are generated, how many branches are made, and how often old versions are accessed. I know of a little work like this done years ago with SCCS, but nothing recent.

Although good SCM could greatly benefit both developers and managers, selling SCM and SCM systems to people seems to be a major chore. Part of the problem is that SCM imposes constraints on developers. I don't know if you can get around this any more than you can get around the fact that a strongly typed programming language will prevent you from doing some things. Fortunately, if the system is well designed, the things it prevents you from doing are things you shouldn't be doing anyway. For example, one programmer should not be modifying the same version as another without knowing what the other is doing.
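
The rule in that last example, together with the user-selectable locking strategy suggested earlier, might be enforced along the lines of this minimal sketch. It assumes a single registry kept on the server; CheckoutRegistry, LockPolicy and their methods are hypothetical names, not the interface of RCS or of any real system.

```python
from enum import Enum
from typing import Dict, Optional, Set


class LockPolicy(Enum):
    EXCLUSIVE = "exclusive"  # one developer at a time may hold a version for update
    SHARED = "shared"        # several may, but each learns who else holds it


class CheckoutRegistry:
    """Hypothetical server-side record of who has which version checked out."""

    def __init__(self, default_policy: LockPolicy = LockPolicy.EXCLUSIVE) -> None:
        self.default_policy = default_policy      # the system-wide policy
        self._holders: Dict[str, Set[str]] = {}    # version -> users holding it
        self._policy: Dict[str, LockPolicy] = {}   # policy set by the first holder

    def checkout(self, version: str, user: str,
                 policy: Optional[LockPolicy] = None) -> Set[str]:
        """Check a version out for update; return the other current holders."""
        if policy is None:
            policy = self.default_policy           # no per-checkout override
        holders = self._holders.get(version, set())
        # Refuse if either the request or the existing checkout is exclusive.
        if holders and LockPolicy.EXCLUSIVE in (policy, self._policy[version]):
            raise PermissionError(
                f"{version} is locked by {', '.join(sorted(holders))}")
        if not holders:
            self._policy[version] = policy
        others = holders - {user}
        self._holders.setdefault(version, set()).add(user)
        return others

    def checkin(self, version: str, user: str) -> None:
        """Release the user's hold; clear the entry once nobody holds it."""
        holders = self._holders.get(version, set())
        holders.discard(user)
        if not holders:
            self._holders.pop(version, None)
            self._policy.pop(version, None)
```

The system-wide policy is set once through default_policy, and a developer can override it for a single checkout. With SHARED checkouts nobody is blocked, but each developer is told who else is modifying the same version, which is the minimum the rule above asks for.
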
Unfortunately, there are still a lot of SCM things we don't know how to support cleanly and painlessly. Those of us doing SCM research are trying to remedy this, but it's slow going and not outrageously popular.

Hal Render
render@cs.uiuc.edu