Jonas Ranstam

Statistical consultant

Version control

The development of statistical programs, i.e. scripts for performing statistical computations, usually involves troubleshooting, debugging and a great number of corrections and changes. Requests for additional analyses are not unusual during the evaluation phase of a project or while the report is being written. The work often starts with a functioning program and leads to several new program versions, and sometimes also to the loss of earlier ones. The more program files that exist in a directory, the more difficult it is to keep track of the work.

This problem can be substantially reduced, and time saved, by using a version control system, which keeps copies of the different program versions. When a change turns out to be no good, an earlier version can easily be recovered, without filling the directory with different versions of the same programs.

Several version control systems exist, e.g. Git, Mercurial, Subversion and Bazaar. Many of them are free, and any of them can be used for statistical programming, but Bazaar seems to be the simplest to use and best suited for statistical programming.

This is a brief example of how to use Bazaar from the command line for version control of the files in a directory.

First, initiate version control of a project:

cd project
bzr init

Then add the files, i.e. tell Bazaar to track changes in the files:

bzr add .

A snapshot of the versioned files (a commit) can then be made:

bzr commit -m "This is an initial commit"

To undo the last commit (the working files themselves are left unchanged):

bzr uncommit

To see which files have been changed since the last commit:

bzr status

To display the line-by-line differences between the working files and the last commit:

bzr diff

To restore the working files to a specified earlier revision, N:

bzr revert -rN
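
The commands above can be combined into one complete session. The following is a minimal sketch rather than a definitive recipe: it assumes that bzr is available on the PATH, and the file name analysis.R, the commit messages and the identity given in BZR_EMAIL are hypothetical. It works in a temporary directory, so no existing project is touched.

```shell
#!/bin/sh
# Sketch of a full Bazaar session in a throwaway directory.
# Hypothetical names: analysis.R, the commit messages, the BZR_EMAIL identity.
set -e

if command -v bzr >/dev/null 2>&1; then
    # Work in a fresh temporary directory.
    workdir=$(mktemp -d)
    cd "$workdir"

    # Identity recorded in the commits (assumed value, for this demo only).
    export BZR_EMAIL="Demo User <demo@example.com>"

    # Put the directory under version control and make a first commit.
    bzr init
    echo 'x <- c(1, 2, 3)' > analysis.R
    bzr add .
    bzr commit -m "Initial commit"

    # Change the program, inspect the change, and commit it as revision 2.
    echo 'print(mean(x))' >> analysis.R
    bzr status
    bzr diff || true    # bzr diff exits with status 1 when differences exist
    bzr commit -m "Print the mean"

    # List the revisions, then restore the working files to revision 1.
    bzr log --line
    bzr revert -r1
    cat analysis.R
else
    echo "bzr not found on PATH; skipping the demo"
fi
```

Here bzr log --line lists the revisions one per line, which is a convenient way to find the number N to pass to bzr revert. After the revert the branch still contains both revisions; only the working files are restored to their state at revision 1.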

Bazaar can be run on Windows, Mac and Linux. Graphical user interfaces can be downloaded, and more detailed information is available at http://doc.bazaar.canonical.com.