Fixing CVS annotate

gpanther — Wed, 07 Oct 2009 14:19:00 +0000

Yes, some of us work on projects started almost a decade ago, and as such we use CVS. (Yes, CVS has many limitations and yes, git is better; for a nice introduction see Randal Schwartz’s video about git.) Migrating, however, is not directly justifiable: it would involve training IT staff to maintain the repo, rewriting automation code which relies on CVS, and training programmers, even though some of these could be postponed given that git contains a CVS bridge.

Anyway, the problem I faced was the following: cvs annotate only displays the first 8 characters of the username, which can be ambiguous if multiple people have similar usernames (which can easily happen if there is a convention like name.surname).

Here is my solution to the problem: fetch the log for the file and get the user associated with each revision (in the log CVS includes the full usernames). Then fetch the annotated version of the file and use the revision number on each line to look up the full username. Here is some Perl code:

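sub processAnnotations {
  my $fileName = shift;
  my (%revisions, $rev);

  # Pass 1: map each revision number to its full author name via "cvs log".
  # The list form of open bypasses the shell, so odd file names are safe.
  open(my $log, '-|', 'cvs', '-z9', 'log', '-N', $fileName)
    or die "Cannot run cvs log: $!";
  while (<$log>) {
    $rev = $1 if (/^revision ([0-9.]+)$/);
    $revisions{$rev} = $1 if (defined $rev && /^date:.*?author: (.*?);/);
  }
  close $log;  # closing a piped handle also waits for the child process

  # Pass 2: replace the truncated author in each "cvs annotate" line.
  open(my $ann, '-|', 'cvs', '-z9', 'annotate', $fileName)
    or die "Cannot run cvs annotate: $!";
  my @annFileLines;
  while (<$ann>) {
    if (/^(\d[0-9.]+)(\s+)\(\S+ (.*)/s && exists $revisions{$1}) {
      $_ = "$1$2(" . $revisions{$1} . " $3";
    }
    push @annFileLines, $_;
  }
  close $ann;

  return join('', @annFileLines);
}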
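
To try the rewrite step without a CVS server, the substitution can be exercised on a canned line. This is only a minimal sketch: the helper name expandAuthor, the revision map and the sample line are made up for illustration, and the sample line is built with the same format string CVS uses for the line prefix.

```perl
use strict;
use warnings;

# Hypothetical helper: swaps the truncated author in one annotate line
# for the full name looked up by revision number.
sub expandAuthor {
  my ($line, $revisions) = @_;
  if ($line =~ /^(\d[0-9.]+)(\s+)\(\S+ (.*)/s && exists $revisions->{$1}) {
    return "$1$2(" . $revisions->{$1} . " $3";
  }
  return $line;  # header lines and unknown revisions pass through unchanged
}

# Made-up data: two users whose names share the first 8 characters.
my %revisions = ('1.1' => 'john.smith', '1.2' => 'john.smithson');

# Build a sample annotate line with the truncated author, as CVS would.
my $line = sprintf("%-12s (%-8.8s ", '1.2', 'john.smithson')
         . "07-Oct-09): my \$x = 1;\n";

print expandAuthor($line, \%revisions);
```

Lines that do not start with a revision number are returned untouched, so anything else in the annotate output survives the pass.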

PS. I verified in the CVS source that the output width for the author field is hardcoded:

		    sprintf (buf, "%-12s (%-8.8s ",
			     prvers->version,
			     prvers->author);
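
The effect of that format string can be reproduced in Perl, whose sprintf accepts the same %-8.8s conversion (pad to 8 characters, truncate at 8). A small sketch, with invented usernames and an invented revision number:

```perl
use strict;
use warnings;

# Same format string as in the CVS source above; the usernames and the
# revision number are invented examples.
my @prefixes;
for my $author ('john.smith', 'john.smithson') {
  push @prefixes, sprintf("%-12s (%-8.8s ", '1.4', $author);
}

# Both authors collapse to the same 8-character prefix.
print "[$_]\n" for @prefixes;
print "ambiguous\n" if $prefixes[0] eq $prefixes[1];
```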

Picture taken from Valeriana Solaris’ photostream with permission.

Distributed version control systems – why?

gpanther — Fri, 11 Apr 2008 04:42:00 +0000

Some time ago I finally had time to read the Subversion book and felt that all my questions were answered. I tried SVN many years back and failed miserably, but now I’m confident in my ability to use, install and maintain SVN. However, there seems to be a new buzz about distributed version control systems (like darcs, Mercurial at http://www.selenic.com/mercurial, and so on), which for the longest time I didn’t get. It seemed to me that everything I need or could possibly need is in SVN. Then it hit me:

"Classical" version control systems like SVN are about keeping a central repository of "stuff" (mostly code, but it can be other things), enabling a large set of users to work on it concurrently and coordinating them to minimize friction while ensuring that they don’t step on each other’s toes. The versioning part of these systems is a side effect of those goals (meaning that versions are primarily there as an accounting mechanism, a "who did what" type of thing).

Distributed version control systems, on the other hand, put the emphasis exactly on that: keeping a very granular history. To put it in a very oversimplified way: in my opinion, if you have permanent connectivity to your SVN server (it’s on your local box, for example) and you "commit early, commit often", you basically have most of the advantages of a DVCS. Or to put it another way:

If you’re using SVN, somebody can go away, work on a change for days (weeks, months) and come back with a big patch which you apply, and all you’ll see in the log is that a single commit changed, say, a thousand lines of code. With a DVCS (if I understand correctly) you would merge not only the patch but also the history of the patch, giving a result similar to the branch-merge method (i.e. when the changes were gradually committed to a branch which was then merged back into the trunk), but without the need for constant access to the SVN server.

In conclusion, I currently don’t have any great interest in DVCS, both because I have (almost) permanent connectivity to my repository and because I already "commit early / commit often", but this (like almost everything) may change in the future :-).

Update: Hanselminutes (a great podcast for every developer) has just published an episode about Git, another distributed version control system (this one was written by Linus Torvalds himself and is used to develop the Linux kernel). It contains some good discussion of Git from a Subversion user’s point of view.
