Monday, August 10, 2009

Adding binary files to CVS

I recently added a few Excel files to CVS and found that they had become unreadable when I checked them out again. Excel complained that the files were corrupted, blah, blah, blah...

The last time this happened to me, I was dealing with files that were over a decade old and contained very important information, and I nearly had a heart attack. Back then I figured out the solution after a lot of tense web research, but I did not document it because I assumed I would remember it. Now I want to correct that omission.

The problem happens because CVS assumes by default that every file is text. On checkout it processes text files in ways that can damage binary data: it expands keywords such as $Id$, and some clients also convert line endings. The solution is therefore simple: tell CVS that the files in question are binary. For example, suppose you have a directory foo in your CVS repository that contains some .xls files you mistakenly checked in without marking them as binary. Then all you need to do is:
  • cd foo
  • cvs admin -kb *.xls
  • cvs update
and you are all set.
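To see why marking the files as binary matters, here is a rough simulation of the two text-mode transformations mentioned above. This is NOT CVS's actual code. The function name simulate_text_checkout and the expanded $Id$ details are hypothetical placeholders, and real keyword expansion and line-ending handling are more involved. The point is only that a single LF byte or an accidental "$Id$" sequence inside binary data is enough to change the file's bytes.

```python
import re

# Hypothetical stand-in for the details CVS writes into an expanded $Id$ keyword.
FAKE_ID_DETAILS = b"foo.xls,v 1.2 2009/08/10 alice Exp"


def simulate_text_checkout(data: bytes) -> bytes:
    """Rough, hypothetical model of text-mode (non -kb) handling on checkout."""
    # 1. Keyword expansion: "$Id$" grows into "$Id: ... $", which lengthens
    #    the file and shifts every byte that follows it.
    expanded = re.sub(rb"\$Id\$", lambda m: b"$Id: " + FAKE_ID_DETAILS + b" $", data)
    # 2. Line-ending conversion, as some Windows clients do: every bare LF
    #    byte (0x0A) gets a CR byte (0x0D) inserted in front of it.
    return re.sub(rb"(?<!\r)\n", b"\r\n", expanded)


# The first 8 bytes are the OLE compound-file signature that .xls files start with.
original = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00\n$Id$\x00\xff"
damaged = simulate_text_checkout(original)
print(len(original), len(damaged))  # the "checked out" copy is longer
```

With -kb, CVS skips both steps and returns the bytes unchanged. That is why the cvs admin -kb fix above works for files whose stored contents are still intact.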

To prevent this in the future, check out the CVSROOT module of your repository (cvs checkout CVSROOT) and edit the file CVSROOT/cvswrappers just as you would modify any other file under CVS. Add the following lines at the end:

*.xls -k 'b'
*.xlsx -k 'b'
*.ppt -k 'b'
*.pptx -k 'b'
*.doc -k 'b'
*.pdf -k 'b'
*.docx -k 'b'

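Then commit that file. From then on, CVS will treat any newly added files that match these patterns as binary. The wrappers only set the default mode at the moment a file is added, so files that are already in the repository still need the cvs admin -kb fix described above.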
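As a rough illustration of how these wrapper lines map file names to a keyword mode, here is a small sketch. It only understands the simple PATTERN -k 'MODE' form used above. The helper names parse_cvswrappers and default_kmode are hypothetical. It approximates CVS's wildcard matching with Python's case-sensitive fnmatchcase, which is an assumption and not necessarily identical to CVS's own matcher. It also simply returns the first matching rule.

```python
from fnmatch import fnmatchcase


def parse_cvswrappers(text):
    """Return (pattern, mode) pairs from lines like: *.xls -k 'b' (hypothetical helper)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Only the simple three-token form "PATTERN -k 'MODE'" is handled here.
        if len(parts) == 3 and parts[1] == "-k":
            rules.append((parts[0], parts[2].strip("'")))
    return rules


def default_kmode(filename, rules):
    """Mode a newly added file would get under this sketch: first match wins."""
    for pattern, mode in rules:
        if fnmatchcase(filename, pattern):  # assumed case-sensitive matching
            return mode
    return "kv"  # CVS's default keyword expansion mode for text files


rules = parse_cvswrappers("*.xls -k 'b'\n*.pdf -k 'b'\n# a comment\n")
print(default_kmode("budget.xls", rules))  # b
print(default_kmode("notes.txt", rules))   # kv
```

One thing the sketch makes visible: if matching is case-sensitive on your server, as assumed here, a file named BUDGET.XLS would not match *.xls and would be added as text.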

Note that this change applies only to this particular repository. If you have multiple repositories, you will of course need to modify this file in each of them.