Monday, August 10, 2009

Adding binary files to CVS

I recently added a few Excel files to CVS and found that they had become unreadable when I checked them out again. Excel would complain that the files were corrupted, and so on.

The last time this happened to me, I was dealing with files that were over a decade old and contained very important information, and I almost had a heart attack. At that time I figured out the solution after a lot of tense web research, but I did not document it because I assumed I would remember it. Now I wish to correct that omission.

The problem arises because CVS assumes by default that files are text, and on checkout it applies text-only processing such as keyword substitution (and, on some platforms, line-ending conversion), which can corrupt a binary file. The solution is therefore simple: tell CVS that the files in question are binary. For example, suppose a directory foo in your CVS repository contains some .xls files that you mistakenly checked in without marking them as binary. Then all you need to do is:
  • cd foo
  • cvs admin -kb *.xls
  • cvs update
and you are all set.
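
To double-check that the change took effect, one option is to look at the "Sticky Options:" line that cvs status prints for each file. The following is only a rough sketch: non_binary_files is a helper name made up for this example, and it assumes file names without spaces.

```shell
# non_binary_files (hypothetical helper): reads `cvs status` output on stdin
# and prints every file whose "Sticky Options:" value is not -kb.
# Assumption: file names contain no whitespace.
non_binary_files() {
    awk '
        $1 == "File:" { name = $2 }
        $1 == "Sticky" && $2 == "Options:" && $3 != "-kb" { print name }
    '
}

# Usage, from inside the checked-out directory foo:
#   cvs status *.xls 2>/dev/null | non_binary_files
```

If this prints nothing, every .xls file in the directory is marked as binary.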

To prevent this in future, go to the highest level in your repository and edit the file CVSROOT/cvswrappers just like you modify any other file in CVS. Modify it so it has the following lines at the end:

*.xls -k 'b'
*.xlsx -k 'b'
*.ppt -k 'b'
*.pptx -k 'b'
*.doc -k 'b'
*.pdf -k 'b'
*.docx -k 'b'

Then just commit that file and CVS will remember that all these files are binary files.
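
The whole round trip can be scripted. This is a minimal sketch, not the only way to do it: add_binary_wrappers is a name invented here, the commit message is arbitrary, and the commented cvs commands assume that your CVSROOT is already set. The helper only detects lines written in exactly the form shown above, so an existing entry in a different spelling would be added again.

```shell
# add_binary_wrappers FILE EXT...   (hypothetical helper)
# Appends a line of the form  *.EXT -k 'b'  to FILE for each extension
# that is not already listed in exactly that form.
add_binary_wrappers() {
    wrappers_file=$1
    shift
    for ext in "$@"; do
        entry="*.$ext -k 'b'"
        # -F fixed string, -x whole-line match, -q quiet
        if ! grep -Fxq -- "$entry" "$wrappers_file" 2>/dev/null; then
            printf '%s\n' "$entry" >> "$wrappers_file"
        fi
    done
}

# Usage:
#   cvs checkout CVSROOT
#   add_binary_wrappers CVSROOT/cvswrappers xls xlsx ppt pptx doc docx pdf
#   cvs commit -m "Mark Office and PDF files as binary" CVSROOT/cvswrappers
```

Running the helper a second time leaves the file unchanged, so it is safe to repeat across several repositories.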

Note that this change will only apply to this particular repository. If you have multiple repositories you will need to modify this file in all your repositories, of course.

Friday, April 10, 2009

Shuffling a table in Excel

Here is a problem I faced recently: I had an Excel table that I wanted to shuffle randomly (no, I was not dealing a bridge hand, as people who know about my association with the game might conclude). As a matter of routine I was going to do it the way I knew, by copying the table into a text file and writing a Perl script to do the shuffling, but I decided to see how one could do it in place in Excel itself. The answer is very simple and elegant! Just create a helper column in the table, set each cell in that column to "=RAND()", and sort on this column! Excel is cool. I want to learn all about it.
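
The same idea (attach a random key to each row, sort on the key, then throw the key away) carries over to a plain text file. Here is a rough shell sketch, assuming one row per line and that awk, sort, and cut are available; shuffle_lines is a name made up for illustration.

```shell
# shuffle_lines (hypothetical helper): reads lines on stdin and writes them
# to stdout in random order, mirroring the RAND()-column trick.
shuffle_lines() {
    # 1. Prefix each line with a random key and a tab (the "RAND() column").
    # 2. Sort numerically on that key (LC_ALL=C keeps "." as the decimal point).
    # 3. Cut the key off again, keeping everything after the first tab.
    awk 'BEGIN { srand() } { printf "%.10f\t%s\n", rand(), $0 }' |
        LC_ALL=C sort -n |
        cut -f2-
}

# Usage:
#   shuffle_lines < table.txt > shuffled.txt
```

Note that srand() with no argument seeds from the clock, so with many awk implementations two runs within the same second give the same order.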

Saturday, January 24, 2009

Sometimes the old ways are the best ...

I woke up this sleepy Saturday morning and decided to check out what was happening on Twitter. My network on Twitter consists of one friend, Dhananjay Nene, who twitters regularly (http://twitter.com/dnene), along with many others who have done nothing other than twittering "Test" or "Checking this out" and then tuning out completely. Anyway, to come back to the point, I saw this entry by Dhananjay:

"Brian Kernighan "Sometimes the old Ways are Best" - http://tinyurl.com/c2ydgc - Precisely the reason why I cant stop using Linux."
  • P1 would consist of one line : getfacl d >oldacls
  • P2 would consist of one line : setfacl -f oldacls d
In Windows on the other hand I had the following problems:
  • Windows has at least three programs that manage file ACLs, not all of which work on all versions of Windows - cacls, xcacls, icacls
  • Except for icacls, which is supported only on Vista and later, none of these programs provides a convenient way to store the current ACLs.
  • cacls is available on all Windows platforms post-2000 (which was OK for me), but not every version of cacls has the "/S" switch that prints the current ACLs.
  • Where I could use "cacls /S" to get the current ACLs, I still needed to parse the output, which looks something like this: c:\dir1\dir2\currentdir: "a long ACL string". Unfortunately, parsing this string and extracting the ACLs for d1 is not at all trivial in Windows, unless of course we were using UNIX tools such as grep or perl or awk.
Unfortunately for me, using any of the UNIX tools was not an option, so I ended up writing over 100 lines of C# code! Can you believe it?