Order To Chaos: Version Control and Transformations

If your backup and development strategy involves creating lots of folder copies, comparing two versions by hand, and storing the final release candidate on a file server in a zip named something like “acme-invoices-v1.4_final.zip”, all while not knowing whether the production system includes changes that never made it into that zip, then this article is for you. Traditionally, version control is a component of software release management – but there is no reason why we as Kofax developers can’t become more professional by utilising it. My primary goal in this article isn’t to convince you to use a version control system such as Git. Rather, it aims to help you make the most of it when used in combination with Kofax Transformations.

What Git can do for you

So Git, huh? Yes, I admit it – I am a huge fan of Git’s very concept of being a distributed, server-less version control system. That means you can have your whole repository on your local computer as well as on a thumb drive. If you’re not sure what version control is and why you should use it, check out this article first. In a nutshell, this is what you get (ibid):

  1. Storing versions: every time you check in a new version of your project, the old one still persists. You don’t have to choose a name for the version, Git does that for you – and you can enter text that will help you describe what you’ve changed. You can easily track changes over time.
  2. Restoring older versions: you can always restore an older version. So, you broke the build with the changes you made? No worries, going back to an older version hardly takes more than a few clicks.
  3. Backup: thanks to the distributed nature of Git, every copy of the repository is a full backup of your code. There’s one repository on the client’s machine, one on your local PC, and maybe another one on your thumb drive. If one of those devices dies, you’ll still have the full backup in the other locations.
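If you prefer the command line over a graphical client, the first two points can be sketched roughly as follows. This is a minimal illustration in a throwaway folder; the file name notes.txt, the commit messages, and the user identity are made up for the example.

```shell
#!/bin/sh
# Sketch: store two versions of a file, then restore the older one.
# All names below are placeholders, not part of a real project.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Identity is set for this repository only, so no global settings are needed.
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "version 2" > notes.txt
git commit -q -am "Update notes"

# Track changes over time: one line per stored version.
git --no-pager log --oneline

# Restore the file as it was one commit earlier.
git checkout -q HEAD~1 -- notes.txt
cat notes.txt
```

Git names each version with a hash, so the only thing you type is the message describing what changed.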

It took me some time to get Git properly working with Transformations for reasons I’ll mention in a minute – and still, there are some open issues. Anyways, let’s start with an example – take a look at the following screenshot. Here, we’re about to commit changes to our project repository. At a glance we can see what has changed in the project file:

  • Two fields were added (note the green lines), namely “Name” and “Date”,
  • A format locator called “FL_Dates” was added,
  • The language-specific comment for WWB.COM was added to the script.
So, you’ve added two fields, a locator, and a line in your script.

What you need to do for Git

Again, this is not an article about Git itself – if you need more information, you may want to check out their excellent documentation. For the moment it’s sufficient for you to know that Git uses so-called repositories. A repository can be seen as a collection of files that you want to put under source control. In our case that includes all files created by Project Builder – the fpr file and all other files in the same directory. Usually I take it one step further: I include images and text files (for dictionaries and databases) in the repository as well. Don’t worry about wasting too much storage space: Git only stores a file multiple times when it has changed (which usually won’t happen with images). Oh, and here’s the benefit of adding images and xdocs: no longer will you accidentally destroy Golden Files.

Here’s how I organise my projects: there is a root folder for my project called QuickSearch. This will become our Git repository:

[Image: git-repository]

The folder db contains text files as sources for dictionaries and databases, img contains all my test images, while ktm holds another folder with the whole project configuration:

[Image: ktm-contents]
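Setting up such a layout from scratch could look roughly like this. It is only a sketch: the folder names follow the description above, the temporary parent folder is an assumption for the example, and the project folder inside ktm is left out because Project Builder creates it.

```shell
#!/bin/sh
# Sketch: create the QuickSearch layout and turn it into a Git repository.
# The temporary parent directory is only used to keep the example self-contained.
set -e
work=$(mktemp -d)
cd "$work"
mkdir -p QuickSearch/db QuickSearch/img QuickSearch/ktm
cd QuickSearch
git init -q .

# Git tracks files, not empty folders, so this stays empty until files are added.
git status --short
```

Once dictionaries, test images and the project files are in place, a plain git add followed by a commit stores the first version.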

What you need to install

First, you’ll need Git for Windows. Then, there is another issue that needs to be addressed: while version control works fine with any type of file, comparing them works with textual files only. Kofax Transformations however uses lots of so-called binary files (i.e. any file that is not text, such as PDF documents or TIFF files). There are two (binary) files in particular that we would like to be able to compare against:

  1. fpr files contain the whole configuration from Project Builder
  2. xdc files are representations of images, words, fields, locators, and much more.

That’s what you get when you try comparing binary files:

[Image: git-binary]
The old version and the new one are different. Great insights.

That’s not helpful at all. While you still have the benefits of storing versions and reverting to older ones, you cannot see which changes were made. Fortunately, Git allows us to define a specific way of handling any binary file. Knowing that both fprs and xdcs essentially are XML data in a gzip file, all we have to do is set up a small shell script that unzips the file. There is one catch, though: Transformations XML files lack line feeds, so all the data is written in the first two lines. That would make comparing them visually almost impossible – that’s why we will be adding line feeds to the final XML. Here’s the shell script we’re about to use (don’t worry, Git ships with Linux tools allowing us to run that script even on Windows). I recommend putting the file in C:\Program Files (x86)\Git\bin and calling it ktmdiff.sh (just use any text editor to create and save the file):

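#!/bin/sh
# Unpack the gzipped XML and put a line feed after every '>'.
# "$1" is quoted so that paths containing spaces keep working.
gunzip -c -S xdc "$1" | sed 's/>/>\n/g'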

Here’s the difference. Without adding line feeds, you’ll end up with a single line that spans more than 11,000 characters. It is impossible to derive any meaningful information from it:

[Image: unzipped-fpr-no-linefeed]
Good luck finding the first difference (it starts at column 1079)

With the added line feeds, comparing projects finally works:

A new field, a new locator.
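You can reproduce the effect of the line feeds without a real project file. The following is a small sketch using a made-up, gzipped XML snippet as a stand-in for an fpr file; it assumes GNU sed (the one shipped with Git for Windows), which understands \n in the replacement text.

```shell
#!/bin/sh
# Sketch: the gunzip | sed pipeline applied to a hypothetical stand-in file.
set -e
tmp=$(mktemp -d)
# Made-up content; real fpr and xdc files hold far more data.
printf '<project><field name="Name"/><field name="Date"/></project>' \
  | gzip -c > "$tmp/sample.fpr"

# Without line feeds: count the line breaks in the unpacked XML.
gunzip -c < "$tmp/sample.fpr" | wc -l

# With a line feed after every '>': one tag per line.
gunzip -c < "$tmp/sample.fpr" | sed 's/>/>\n/g' > "$tmp/sample.txt"
cat "$tmp/sample.txt"
```

With one tag per line, a line-based diff can point at the exact tag that changed instead of flagging the whole document.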

Then, we need to make changes to two internal Git files: gitconfig and gitattributes. I will give a short explanation in a minute, don’t worry. Here’s what we will add to the gitconfig file, which resides in Git’s installation directory, usually at C:\Program Files (x86)\Git\etc\gitconfig. Just add the following lines to the very bottom of the file.

[diff "ktm"]
binary = true
textconv = ktmdiff.sh

Now we have told Git that there are certain files that require special attention. Whenever Git is confronted with viewing or comparing (i.e. using diff) one of those binary files it will, from now on, use our shell script. We still did not tell Git which file types it needs to consider – that’s what gitattributes is for. Each repository can have its own gitattributes file, so we need to make sure to edit the global one, which usually resides in C:\Program Files (x86)\Git\etc\gitattributes. Here’s what we add to the very bottom:

*.xdc diff=ktm
*.XDC diff=ktm
*.fpr diff=ktm
*.FPR diff=ktm

That was the final link. Whenever Git encounters an xdc or fpr file, it will now handle them with the diff called ktm – the one we defined earlier on in the gitconfig file. The result? Well, have a look at the very first image presented in this post. Oh, and of course that works in the shell (command line), too:

[Image: gitdiff-cmd]
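If you would rather try this out before touching the files in Git’s installation folder, the same idea can be sketched in a throwaway repository. Note that this is a variation, not the exact setup described above: it uses git config and a repository-local .gitattributes file instead of the global files, and the converter script, file names, and XML content are all made up for the example. It again assumes GNU sed.

```shell
#!/bin/sh
# Sketch: per-repository variant of the ktm diff driver in a scratch repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

# Stand-in for ktmdiff.sh, kept outside the repository.
conv=$(mktemp)
cat > "$conv" <<'EOF'
#!/bin/sh
gunzip -c < "$1" | sed 's/>/>\n/g'
EOF

# Register the driver for this repository only and bind it to fpr files.
git config diff.ktm.textconv "sh $conv"
echo '*.fpr diff=ktm' > .gitattributes

# Hypothetical project file: gzipped XML with one field.
printf '<project><field name="Name"/></project>' | gzip -c > sample.fpr
git add .gitattributes sample.fpr
git commit -q -m "Add project"

# Second version with an additional field.
printf '<project><field name="Name"/><field name="Date"/></project>' \
  | gzip -c > sample.fpr

# Shows which diff driver Git applies to the file.
git check-attr diff -- sample.fpr

# The diff now lists the added tag as a regular text change.
git --no-pager diff -- sample.fpr
```

The textconv output is only used for display; the repository still stores the original binary file unchanged.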

And of course, you can compare xdoc files:

RecoStar and ABBYY FineReader will give you different results.

Conclusion

Using a version control system such as Git makes you even more professional. The ability to modify it to help you compare binary files, especially the fpr and xdoc formats, is a real plus – to be fair, I don’t know how easy or difficult this is with other systems such as SVN. There’s one catch, still – usually, version control also allows you to merge changes made by multiple users, even to the very same file. Unfortunately, that won’t be possible with KTM for a very simple reason – every time Project Builder saves the fpr file (which is XML, as you’ve learned), it will also generate a checksum that you’ll see at the very top of the file. As long as we don’t know the exact algorithm for creating that checksum, merging will remain a dream. And even if we could make it work, I’m not sure whether the merge would break something at a certain point.

Anyways, here’s my recommendation: start using Git, and use it for your projects with Kofax Transformations. You’ll thank me once you have accidentally deleted some code without having a proper backup; and your colleagues will thank you when they know exactly what you’ve changed in the latest version of your project file. Any experiences with Kofax and version control? Get in touch with me, I’d like to hear your opinions.