VRSing your files without a scanner in KTM

Usually, producing VRSed images requires either a document scanner or some dirty tricks with a TWAIN emulator. A few months ago I stumbled upon a very interesting method built into Kofax Transformation Modules that lets us binarize electronic documents! We always knew the functionality was there: when you use the advanced zone locator on a colour image and then switch to the reference image, you can see that some VRS magic has happened. Still, I was quite surprised to find that this method is public!

Let’s give it a shot. The method resides in the CscImgLib library and is called BinarizeWithVRS. It is called on a CscImage object and returns that data type as well.

BinarizeWithVRS

Let’s start with some samples. I downloaded the datasheets for both KTM and Analytics: two native PDFs with lots of text, but also some images. Naturally, they can be imported and used in KTM. CscImage does not really care whether you give it a TIFF or a PDF, so the conversion should work for all accepted formats.

[Image: originals]

So, those are pretty good examples. We’ve got blue on blue, diagrams, screenshots – VRS can shine on them. Here’s the result:

[Image: vrsed]

I won’t go into too much detail, as you will know what VRS can do for you. Let’s dig into the more interesting part: how do we call the method? First, I wrote my own BinarizeDocumentWithVRS function that takes a file path (as String) and returns a CscCollection of images, so multiple pages are possible as well.


Private Function BinarizeDocumentWithVRS(file As String) As CscCollection

   Dim imgOriginal As CscCollection
   Dim imgResult As New CscCollection
   Dim tmpImage As New CscImage
   Dim binarizedImg As CscImage

   Dim i As Integer

   ' first: load all pages into a collection
   Set imgOriginal = LoadImageAllPages(file)


   For i = 0 To imgOriginal.Count - 1
      tmpImage.Load(file, i)

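      ' binarize the page; the result comes back as a new CscImage
      Set binarizedImg = tmpImage.BinarizeWithVRS()
      'binarizedImg.VRS_Despeckle(1000,1000)
      ' filter type 2 = thicken / dilation
      binarizedImg.VRS_Filter(2)
      ' store the page, using the page number as key
      imgResult.Add(binarizedImg, i)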

   Next

   ' now return the collection
   Return imgResult

End Function


Private Function LoadImageAllPages(file As String) As CscCollection

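   Dim i As Integer
   Dim tmpColl As New CscCollection
   Dim tmpImg As CscImage

   i = 0

   ' Load is expected to raise an error past the last page,
   ' so the handler has to be in place before the first Load
   On Error GoTo EndOfPage

   While True
      ' use a fresh image per page so the entries do not all point to the same object
      Set tmpImg = New CscImage
      tmpImg.Load(file, i)
      ' add page to collection
      tmpColl.Add(tmpImg, i)
      i = i + 1
   Wend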

EndOfPage:

   Return tmpColl

End Function

Note that some of the related methods are, to my surprise, quite well documented in the scripting object reference! For example, VRS_Filter lets you fine-tune the binarized result. Its argument selects the filter type:

  • 0 = char smooth / strong neighbor
  • 1 = thinning / erosion
  • 2 = thicken / dilation
  • 3 = smooth+clean / opening
  • 4 = fill line breaks / closing
  • 5 = smooth+clean+preserve / openplus
  • 6 = fill breaks + preserve / closeplus
  • 7 = light thicken / dilate2x2
  • 8 = outline

The next step was to actually use that method. I decided to put the call into the Batch_Open event, so I can run it right out of Project Builder and generate binarized documents from the currently selected document set!


Private Sub Batch_Open(ByVal pXRootFolder As CASCADELib.CscXFolder)

	Dim x As New CscCollection
	Dim i As Integer
	Dim outputPath As String
	Dim tmpxDoc As New CscXDocument

	' converted images will be saved here
	outputPath = "C:\TIFF\Converted\"

	Debug.Clear

	' loop over all xdocs
	For i = 0 To pXRootFolder.DocInfos.Count - 1
		' will work for 1-paged files only
		Debug.Print(pXRootFolder.DocInfos(i).XDocument.CDoc.SourceFiles(0).FileName & " --> " & outputPath & i & ".tif")

		Set x = BinarizeDocumentWithVRS(pXRootFolder.DocInfos(i).XDocument.CDoc.SourceFiles(0).FileName)
		' so we only get 1 page back then
		x(1).Save(outputPath & i & ".tif")
	Next

	Debug.Print("done!")

End Sub

There you go, we just turned KTM Project Builder into a virtual, VRS-powered scanner! (Side note: yes, I was lazy and only took care of single-page files.)
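
For multi-page files, one possible extension is to save every page instead of just one. The sketch below sidesteps the collection and loads page by page until Load raises an error, the same end-of-file trick LoadImageAllPages relies on. It assumes that Load fails for a page number past the last page; SaveAllPagesWithVRS and the "_p" file naming are hypothetical.

```vb
' Hypothetical helper: binarizes every page of a file and saves each page as its own TIFF.
Private Sub SaveAllPagesWithVRS(file As String, outputPrefix As String)

   Dim pageImg As CscImage
   Dim binarizedImg As CscImage
   Dim pageIndex As Integer

   pageIndex = 0

   ' assumption: Load raises an error once pageIndex is past the last page
   ' (note that an error in BinarizeWithVRS or Save ends the loop as well)
   On Error GoTo NoMorePages

   While True
      Set pageImg = New CscImage
      pageImg.Load(file, pageIndex)

      Set binarizedImg = pageImg.BinarizeWithVRS()
      ' one output file per page, e.g. <outputPrefix>_p0.tif
      binarizedImg.Save(outputPrefix & "_p" & pageIndex & ".tif")

      pageIndex = pageIndex + 1
   Wend

NoMorePages:
   Exit Sub

End Sub
```

In Batch_Open, the BinarizeDocumentWithVRS call and the Save line could then be replaced by a single call that passes the source file name and outputPath & i as the prefix.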