Scripting with Style

Let’s face it: even without the zero-code approach in KTA, you will reach a certain point where a script locator will do the job much quicker than anything else. In Kofax Transformations (formerly KTM), scripting always was and always will be your best friend. And while facing projects with lots of code can be intimidating at times, the real pain lies in bad scripting. That’s where scripting guidelines come into play: you can make everybody’s life much easier by agreeing to follow certain rules. This post is about my rules and about how I like to script, so my personal flavour will be around a lot. It’s absolutely OK to disagree with me, just make sure your team members follow your rules.

Use WWB-COM

With a single special comment line on top of your script, you’ll enable additional data types, instructions, conversion functions and operators. It’s the first thing you should do when opening a script in Project Builder – always add this line at the very top:


'#Language "WWB-COM"

Use the Return Statement

Among the additional features you get with the above statement, one of the most useful is the Return instruction for functions. Here are two examples: the first one uses the Return instruction, the second one does not. I’ll leave it up to you which one is easier to read:


Public Function String_GetUserName() As String
   ' returns the current user, determined from the environmental variables
   Dim dom As String, usr As String
   usr = Environ("USERNAME")
   dom = Environ("USERDOMAIN")
   Return dom + "\" + usr

End Function

Public Function String_GetUserName() As String
   ' returns the current user, determined from the environmental variables
   Dim dom As String, usr As String
   usr = Environ("USERNAME")
   dom = Environ("USERDOMAIN")
   String_GetUserName = dom + "\" + usr

End Function

Use short-circuit evaluation

Short-circuit evaluation will make your code shorter, more efficient and easier to read at the same time. In many boolean operations, looking at the first operand is sufficient. Think of the logical AND operation: the result can only be TRUE when all operands are TRUE, so once the first one is FALSE there is no need to look at the rest. By default, the keywords And and Or won’t perform short-circuit evaluation, which means KTM will evaluate every operand and may fail on one that should never have been reached. AndAlso and OrElse will. Here’s an example that will result in a runtime exception if there are no alternatives:


' this will result in a runtime exception if alts does not contain any elements
If alts.Count > 0 And alts(0).Text = "Hello World" Then

Without short-circuit evaluation, you’ll need to check for existing alternatives first. Only then you can safely access the text property of the first element.


' this is the classical way to solve it without AndAlso
If alts.Count > 0 Then
   If alts(0).Text = "Hello World" Then
      ' safe to work with alts(0) here
   End If
End If

Here’s the much shorter approach that also increases readability. In case there is no alternative, the second condition will never be evaluated at runtime – so there’s no exception to be dealt with:


If alts.Count > 0 AndAlso alts(0).Text = "Hello World" Then

OrElse behaves similarly – if the first operand already evaluates to true, no further operand has to be evaluated. Likewise, if you have two functions where one is significantly more computationally intensive than the other, put the expensive one at the end of the expression so that it only runs when it is really needed.
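To make the ordering advice a bit more tangible, here is a minimal sketch. The helper String_IsExpensiveMatch and the sub Demo_OrElse are made-up names for illustration only; the helper just stands in for any check that is costly to run.

```vb
'#Language "WWB-COM"

' hypothetical helper standing in for a computationally expensive check
Private Function String_IsExpensiveMatch(s As String) As Boolean
   Debug.Print "expensive check called for: " & s
   Return InStr(s, "Invoice") > 0
End Function

Sub Demo_OrElse
   Dim value As String
   value = ""
   ' the cheap test comes first; because it is already True for an empty string,
   ' OrElse never calls the expensive helper in this case
   If Len(value) = 0 OrElse String_IsExpensiveMatch(value) Then
      Debug.Print "empty or matching"
   End If
End Sub
```

With a plain Or, the helper would be called every time, no matter what the first test returned.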

Variables Naming Convention

Let’s talk about Simonyi’s Hungarian Notation: it has a bad reputation. Even Microsoft officially abandoned it when they released their first versions of .NET. Now, is Hungarian Notation really that bad? Not at all – the real issue is what generations of programmers thought it was: simply prefixing your variables with the type. Does the following look familiar to you?


Dim iRowIndex As Long
Dim sFieldName As String

What’s the benefit of prefixing your variables with a lower-case character that indicates their type? What additional value or information does it provide when reading the code? Well, breaking news: nothing at all. In almost all cases the compiler will warn you if the type is invalid or if an explicit type conversion is required. Was Simonyi wrong? No, he simply made a single mistake in his paper introducing the Hungarian Notation: he used the word type when he should have used kind. Here, look it up and replace “type” with “kind”, and suddenly it all makes much more sense. Let’s say that an integer stores the row index of a table – you may want to call it trwIdx. And imagine there’s another variable for storing the index of OCRed text lines (essentially another row index); it may be named lnIdx. Naturally, assigning one to the other would not make any sense, and that’s where the Hungarian Notation comes in handy:


lnIdx = trwIdx + 1

Just reading it shows you that there’s something awkward going on – you’re assigning one index to another although they are not related at all. So, if you want to use Hungarian Notation, at least make sure you’re doing it right (more ranting about wrong usage can be found here). By the way, programmers at Kofax also use Hungarian notation (correctly), here’s an example:


Private Sub Document_AfterClassifyXDoc(ByVal pXDoc As CASCADELib.CscXDocument)

Note that in the AfterClassifyXDoc sub, a pointer to the xDocument object is passed – hence the recommended prefix p (a pointer to X). Stick with that if you’re into Hungarian Notation, but more importantly: please stop prefixing your variables with the type, and give them insightful names instead.

Use Longs over Integers

In WinWrap Basic, the Integer data type has 16 bits only. It is also signed, leaving us with a possible range from -32,768 to 32,767. While this might be sufficient for most cases, sometimes it is not. Imagine you loop over every single word of an OCR representation: assuming 500 words per page, that limit is reached after about 65 pages. In some scenarios, you’ll be dealing with that many pages, sometimes even more – resulting in an overflow during processing. Use the Long data type instead: Longs are 32-bit integers, giving you a range from -2,147,483,648 to 2,147,483,647 – more than sufficient for any use case.
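Here is a small sketch of the arithmetic behind that claim. The sub name Demo_CountWords and the 100-page document are assumptions for illustration; the 500 words per page are the figure used above.

```vb
'#Language "WWB-COM"

Sub Demo_CountWords
   Const WORDS_PER_PAGE As Long = 500   ' assumed average, as in the text above
   Dim pageCount As Long, page As Long
   ' declared as Long on purpose: an Integer counter would overflow
   ' on page 66, because 66 * 500 = 33,000 exceeds 32,767
   Dim wordCount As Long

   pageCount = 100   ' hypothetical document size
   For page = 1 To pageCount
      wordCount = wordCount + WORDS_PER_PAGE
   Next
   Debug.Print wordCount   ' 100 * 500 = 50000
End Sub
```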

Function Names in General

So, are you a fan of CamelCase, lowerCamelCase or snake_case? In my opinion, they all are fine. The most important thing is to be consistent, that’s it. If you want to blend in with the general style in Transformations like I do, here’s how you would name your functions:


Public Function String_PadLeft(s As String, width As Long, char As String) As String
   Return Right(String$(width, char) & s, width)
End Function

In the above sample, a string will be padded on the left (that is, characters are prepended) with a given character, for example blanks. This brings us to...

Extension Methods

Sometimes the existing methods provided by Kofax aren’t sufficient. For example, let’s say you want to sort field alternatives based on their word indices on the document rather than the confidence: there is no available method on the field object that can do that. As WinWrap Basic does not allow the addition of real extension methods, here’s a possible naming scheme:
{Object Name}_{Method Name}

First comes the object, then the method name. This will help you stay organised, allowing you to quickly find what you are looking for. Here are some basic examples of extension methods that follow the proposed scheme:


Public Sub Field_Sort(field As CscXDocField)
Public Sub Field_RemoveAlternatives(field As CscXDocField, minConf As Double)
Public Function String_RemovePunctuation(s As String) As String

That way you’ll end up with a neatly sorted list of available subs and functions for each class, as shown in the Proc List in the upper right corner:

transformations-ide
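To give an idea of what such an extension method might look like on the inside, here is a minimal sketch of String_RemovePunctuation. Only the signature comes from the list above; the body and the set of characters treated as punctuation are my assumptions, so adjust them to your needs.

```vb
'#Language "WWB-COM"

Public Function String_RemovePunctuation(s As String) As String
   ' assumed set of punctuation characters, extend as required
   Const PUNCTUATION As String = ".,;:!?"
   Dim i As Long, c As String, result As String

   ' copy every character that is not part of the punctuation set
   For i = 1 To Len(s)
      c = Mid(s, i, 1)
      If InStr(PUNCTUATION, c) = 0 Then result = result & c
   Next
   Return result
End Function
```

Because the function name starts with the object it works on, it will show up next to all other String_ helpers in the Proc List.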

If you want to go the extra mile you could also develop a taxonomy when it comes to naming your methods:

From or To?

Be consistent with the direction of your operations. For example, let’s say you want to copy a subfield into a field – there are basically two ways to do that:


Public Function Field_CopyFromSubfield

versus


Public Function Subfield_CopyToField

And while both are essentially correct, you may have a hard time looking for the right function if you did not name them consistently. Go for CopyTo, go for CopyFrom, or even implement both for every object, but don’t mix them arbitrarily. Using a taxonomy like this will also help you stay organised – here are some examples:

  • Subs that modify an object’s property: {Object}_SetTo{X}, such as Field_SetToText,
  • Functions that return an object: {Object}_Get{ReturnedObject}{Description}, such as Table_GetWordsFromRange,
  • Functions that return a boolean: {Object}_Has{Subject}, such as Table_HasTotalRow.
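As a rough sketch of how two of these patterns could look in script: Field_HasAlternatives is a made-up name that follows the {Object}_Has{Subject} pattern, and the signature of Field_SetToText is my assumption, since only its name is mentioned above.

```vb
'#Language "WWB-COM"

' {Object}_Has{Subject}: returns a boolean and does not modify the field
Public Function Field_HasAlternatives(field As CscXDocField) As Boolean
   Return field.Alternatives.Count > 0
End Function

' {Object}_SetTo{X}: modifies a property of the object that is passed in
' (assumed signature, only the name appears in the list above)
Public Sub Field_SetToText(field As CscXDocField, s As String)
   field.Text = s
End Sub
```

Reading just the name already tells you whether a call changes something or merely answers a question.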

Use inheritance

Scripts in Transformations embody the concept of inheritance: every Sub or Function with the correct access modifier (i.e. Public) can be used in a specialised class, that is, every class “below” the current one in the hierarchy. In the very same way, public Subs and Functions defined at project level are available to all classes. Don’t repeat yourself: create your common methods on the project level and reuse them wherever you need. Note that if you want auto-complete to work properly in the specialised class, you have to “start” the parent script(s) first (F5).

Inheritance at its best

The very same concept can be used for all events fired by Kofax. Imagine button clicks on the Validation Forms: instead of spreading your business logic around different classes, define a master Sub at a structural class (i.e. a root element that you create at the very top right after the project element). Now, in every class with a Validation Form, you’ll just call the master Sub instead of repeating yourself, just like that:

Calling a generic sub from an event

Prefer Existing Structures

Always prefer existing classes over custom ones. Do you need to store some strings? You could either use the CscCollection class or even store them in the CscAlternatives collection of a CscXDocField object. This encourages reuse of custom extension methods you already wrote, for example FieldAlternatives_CopyToField. Here’s an example where we use two CscCollection objects to store odd and even pages:


Dim evenPages As New CscCollection
Dim oddPages As New CscCollection

Dim i As Long

For i = 0 To pXDoc.Pages.Count - 1
   If (i + 1) Mod 2 = 0 Then
      evenPages.Add(pXDoc.Pages(i), CStr(i))
   Else
      oddPages.Add(pXDoc.Pages(i), CStr(i))
   End If
Next

Use .NET Classes

Did you know you can use .NET classes in WinWrap? Well, only those which are COM-visible, but that includes most collections such as ArrayLists or the HashTables (Dictionaries). For example, you can use an array list to store and sort items:


Dim oList As Object
Set oList = CreateObject("System.Collections.ArrayList")

' add some items to the list
oList.Add("Banana")
oList.Add("Apple")
oList.Add("Peach")

' check if an item is in that list
Debug.Print oList.Contains("Beer")

' insert a new item to that list, at the 1st position
oList.Insert(0, "Beer")

' sort the list
oList.Sort()
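
The Hashtable mentioned above can be used in the same late-bound fashion. The following is only a sketch: the sub name Demo_Hashtable and the keys and values are made up for illustration, while Add, ContainsKey, Item and Count are members of the .NET Hashtable class.

```vb
'#Language "WWB-COM"

Sub Demo_Hashtable
   Dim oTable As Object
   Set oTable = CreateObject("System.Collections.Hashtable")

   ' map some (hypothetical) keys to default values
   oTable.Add("Currency", "EUR")
   oTable.Add("Country", "DE")

   ' look up a value only if the key is present
   If oTable.ContainsKey("Currency") Then
      Debug.Print oTable.Item("Currency")
   End If

   ' number of key/value pairs stored
   Debug.Print oTable.Count
End Sub
```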

You can also reuse your .NET assemblies in WinWrap, as long as they are COM-visible – and here’s how.

Type Conversions

Don’t use WinWrap’s conversion functions such as CDbl when a formatter can achieve the same thing and is much more resilient to conversion errors. For example, when you need to convert a string to a double, here’s what you can do:


Public Function Double_TryParse(s As String, ByRef d As Double) As Boolean
   ' tries converting a string to double using the default amount formatter
   Dim f As New CscXDocField
   f.Text = s
   If Project.FieldFormatters.ItemByName("DoubleAmountFormatter").FormatField(f) Then
      d = f.DoubleValue
      Return True
   End If
   Return False
End Function
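
A possible way to call it is sketched below. The sub name Demo_ParseAmount and the sample input are my assumptions, and whether that input is accepted depends entirely on how the formatter named DoubleAmountFormatter is configured in your project.

```vb
'#Language "WWB-COM"

Sub Demo_ParseAmount
   Dim amount As Double
   ' sample input only; the result depends on the formatter's settings
   If Double_TryParse("1.234,56", amount) Then
      Debug.Print amount
   Else
      Debug.Print "not a valid amount"
   End If
End Sub
```

The nice part of the TryParse pattern is that the caller decides what an invalid value means, instead of having to trap a conversion error.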

Prefer Script Variables over Constants

By definition, the value of a constant will never change. However, sometimes constants are bound to a specific environment, and have their value changed as the environment changes. Here’s an example: imagine that you want to store xdocs and images to the file system after the document was validated – naturally, you would set up a constant for the root path:


Const XDOCS_ROOTPATH As String = "\\serverx\xdocs"

However, if you consider different systems such as a test and a production environment, it is most likely that you want to separate the paths. While the production system might point to server X, the test system should point to server Y. Here’s where script variables come into play. Ideally they are combined with configuration sets, as described in this article. Creating, maintaining and accessing them is easy:
script-variables


Dim xdocsRootPath As String
xdocsRootPath = Project.ScriptVariables.ItemByName("XDOCS_ROOTPATH").Value

Another benefit of using script variables is that you may want some power users to be able to change them – think of a global confidence threshold. Instead of having them open the script, you can provide them with a GUI, which will be much easier to understand for most people. Plus, they can’t accidentally delete, change or add code and break the build.

Use Script Resources for Localisation

We’ve all been there – one of your Transformation projects is being rolled out to another country, and they speak a different language. Unfortunately, you’ve hard-coded custom error messages in script and are now forced to search and replace them.


pXDoc.Fields.ItemByName("Name").ErrorDescription = "Der Wert für dieses Feld ist ungültig!"

There is a better way: Script Resources. Transformations fully supports localisation – even in script.

script-resources


pXDoc.Fields.ItemByName("Name").ErrorDescription = Project.Resources.GetString("InvalidFieldValue")

External Modules

While WinWrap allows the use of external modules with the help of the Uses statement, I urge you to be cautious with them. In general, having working code in a central place makes sense, so this speaks for modules. However, the way the WinWrap Basic IDE deals with external code can be frustrating. Here’s what you need to be aware of:

  • You can run your code, but the external modules’ code remains hidden
  • When stepping into one of the external functions, the  IDE shows the code
  • Changes to the external code via the IDE are not saved (you explicitly need to do this in another editor), however
  • Changes appear to be saved – they remain in memory until you close the Project Builder.

The last point is confusing and can cost you precious hours if you don’t realise you are typing code into the wrong window. The IDE does not make it clear that you’re currently working in an external module unless you mouse-over the sheet tab, as shown below.

sheet-tab

I’d rather go for one of the following options when you want to logically group reusable subs and methods together:

  1. Put all reusable code on project level
  2. Have a structural class, and put all your code there
  3. Use a COM-visible .NET DLL that contains all your helpers

The advantage of using a .NET DLL is that, well, you can use the .NET framework instead of WinWrap Basic – the disadvantages are that debugging often is not possible when there’s no IDE on the customer’s system, and that you’ll have to ship said DLL to the respective clients.

Implement proper Error Handling

There are several ways to deal with errors in WinWrap – and out of context you can’t say that one way is better suited than another. Let’s start with the standard use case in Transformations: usually, when there is an error you did not expect, you want it to be thrown (and thrown hard, raising it yourself if necessary). Whenever an error is thrown in production, the batch will go to Quality Control and somebody will deal with the issue. Let’s imagine you need to rely on the alternatives of a barcode locator, but no barcode sticker was found on the document:


Dim srcLoc As CscXDocField
Set srcLoc = pXDoc.Locators.ItemByName("BCL")
If srcLoc.Alternatives.Count = 0 Then Err.Raise vbObjectError + 10, "KTM Class Script", "The barcode locator has no alternatives"

In the above sample the batch will go to QC, and the error message will be present in the KTM log files. Another way to deal with errors is to trap them, using either the On Error Resume Next or the On Error GoTo statement. Note that you should only use these statements when you really can handle the error and provide a way out. Here’s an example: the following sub tries to push an item onto an existing array (i.e. appending it to the end of the array, and at the same time extending the original array). If the original array was empty, ReDim would result in an error, as would checking for the upper bound – so it’s perfectly fine to catch the error here:


Public Sub Array_Push(arr() As Variant, item As Variant)
   ' pushes a new item to the end of an array
   ' what happens when the array is empty?
   On Error GoTo e
   ReDim Preserve arr(UBound(arr) + 1)
   arr(UBound(arr)) = item
   Exit Sub

e:
   ' the array was empty
   ReDim arr(0)
   arr(0) = item

End Sub
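
Here is a short usage sketch for the sub above. The sub name Demo_ArrayPush and the sample items are assumptions for illustration.

```vb
'#Language "WWB-COM"

Sub Demo_ArrayPush
   Dim items() As Variant   ' dynamic array, not yet dimensioned
   Dim i As Long

   ' first push: the array is still empty, so (as described above)
   ' the error handler is expected to create the first element
   Array_Push items, "Banana"
   ' later pushes simply grow the array by one element
   Array_Push items, "Apple"

   For i = LBound(items) To UBound(items)
      Debug.Print items(i)
   Next
End Sub
```

The caller never has to care whether the array was empty, which is exactly why trapping the error inside Array_Push is justified.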

Again, when you have the opportunity to prevent an error from happening, do so. For example, when you try accessing a field, check if it exists in the first place. If it does not but should, raise an error hard.

Refactor!

Did you notice something? This is the only heading with an exclamation mark, just because refactoring is life. Seriously, go over your code multiple times and refactor. Refactoring is the process of going over your code again, changing its structure without affecting the behaviour. The benefit from that is that your code will be much easier to maintain, and you’ll end up with new functions to be reused in follow-up projects.

(.. to be continued. Any ideas? Drop me a mail or contact me via LinkedIn.)