Using Collections in KTM

Collections are a great thing. They allow you to organize your objects in a quick and easy way, adding and deleting as you please. The only shame is: WinWrap Basic does not add any support for collections out of the box. However, there is a simple way: KTM includes contains the CscCollection class that you can and should use to organize your objects. CscCollections are organized just like a dictionary: there is a unique id, i.e. the key, and the name, i.e. the value.

Here’s a simple example: our dictionary will hold fields, locators and all words of the xDoc. Then, we’ll access the fields by the key and store them into a fresh field object:


   Dim c As New CscCollection
   c.Add(pXDoc.Fields, "fields")
   c.Add(pXDoc.Locators, "locs")
   c.Add(pXDoc.Words, "words")

   Dim f As CscXDocFields
   Set f = c.ItemByName("fields")

A more complex example: let’s assume the following use case – you want to create a histogram for the words present on your document.
Conventional approaches may not include arrays that you’d constantly redim. With collections, your task is way easier. Note that we’re going to use multiple dictionaries however. One will contain all unique words (the words text as key, to be precise), and for the value, we’ll have a new collection containing all word themselves as objects.

Here’s the pseudo code:

For each word, check if the collection already holds its text (i.e. the key).
If the key was found, add the word to the value.
If the key was not found, create a new key and a new collection as value. Then, add the word to that new collection.

Sounds confusing? This graphic explains it:

The gray are represents a collection of unique words. The key is the text of that word, for example “and”, “or”, “there”. For the values, there are n additional collections, each holding as many instances of the respective word found. For example, if the document would contain seven “ands”, there would be seven words inside the first blue collection.

Here’s the code (I use a script locator here):


Private Sub SL_Histogram_LocateAlternatives(ByVal pXDoc As CASCADELib.CscXDocument, ByVal pLocator As CASCADELib.CscXDocField)

   Dim histogram As New CscCollection
   Dim wordCount As Long, histogramCount As Long

   For wordCount = 0 To pXDoc.Words.Count - 1
      'For Each word, check If the collection already holds its Text (i.e. the key).
      If histogram.ItemNameExists(pXDoc.Words(wordCount).Text) Then
          'If the key was found, add the word To the value.
          histogram.ItemByName(pXDoc.Words(wordCount).Text).Add(pXDoc.Words(wordCount), CStr(wordCount))

      Else
         'If the key was Not found, create a new key And a new collection As value. Then, add the word To that new collection.
         Dim c As CscCollection
         Set c = New CscCollection
         c.Add(pXDoc.Words(wordCount), CStr(wordCount))
         histogram.Add(c, pXDoc.Words(wordCount).Text)
      End If

   Next

   OutputHistogram(histogram)

End Sub

Here’s the helper sub that outputs the histogram to the debug view:


Private Sub OutputHistogram(c As CscCollection)

   Debug.Clear
   Dim i As Long

   For i = 1 To c.Count
      Debug.Print c.ItemName(i) + ";" +  c(i).Count
   Next

End Sub

Now, we can tell the following:
How many unique words are present on the document? A simple count provides us with that information.
Furthermore, by counting each word’s words (Inception, anyone?), we’ll finally have our histogram!

The above image shows the top 13 words by count of the KTA Whitepaper. By the way:
Kudos to David Wright for lending me his brain when writing this algorithm 😉

Quipu Blog

Kofax Subject Matter Experts

Using Collections in KTM