KTM Validation: The Missing Search Function

Searching for text is an essential feature you will find in almost any application: your favourite web browser, your word processor, and your text editor. Even your CMS allows you to search for documents or text within them. So why is this feature missing in KTM Validation? Well, as long as your locators play along and find the information you were looking for, or you’re facing just one- or two-page documents, you should be fine. But then, consider the following use case: you’re processing loan contracts. The typical contract consists of 80 pages, and you use a trainable group locator and possibly the text content locator to identify essential information such as the borrower, the agent, and the total amount lent.

What if the locators fail to find the borrower? Will you browse for their address and names, page by page? You see, a search function can come in quite handy sometimes. Fear no more, here’s how you could implement one. As always, here’s the teaser showing you what it could look like:

[Screenshot: search-in-ktm]

In the example above, I searched for the text “server scheduler”. The search results are displayed in a table, along with the page number. Clicking on a row will take us directly to the page where the text was found. Here’s how it works.

Finding words

Finding words in an xDocument is rather easy. A collection of CscXDocWords is present in the representation, for example pXDoc.Representations.ItemByName(“PDFTEXT”), provided you’re given PDF files with text. The following function returns a list of CscXDocWords objects (that’s right, words in the plural), as one could search for multiple words as shown in the screenshot above (i.e. “server” followed by “scheduler”).


Private Function Words_Search(pXDoc As CscXDocument, searchString As String) As Object

   ' searches a given xdoc for the search string, returns one CscXDocWords collection per match
   Dim i As Long, h As Long
   Dim oWordsFound As Object
   Dim searchStrings() As String
   Dim phraseMatched As Boolean

   Set oWordsFound = CreateObject("System.Collections.ArrayList")

   ' an empty search string cannot match anything, so only search when there is text
   If Trim(searchString) <> "" Then
      ' it's entirely possible the user enters multiple words, e.g. "john doe" - the simple solution is to search for those words in sequence
      searchStrings = Split(Trim(searchString))
      ' the last possible start position leaves room for the whole phrase, so we never get out of range
      For i = 0 To pXDoc.Words.Count - (UBound(searchStrings) + 1)
         ' the comparison is exact but case-insensitive; feel free to use levenshtein etc. instead
         phraseMatched = False
         ' see if the first word matches
         If UCase(pXDoc.Words(i).Text) = UCase(searchStrings(0)) Then
            phraseMatched = True
            ' now see if the subsequent words match, as well (if there are any)
            For h = 1 To UBound(searchStrings)
               If UCase(pXDoc.Words(i + h).Text) <> UCase(searchStrings(h)) Then
                  phraseMatched = False
                  Exit For
               End If
            Next
         End If
         ' finally, add the phrase if found
         If phraseMatched Then
            oWordsFound.Add(New CscXDocWords)
            For h = 0 To UBound(searchStrings)
               oWordsFound(oWordsFound.Count - 1).Append(pXDoc.Words(i + h))
            Next
         End If
      Next
   End If
   Return oWordsFound

End Function
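
For those who want to play with the matching logic outside of KTM, here is a minimal sketch of the same sliding-window idea in Python. It is not part of the project: the document is reduced to a plain list of strings, and the name find_phrase is made up for this illustration. Instead of word objects, it returns the word indices that make up each match.

```python
from typing import List


def find_phrase(words: List[str], search_string: str) -> List[List[int]]:
    """Return, for each match, the indices of the words that form the phrase."""
    terms = search_string.upper().split()
    matches: List[List[int]] = []
    if not terms:
        # An empty search string matches nothing.
        return matches
    # The last valid start position leaves room for the whole phrase.
    for start in range(len(words) - len(terms) + 1):
        window = words[start:start + len(terms)]
        # Exact, case-insensitive comparison of every word in the window.
        if [w.upper() for w in window] == terms:
            matches.append(list(range(start, start + len(terms))))
    return matches


doc_words = ["The", "Server", "Scheduler", "starts", "the", "server"]
print(find_phrase(doc_words, "server scheduler"))  # [[1, 2]]
print(find_phrase(doc_words, "server"))            # [[1], [5]]
```

A fuzzy comparison (Levenshtein, for instance) would replace the equality check on the window; the loop bounds stay the same.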

The table

After considering other alternatives, a table seemed perfectly suited for displaying the search results. For example, it provides one with immediate feedback when clicking on a search alternative, and all matches are highlighted in the document viewer. I also tried storing the search alternative in a temporary variable, allowing the user to switch back and forth between the results (something similar to the search function you’re familiar with in your browser), but the major issue here was that a field’s coordinates are only refreshed when one leaves and then re-enters the search field associated with that variable. That’s rubbish. So, I added a table model consisting of two columns: one for the search text (called “Text”), and one for the page where the text was found (called “p.”). Then, I wrote a sub that would populate a table field called “SearchResults”:


Private Sub Table_Populate(pXDoc As CscXDocument)

   Dim tblField As CscXDocField
   Dim i As Long, h As Long

   Set tblField = pXDoc.Fields.ItemByName("SearchResults")

   ' first, clear the table
   For i = tblField.Table.Rows.Count - 1 To 0 Step -1
      tblField.Table.Rows.Remove(i)
   Next

   tblField.Table.QuickCreate(2, oSearchWords.Count)
   ' set the column names (the names MUST match the names from the table model, otherwise the table won't be populated)
   tblField.Table.Columns(0).Name = "Text"
   tblField.Table.Columns(1).Name = "p."

   For i = 0 To oSearchWords.Count - 1
      ' a match may consist of multiple words, so add each of them to the text cell
      For h = 0 To oSearchWords(i).Count - 1
         tblField.Table.Rows(i).Cells(0).AddWordData(oSearchWords(i)(h))
      Next
      ' the page number comes from the first word of the match (PageIndex is zero-based)
      tblField.Table.Rows(i).Cells(1).Text = CStr(oSearchWords(i)(0).PageIndex + 1)
   Next

End Sub
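
To make the shape of the table data explicit, here is a small Python sketch of how a list of matches turns into rows of text and page number. It is only an illustration and runs outside of KTM; the Word type and the to_rows function are hypothetical stand-ins for the word objects and the table field used in the script.

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    text: str
    page_index: int  # zero-based, like PageIndex in the script


def to_rows(matches: List[List[Word]]) -> List[Tuple[str, str]]:
    """Turn each match (one or more words) into a (Text, p.) row."""
    rows = []
    for match in matches:
        # All words of the match end up in the text cell.
        text = " ".join(word.text for word in match)
        # The page is taken from the first word and shown one-based.
        page = str(match[0].page_index + 1)
        rows.append((text, page))
    return rows


found = [
    [Word("server", 0), Word("scheduler", 0)],
    [Word("Server", 11), Word("Scheduler", 11)],
]
print(to_rows(found))  # [('server scheduler', '1'), ('Server Scheduler', '12')]
```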

The Search Function

Finally, we can put all the puzzle pieces together: a separate field acts as the search field. Whenever its content is confirmed (i.e. the user hits enter), the search is issued, populating the table as long as there are some alternatives, of course. A global variable holds all the alternatives found.


Dim oSearchWords As Object

Private Sub ValidationForm_AfterFieldConfirmed(ByVal pXDoc As CASCADELib.CscXDocument, ByVal pField As CASCADELib.CscXDocField)

   If pField.Name = "Search" Then
      ' the search field's value was confirmed, so perform the search (Words_Search always returns a list, possibly an empty one)
      Set oSearchWords = Words_Search(pXDoc, pField.Text)
      If oSearchWords.Count > 0 Then
         ' indicate that the search found something by colouring the field label green
         ValidationForm.FieldLabels(0).SetForeColor(0, 180, 0)
      Else
         ' the search did not return anything, so colour the field label red
         ValidationForm.FieldLabels(0).SetForeColor(180, 0, 0)
      End If
      ' then, list all items in the table
      ValidationForm.Tables(0).Visible = False
      Table_Populate(pXDoc)
      ValidationForm.Tables(0).Visible = True

   End If

End Sub

The “Get Me Started Project”

Is available here, enjoy.