Querying a fuzzy database in Kofax Transformations can be a hassle – if you are to follow the official script sample provided with KTM/KTA. For once, it’s overly complicated, outdated and not all parts of the official script work as intended. There’s a better version provided within Kofax’ Best Practice Guide for KTM that’s easier to comprehend and use, yet there still are issues with it:
- It does not work with huge databases (KSMS, larger than 4GB),
- the maximum number of results is being ignored,
- we can simplify things even more.
So, it is time for a revised version of Database_Search that will be much easier to use. Here’s our sample database:
And there are some sample queries made in script:
Dim dbName As String
Dim minConf As Double
Dim maxResults As Long
Dim results As CscXDocField
dbName = "db-customers"
minConf = .5
maxResults = 15
' search over all columns
Set results = Database_Search(dbName, minConf, maxResults, "claus gruber")
' search over two columns provided by id (note: 1 is the first column; 0 = all columns!)
Set results = Database_Search(dbName, minConf, maxResults, 1, "Gabi", 5, "Wien")
' same as above, but with providing column names (you should prefer this as column ids can change)
Set results = Database_Search(dbName, minConf, maxResults, "FirstName", "Gabi", "city", "Wien")
As you can see, a simple query over all columns still remains simple enough. Just type the words you want to search for as the last parameter, and you’re set. Even when searching multiple columns, Database_Search keeps things simple: First, you will need to provide the column, either by name or number – then, you provide the value to search for – in pairs. You can provide as many pairs as you need, the script automatically takes care of the rest.
So, without any further delays, here’s the revised Database_Search that can work with databases of any size, fixes the max results issue and is easier to use:
Public Function Database_Search(dbName As String, minConfidence As Double, maxResults As Long, ParamArray searchArray() As Variant) As CscXDocField
' Revision 3 of Database_Search:
' - fixes the problem with maxResults not working
' - works with 64bit (e.g. files in KSMS > 4GB)
' - simplified queries especially when searching in multiple columns
' search examples (searchArray value):
' "john doe" -> search for "john doe" in all columns
' 0, "test 123", 3, "foo" -> search for "test 123" in ALL columns, and for "foo" in column number 3 (= idx 2)
' "Surname", "Doe" -> searches for "Doe" in a column named "Surname"
Dim Fields() As String
Dim FieldIDs() As Long
Dim db As CscDatabase
Dim hits As CscDatabaseResItems
Dim i As Long
Dim h As Long
Dim fieldId As Long
Dim alt As CscXDocFieldAlternative
Dim results As New CscXDocField
' prepare QText and QField as required by db.Search (i.e. where to search and what to search for)
If UBound(searchArray) = 0 Then
' user wants to search in all columns
LongArray_Push(FieldIDs, 0)
StringArray_Push(Fields, searchArray(i))
Else
' assign strings to the respective columns
For i = 0 To UBound(searchArray)
If i Mod 2 = 0 Then ' column information
If TypeName(searchArray(i)) = "String" Then
' column idx 0 = search all columns, so we need to add 1
fieldId = Database_GetFuzzyIndex(dbName, searchArray(i))
Else
fieldId = searchArray(i)
End If
LongArray_Push(FieldIDs, fieldId)
Else ' column text
StringArray_Push(Fields, searchArray(i))
End If
Next
End If
' search the fuzzy db
Set db = Project.Databases.ItemByName(dbName)
Set hits = db.Search(Fields, FieldIDs, CscEvalMatchQuery, 100)
For i = 0 To hits.Count - 1
If hits(i).Score >= minConfidence Then
Set alt = results.Alternatives.Create
alt.Confidence = hits(i).Score
For h = 0 To db.FieldCount - 1
alt.SubFields.Create(db.FieldName(h))
alt.SubFields(h).Index = h
alt.SubFields(h).Text = db.GetRecordData64(hits(i).RecID64)(h)
alt.SubFields(h).Confidence = hits(i).Score
Next
End If
Next
' deal with max results (still broken in KTM 6.2)
If results.Alternatives.Count > maxResults Then
For i = results.Alternatives.Count - 1 To maxResults Step - 1
results.Alternatives.Remove(i)
Next
End If
Return results
End Function
Public Function Database_GetFuzzyIndex(dbName As String, columnName As String) As Long
' returns the field index for the column name provided
' returns -1 if the field (column) does not exist
' never returns 0 (= all columns)
' note: if a column is not checked in KTM/KSMS, it is not in the index
Dim colIdx As Long
Dim fuzIdx As Long
fuzIdx = 1
With Project.Databases.ItemByName(dbName)
For colIdx = 0 To .FieldCount - 1
' check if the field is in the index
If .FieldType(colIdx) = CscDatabaseFieldTypeDefault Then
If .FieldName(colIdx) = columnName Then Return fuzIdx
fuzIdx += 1
End If
Next
End With
Return -1
End Function
Public Sub StringArray_Push(arr() As String, item As Variant)
' no, you can't easily convert variants to strings.
On Error GoTo e
ReDim Preserve arr(UBound(arr) + 1)
arr(UBound(arr)) = item
Exit Sub
e:
' guess the array was empty
ReDim arr(0)
arr(0) = item
End Sub
Public Sub LongArray_Push(arr() As Long, item As Variant)
' no, you can't easily convert variants to longs.
On Error GoTo e
ReDim Preserve arr(UBound(arr) + 1)
arr(UBound(arr)) = item
Exit Sub
e:
' guess the array was empty
ReDim arr(0)
arr(0) = item
End Sub