Bossing around your attachments with KIC

Can KIC import attachments before the mail body? After it? Can I change their order? Can I create one document per attachment? Can I ditch attachments before importing them? Is it possible to add some? I keep hearing (and reading) questions like these over and over again, so I decided to write this post. We’ll focus on KIC processing emails, where plenty of attachments may be present.

The basics

This section provides some insight into what emails really are. If you’re just looking for the scripts, feel free to skip ahead; I won’t tell. Basically, emails are simple text files. Sure, there is a body and there are attachments, but in the end everything is sent as text, attachments included. Here’s what you may see in your email client:

[Screenshot: the email as displayed in the mail client]

And that’s what is really being sent over the wire:

[Screenshot: the raw text of the same email]

The first paragraph is the so-called message header, which contains vital information for routing the message. Here you find the originator, the date, the subject, and much more, all of which is later made available in KIC. Now for the body and the attachments. Normally, all attachments are “converted” into text and sent as such, a method called base64 encoding. That way, binary data such as images, movies, and PDF files is represented by text. I removed that text from the screenshot above to keep it readable. The order of attachments is maintained in the email: you can see that the first attachment carries the name “NumberedPages_page_0001.pdf”, which was the first file I attached. You’ll also see that there are additional attachments, such as the text “Here you go …” sent in HTML format. This is helpful when one wants to preserve formatted text; the plain text is always there as well, in the section above.
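To make the base64 step less abstract, here is a minimal C# sketch that encodes a few bytes the way a mail client would and decodes them again. The class name Base64Demo and the sample bytes (the opening characters of a typical PDF header) are made up for illustration; nothing in it is specific to KIC.

```csharp
using System;
using System.Text;

public static class Base64Demo
{
    // Encodes raw bytes as base64 text, as is done for binary mail attachments.
    public static string Encode(byte[] raw)
    {
        return Convert.ToBase64String(raw);
    }

    // Restores the original bytes from base64 text.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }

    public static void Main()
    {
        // Illustrative input: the ASCII bytes "%PDF-1.4" that open many PDF files.
        byte[] raw = Encoding.ASCII.GetBytes("%PDF-1.4");

        string encoded = Encode(raw);
        Console.WriteLine(encoded); // prints "JVBERi0xLjQ="

        byte[] decoded = Decode(encoded);
        Console.WriteLine(Encoding.ASCII.GetString(decoded)); // prints "%PDF-1.4"
    }
}
```

This is also why the text block of a PDF attachment in a raw email typically starts with “JVBERi0”.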

KIC then receives the message, using and exposing its properties. The following screenshot shows the same message as received by KIC, in the properties window. We can clearly see that all five PDF attachments are there (application/pdf), and that there’s one multipart/alternative attachment, which is the HTML-formatted text (“Here you go …”). We might want to get rid of this text later on.

[Screenshot: the message properties window in KIC]

One document per attachment

This is the easiest task, but you wouldn’t believe how many people manage to overlook it. Me included. The checkbox is in the destination configuration, on the second tab. In our case, and without any further manipulation, checking it would create six separate documents in the batch for my initial email. Unchecking it would create one huge document, merging all content together.

[Screenshot: the one-document-per-attachment checkbox]

Body before or after attachments?

Again, this is a simple setting. Contrary to what some believe, no scripting is involved, and neither a separate custom module nor a workflow agent is required. It’s just a harmless dropdown sitting on the first tab:

[Screenshot: the content dropdown on the first tab]

The settings are pretty much self-explanatory. You can also combine them with the one-document-per-attachment switch to cover different use cases. Here are some examples:

[Screenshot: example combinations of the two settings]

Modifying attachments with a document script

Now, what if you need to ignore certain attachments during import? For example, some mail clients attach the body again as text/html, and other times you find yourself importing jpg or png images of the company logo or of social network sites. This is best handled by a document script, which is linked to the destination on the third tab:

[Screenshot: linking a document script on the third tab]

The best way to start such a script is to either consult the KIC developer’s manual or copy and paste one of the samples shipped with KIC, which can be found at C:\Program Files (x86)\Kofax\KIC-ED\KCPlugIn\ScriptSamples. The ManageMessageFiles method is well suited for all operations that manipulate content, as it is called before any content is imported into Capture. Let’s have a closer look at this method:


public void ManageMessageFiles(ReadonlyMessage message,
					   List messageBody, // type argument as declared in the shipped samples
					   List<Attachment> attachments,
					   object extension)
{
	// your code goes here
}

This method exposes everything you need. We will focus on the list of attachments it provides; note that this parameter can be modified, as it is not read-only. The Attachment class is described in greater detail in the documentation. For our purposes it is worth mentioning that it offers useful properties such as OriginalFileName (you guessed it) and Content, which is the original file content in binary form. Rather than manipulating that list directly, I like to create a second list that I call filteredAttachments. While looping over all attachments, I decide for each one whether to keep it, and each attachment I want to retain goes into that list. At the end, I clear the original list and refill it from my filtered list, something like this:


public void ManageMessageFiles(ReadonlyMessage message,
					   List messageBody, // type argument as declared in the shipped samples
					   List<Attachment> attachments,
					   object extension)
{

	// this list will hold all attachments we want to keep
	List<Attachment> filteredAttachments = new List<Attachment>();
	
	// first, loop over all original attachments
	foreach (var a in attachments)
	{
		// do some checks and add the attachment, if you like
		filteredAttachments.Add(a);
	}

	// now we end up with a second list of filtered attachments. we can make some modifications here, and even add new attachments.
	
	// we can also re-order the list of attachments
	filteredAttachments.Reverse();

	// finally, we clear the original list of attachments and reconstruct it
	attachments.Clear();
	foreach (var a in filteredAttachments)
	{
		attachments.Add(a);
	}

	if (attachments.Count == 0)
	{
		// no attachment present - we can decide to ignore the message:
		throw new ScriptIgnoreMessageException("No more attachments (aka 'Well, that escalated quickly')");
	}
	
}

In the above example we simply keep all files, but let’s be a little bold here and remove some attachments we don’t like, for example anything that is not a PDF file. All we change are a few lines in the filtering loop:


	// first, loop over all original attachments
	foreach (var a in attachments)
	{
		// filtering everything not-pdfy (Path requires 'using System.IO;')
		if (Path.GetExtension(a.OriginalFileName) == ".pdf")
		{
			filteredAttachments.Add(a);
		}
	}

Sure, that isn’t perfect: the comparison is case-sensitive, and checking only the extension says nothing about the actual file content, but I think you get the point. The same approach works if you want to add attachments: just create a new instance of the Attachment class, set the required properties and, most importantly, the Content, and you’re ready to go.
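If you want to address both weaknesses, a slightly stricter check could look like the sketch below. It runs outside of KIC, so the Attachment class in it is only a hypothetical stand-in with the two properties mentioned earlier; that Content is a byte array is an assumption on my side, and the helper names (PdfFilter, HasPdfExtension, HasPdfSignature, KeepPdfs) are made up. The signature check expects “%PDF-” at the very start of the file, which is stricter than some PDF readers are.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in for KIC's Attachment class, reduced to the two
// properties used here. Content being a byte[] is an assumption.
public class Attachment
{
    public string OriginalFileName { get; set; }
    public byte[] Content { get; set; }
}

public static class PdfFilter
{
    // True if the file name ends in ".pdf", ignoring case.
    public static bool HasPdfExtension(Attachment a)
    {
        string ext = Path.GetExtension(a.OriginalFileName ?? string.Empty);
        return string.Equals(ext, ".pdf", StringComparison.OrdinalIgnoreCase);
    }

    // True if the content starts with the PDF signature "%PDF-".
    public static bool HasPdfSignature(Attachment a)
    {
        byte[] magic = { 0x25, 0x50, 0x44, 0x46, 0x2D }; // "%PDF-"
        if (a.Content == null || a.Content.Length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (a.Content[i] != magic[i]) return false;
        }
        return true;
    }

    // Keeps only attachments that pass both checks, preserving their order.
    public static List<Attachment> KeepPdfs(IEnumerable<Attachment> attachments)
    {
        var kept = new List<Attachment>();
        foreach (var a in attachments)
        {
            if (HasPdfExtension(a) && HasPdfSignature(a)) kept.Add(a);
        }
        return kept;
    }

    public static void Main()
    {
        byte[] pdfBytes = Encoding.ASCII.GetBytes("%PDF-1.4 sample");
        byte[] pngBytes = { 0x89, 0x50, 0x4E, 0x47 };

        // Creating instances and setting the properties, Content included.
        var input = new List<Attachment>
        {
            new Attachment { OriginalFileName = "Invoice.PDF", Content = pdfBytes },
            new Attachment { OriginalFileName = "logo.png", Content = pngBytes },
            new Attachment { OriginalFileName = "fake.pdf", Content = pngBytes }
        };

        foreach (var a in KeepPdfs(input))
        {
            Console.WriteLine(a.OriginalFileName); // prints "Invoice.PDF" only
        }
    }
}
```

Inside a real document script you would call something like HasPdfExtension and HasPdfSignature from the filtering loop instead of comparing the extension directly. How the real Attachment class is constructed may differ, so check the developer’s manual before copying the object initializers.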

There is more…

So, with the script provided above, many possibilities just opened up. Think about changing the order of the attachments. Think about adding attachments, or doing file manipulations such as inserting error sheets when conversion fails. Think of merging pages, and of saving the attachments to disk BEFORE manipulating them. What is it you want to achieve with KIC?
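To pick up two of those ideas, here is a rough sketch of saving attachments to disk before touching them and then sorting them by file name. As it is meant to run outside of KIC, the Attachment class is again a hypothetical stand-in (Content as a byte array is an assumption), and AttachmentTools, BackupTo, SortByName and the folder name are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for KIC's Attachment class. Content as byte[] is an assumption.
public class Attachment
{
    public string OriginalFileName { get; set; }
    public byte[] Content { get; set; }
}

public static class AttachmentTools
{
    // Writes each attachment's content to a folder before it is modified.
    // Attachments sharing the same file name overwrite each other here.
    public static void BackupTo(string folder, IEnumerable<Attachment> attachments)
    {
        Directory.CreateDirectory(folder);
        foreach (var a in attachments)
        {
            // Path.GetFileName strips any directory part from the supplied name.
            string target = Path.Combine(folder, Path.GetFileName(a.OriginalFileName));
            File.WriteAllBytes(target, a.Content);
        }
    }

    // Sorts the list in place by file name, ignoring case.
    public static void SortByName(List<Attachment> attachments)
    {
        attachments.Sort((x, y) => string.Compare(
            x.OriginalFileName, y.OriginalFileName, StringComparison.OrdinalIgnoreCase));
    }

    public static void Main()
    {
        var attachments = new List<Attachment>
        {
            new Attachment { OriginalFileName = "b.pdf", Content = new byte[] { 1, 2, 3 } },
            new Attachment { OriginalFileName = "A.pdf", Content = new byte[] { 4, 5 } }
        };

        // Back up first, then manipulate.
        string folder = Path.Combine(Path.GetTempPath(), "kic-backup-demo");
        BackupTo(folder, attachments);
        SortByName(attachments);

        foreach (var a in attachments)
        {
            Console.WriteLine(a.OriginalFileName); // prints "A.pdf", then "b.pdf"
        }
    }
}
```

In a document script, the sorted result would go back into the original attachments list the same way the filtered list does: clear it, then add the items again.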