Kapow vs. UiPath, Part III: Desktop Automation

This is the third part of my Kapow vs UiPath series (you can find part 1 and part 2 here). Last time I analyzed how Kapow and UiPath perform when it comes to Web Automation – and while many applications are web-based nowadays, automating Desktop Applications is another important aspect of RPA in general.

Desktop Automation

Introduction

What if you could say good-bye to most of the tedious tasks of your daily routine? You know, that kind of work where your boss essentially has you enter data from one application into another. The kind that requires you to go to three different applications only to export and consolidate data in an Excel file that’s being sent to your team once every week? Yes, that kind of work that you never really wanted to do in the first place since it’s tedious, prone to error, and far below your capabilities: enter Desktop Automation.

RPA isn’t about automating interactions that happen hundreds, if not thousands, of times a day – think of automated orders in the automotive industry: most likely, two ERP systems will communicate via interfaces such as EDI automatically, often without involving a knowledge worker. So, what is the ideal task for RPA? Every task that occurs often enough to become a nuisance is a potential candidate – in essence, these tasks are important, but there’s always something more urgent. This is why your IT department never seems to have time to write that specific interface or integrate your application.

Eisenhower Decision Matrix

While the Eisenhower Decision Matrix is often used when it comes to time management, the very same principle applies here. Take urgent and important tasks, for example – this is where your company acts immediately and spends money. Non-urgent yet important tasks are often neglected when it comes to automation (since your team is busy building and maintaining urgent and important tasks, such as our EDI integration of two ERP systems). This is why we need RPA – all those tasks that happen frequently are ideal candidates. And it’s not just automating web applications that makes solutions such as Kapow and UiPath great – Desktop Automation will allow you to automate anything – that runs on a Windows Client, at least.

General Approach

As with Web Automation, automating Desktop applications isn’t exactly a new concept – for example, Windows 3.1 already shipped with its own macro recorder. Just doing a quick Google search for Macro Recorders returns more than a million hits – obviously, there is a need for automation. But what’s the difference between a simple recorder and enterprise solutions such as Kapow and UiPath? Well, start using a macro recorder, and you will most likely run into one of the following issues (this list is far from being exhaustive):

  • The application has to be started first.
  • Your application has to be topmost.
  • Changing the screen resolution breaks all your macros.
  • A later version of the software you are automating breaks all your macros.
  • Moving the application window to a different spot renders your macros inoperable.

This is exactly where Kapow and UiPath come into play. Instead of moving the mouse to pre-defined coordinates, clicking there or letting a software emulate keypress events without any real context, the general approach is to render everything on your desktop into a representation easily accessible by a computer. This includes every single application, their windows, and all the controls contained within. The following screenshot shows the concept in action, with Kapow on the left-hand side and UiPath on the right-hand side. Both Kapow and UiPath create an XML representation of the Windows application, here: IrfanView, with the Next File in Directory button being highlighted. This approach is far superior to simple macros: all of a sudden, your automation software can see the desktop and interact with almost any element.
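To make the difference to a coordinate-based macro a bit more tangible, here is a minimal Python sketch of the idea. It is not how either product works internally: the XML below is a made-up, heavily simplified tree, and the element and attribute names (window, button, app, name, x, y) as well as the find_button helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified desktop tree. Real trees produced by
# either tool carry many more nodes and attributes.
DESKTOP_XML = """
<desktop>
  <window app="i_view32.exe" title="invoice_001.png - IrfanView">
    <toolbar>
      <button name="Previous file in directory" x="120" y="48"/>
      <button name="Next file in directory" x="144" y="48"/>
    </toolbar>
  </window>
</desktop>
"""


def find_button(tree_xml, app, name):
    """Return the first button with the given name inside the given application."""
    root = ET.fromstring(tree_xml)
    # Locate the control by application and attribute, never by screen position.
    path = f".//window[@app='{app}']//button[@name='{name}']"
    return root.find(path)


button = find_button(DESKTOP_XML, "i_view32.exe", "Next file in directory")
print(button.attrib)

# Simulate the window being moved: the coordinates change, the lookup does not.
moved_xml = DESKTOP_XML.replace('x="144"', 'x="900"')
print(find_button(moved_xml, "i_view32.exe", "Next file in directory").attrib)
```

The lookup keeps working after the window has moved because it never relies on the x and y values, which is exactly where a recorded macro would break.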

In order to create this representation, Kapow uses an application which needs to run on a dedicated client (called the Device Automation Service). UiPath is different, as UiStudio always uses the very same computer you are currently working on – however, UiPath can orchestrate multiple clients in much the same way Kapow does with its Management Console and RoboServer.

In addition to just representing and showing the application, both Kapow and UiPath have tools at their disposal to help you build robots. Kapow’s approach is similar to Web Automation: you just click the relevant window, indicate an element, and then right-click to interact with it. The following shows how to add an action that clicks the Next File in Directory button:

 

UiPath offers a dedicated Desktop Recorder module that allows interacting with any application, combining actions for you:

Conclusion & Verdict

There are some minor differences, but the general approach is the same: everything on a client’s desktop is made available as a hierarchical tree, rendered as XML. This includes each individual application and its controls along with a multitude of attributes – but more about that in the next section.

Both platforms allow you to quickly create steps with the aid of a visual editor – and while a macro recorder usually is great for selling a product, you can get meaningful results quickly with UiPath’s Desktop Recorder. It is even capable of updating flawed selectors automatically, but then again – more about selectors later.

At this point, there is no difference big enough to call one product way ahead of the other, so we start this round with a draw. (Note: if automating just a few applications is what you want, then UiPath produces much less overhead – just run it on one machine, and you’re set. Kapow always needs a dedicated server plus a client – but since enterprise RPA is about automating more than just a few tasks, I didn’t factor that into my decision.)

Interacting with the Representation

When we covered Web Automation last time, this section was about traversing the Document Object Model, i.e. the way a web page is represented as a hierarchy and then parsed accordingly. There’s not much difference when it comes to Desktop Automation – both Kapow and UiPath use a similar approach to parsing their own XML structure.

The following screenshots show Kapow and UiPath in action – what you see here are the two steps that were generated automatically as we interacted with IrfanView in either platform.

I do like how Kapow presents everything quite neatly on the same screen – as always, there’s the HTML/XML at the very bottom, and the currently selected item is highlighted. Note that the component contained within the Left Click action points to an application with the name i_view32.exe (with the title in the attributes, but more about that in a minute). The requested element within IrfanView is a button with an attribute called name that contains the text “Next file in directory”.

Same use case, different solution – what Kapow calls finders are called selectors in UiPath (Kofax renamed finders to components in one of the latest releases, which adds a lot of confusion since components now always include a component, which for example is a button). Back to UiPath: the selector editor shows how the Next file in directory button is being selected as a target for the Click action.

Another thing that both Kapow and UiPath have in common: components can be, and by default are, reused. Note how the Left Click action in Kapow essentially contains three elements: one that identifies the object (the component, or the finder, as it was called earlier), another one to move the mouse towards said object, and finally the action that performs a left click. The most notable things are the Component and Application properties – note how they say “previous”, which essentially means that the first finder – or component – is being reused (which, as you may remember, is a button in IrfanView called “Next file in directory”).

UiPath does the same thing with its scopes: the outer container called Attach Window defines the scope for each action placed therein. As with Kapow, you can override this in case you ever need to.

Conclusion & Verdict

What I like about both Kapow and UiPath is that Desktop Automation works in a very similar way to Web Automation. The syntax for selecting elements is exactly the same (due to the fact that both essentially transform the whole desktop into XML). However, there’s another important aspect to consider – user experience. UiPath does a formidable job: the editor is clean, you can easily break your process into sequences and just display said sequence – and while you can achieve a similar thing with grouping in Kapow, it just never feels the same. UiPath gets the credo of Information Visualization right – Overview First, Details on Demand.

Have a look at the two screenshots below – this robot loops over files found in a given directory, opens each image in IrfanView, and then extracts the invoice number via OCR. Would you have been able to learn that from just looking at the workflow?

The first screenshot doesn’t contain the loop (as this is part of the binary tree, and not an individual device automation step), and in case you don’t vigorously rename each step you will be lost after a couple of weeks have passed. UiPath creates screenshots for you and has a much better way of presenting the workflow in general. As such, UiPath is clearly the winner here.

(And don’t even get me started on the Device Automation Workflow Editor in Kofax resetting to its default layout every time you close it, even though it teases you with dockable windows – which could have been great when working with multiple monitors, or on a widescreen).

Scraping Applications

Now that we have covered the basics, the next question is: how can you scrape data from Windows applications? Well, we already covered a portion of that when we talked about the representation – remember when we used IrfanView: a button’s text was exposed as part of this representation (as an XML attribute). The example also covered how we could interact with images when we read the invoice number. Enter Optical Character Recognition – this becomes vitally important when working with Citrix and RDP.

Here’s another use case: scraping data organized in a table. The Programs and Features window contains a list of all programs installed on your system, and each item in that list contains additional information such as the publisher or the version. Note how Kapow exposes everything just as if it were a web page – every item has child nodes, each one representing a single cell. Kapow will also get you all items that are off-screen and expose said property back to us: isOffScreen=”true”.
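As a rough illustration of what such a representation lets you do, here is a small Python sketch that reads rows from a made-up XML excerpt. Only the isOffScreen attribute is taken from what Kapow showed me. The list, item and cell element names, the sample programs, and the read_rows helper are assumptions and not the actual tree of either product.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified excerpt of a list exposed as XML. Everything except
# the isOffScreen attribute is an assumption made for this illustration.
PROGRAMS_XML = """
<list name="Programs and Features">
  <item isOffScreen="false">
    <cell name="Name">Example App A</cell>
    <cell name="Publisher">Example Publisher</cell>
    <cell name="Version">1.0</cell>
  </item>
  <item isOffScreen="false">
    <cell name="Name">Example App B</cell>
    <cell name="Publisher">Example Publisher</cell>
    <cell name="Version">2.3</cell>
  </item>
  <item isOffScreen="true">
    <cell name="Name">Example App C</cell>
    <cell name="Publisher">Another Publisher</cell>
    <cell name="Version">0.9</cell>
  </item>
</list>
"""


def read_rows(list_xml, include_off_screen=True):
    """Turn every item into a dict that maps column name to cell text."""
    rows = []
    for item in ET.fromstring(list_xml).findall("item"):
        off_screen = item.get("isOffScreen") == "true"
        if off_screen and not include_off_screen:
            continue  # mimic a scraper that only sees what is on screen
        rows.append({cell.get("name"): cell.text for cell in item.findall("cell")})
    return rows


all_rows = read_rows(PROGRAMS_XML)
visible_rows = read_rows(PROGRAMS_XML, include_off_screen=False)
print(len(all_rows), len(visible_rows))
```

With the flag available in the tree, it is the robot and not the screen size that decides whether off-screen items are part of the result.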

As with Kapow, UiPath doesn’t try to hide anything from you. In fact, you can go into the UiExplorer and will see a very similar representation of the Programs and Features table:

What’s different is that UiPath offers a wide range of tools to scrape data from the screen as shown in the scraping wizard below:

However, not all items were present – everything off-screen wasn’t captured by the wizard and requires you to manually change selectors, and sometimes larger bits and pieces of the sequence created for you.

Conclusion & Verdict

UiPath has a more elegant way of scraping data thanks to its ability to automatically parse anything organized as a table (and even data organized as comma-separated values). In addition, the resulting DataTable makes it much easier to work with data organized as an array (which a table ultimately is). Kapow does not have anything that comes close – there are no collections or arrays that you could use.

If all you need is to capture data and not interact with individual items, then you could just store the relevant XML containing all child nodes in a variable, and then loop over each individual item again to get each row and, ultimately, each cell’s value. Technically, this would qualify as a collection, and while interacting with XML is incredibly easy in Kapow, you still have to do a lot of work manually (storing the XML in a variable, looping over child nodes, assigning values or attributes to individual variables, storing it in a database table). As said – UiPath allows you to do the same thing, but in less time.
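To give an idea of how much manual plumbing those steps involve, here is the same sequence sketched in plain Python rather than in Kapow. The XML, the table layout, and the store_programs helper are hypothetical, and an in-memory SQLite database merely stands in for whatever database you would use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Step 1: the XML containing all child nodes, kept in a variable.
# The structure is a made-up stand-in for the real desktop tree.
SCRAPED_XML = """
<list>
  <item><cell>Example App A</cell><cell>Example Publisher</cell><cell>1.0</cell></item>
  <item><cell>Example App B</cell><cell>Example Publisher</cell><cell>2.3</cell></item>
</list>
"""


def store_programs(list_xml, connection):
    """Loop over the items, pick out each cell, and store one row per item."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS programs (name TEXT, publisher TEXT, version TEXT)"
    )
    for item in ET.fromstring(list_xml).findall("item"):
        # Steps 2 and 3: loop over child nodes and assign them to variables.
        name, publisher, version = (cell.text for cell in item.findall("cell"))
        # Step 4: store the values in a database table.
        connection.execute(
            "INSERT INTO programs (name, publisher, version) VALUES (?, ?, ?)",
            (name, publisher, version),
        )
    connection.commit()


connection = sqlite3.connect(":memory:")
store_programs(SCRAPED_XML, connection)
stored = connection.execute(
    "SELECT name, publisher, version FROM programs ORDER BY name"
).fetchall()
print(stored)
```

None of this is difficult, but every one of these steps has to be built and named by hand, which is the point I am making above.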

Kapow has its strengths here, too – for example, it exposed all items – even those that were off-screen – without any manual intervention. And while scraping 1-dimensional data such as one individual textbox with the order number is important too, UiPath’s native ability to scrape and store tables justifies a narrow victory here.

Interacting with Applications

Now that we know more about the representation and scraping, which other options do you have to interact with applications in general? As with Web Automation, Kapow has a very limited set of actions at your disposal and leaves it up to you to combine them. In fact, the following screenshot shows all of them (leaving out the different tree modes):

 

If you need something else, a combination of two or more actions can be used (think back to the Do-While loop we covered in part I).

UiPath follows a different mindset in offering you specialized activities for, well, whatever you can think of (or at least what they think you could think of). Just have a look at the UI Automation section in their Guide, and you will get the idea. Let’s say you need to wait for a certain Window to appear on the screen: UiPath has you covered with the On Element Appear action:

The same can be achieved in Kapow by using the Guard action:

But then again, the Guard action can be used to detect changes in the tree, wait for elements to appear or vanish, or just wait for a fixed amount of time.
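Conceptually, waiting for an element to appear or vanish boils down to polling the tree until a condition holds or a timeout expires. The following Python sketch shows that idea only. It is not the implementation of Kapow’s Guard or of UiPath’s On Element Appear, and both wait_for and the FakeDesktop class are hypothetical stand-ins.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll condition() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeDesktop:
    """Stand-in for a desktop tree in which a window shows up after a few polls."""

    def __init__(self, appears_after):
        self._polls = 0
        self._appears_after = appears_after

    def window_titles(self):
        self._polls += 1
        return ["Save As"] if self._polls > self._appears_after else []


desktop = FakeDesktop(appears_after=3)

# Wait for the window to appear.
appeared = wait_for(lambda: "Save As" in desktop.window_titles(),
                    timeout=2.0, interval=0.01)

# Wait for it to vanish again. It never does here, so this one times out.
vanished = wait_for(lambda: "Save As" not in desktop.window_titles(),
                    timeout=0.05, interval=0.01)
print(appeared, vanished)
```

The same loop covers appear, vanish, and plain timeout, which is roughly why a single generic action can replace several specialized ones, at the price of having to configure it yourself.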

Conclusion & Verdict

UiPath is much easier to grasp – especially for beginners. Contrary to what some folks at Kofax say, I would never let my knowledge workers build their own robots with Desktop Automation – there’s just too much that isn’t obvious, and there are some quirks you need to know about (it’s a different story for Web Automation, though).

This article is not just about features, but about how you as an RPA engineer are supported by each individual platform. Kapow’s idea of offering a pre-configured set of actions which you may combine to your liking works quite well with Web and General Automation, but it falls flat for Desktop Automation – when a single click action essentially translates into three actions – a guard, a mouse move, and a click; which then again all have their individual components (finders for the old-school Kapow users). And while Kapow inserts all these steps for you, the resulting process map is simply ugly, not very intuitive, requires scrolling, collapsing, expanding, and re-naming even simple actions a lot. I can only recommend getting an ultra-widescreen monitor if Desktop Automation is relevant – but honestly, I would rather use UiPath most of the time.

UiPath on the other hand already offers a wide variety of actions at your disposal, setting the scope of an individual application or a component is perfectly clear, and knowing what your robots do often requires as little as taking a look at the screenshots it inserted for you. As such, UiPath is the clear winner when it comes to interacting with applications.

Citrix and the Remote Desktop Protocol

(Please note: when I started with this series, I expected each article to take me about a week. Well, working on this one took me almost three weeks, and due to my lack of experience with Citrix/RDP together with Kapow and UiPath I decided to exclude this section for the time being.)

Debugging and Desktop Automation

Last but not least – let’s talk about the debugger once more. Debugging Desktop Automation works differently in Kapow – naturally, you cannot retain the current state of the application. While its debugger still shines with everything but Desktop Automation (we’ll cover that in the upcoming posts), here the debugger isn’t much different from UiPath’s. You can set breakpoints, watch variables, log messages, and more.

Conclusion & Verdict

Since the retention of the current state isn’t possible with Desktop Applications, Kapow has lost its advantage over UiPath’s debugger. As such, we have another draw.

Conclusion and Outlook

While Kapow was second to none when we covered Web Automation, this time UiPath’s strengths became evident. Sure, there are some things that Kapow does better – for example, showing the GUI along with the representation in the same window (you’ll need both the UiExplorer and Studio in UiPath to achieve the same thing). However, solely looking at the Desktop Automation aspect of RPA, most things are done (much) better in UiPath. Here’s my summary:

  • General Approach: Neutral
  • Interacting with the Representation: UiPath++
  • Scraping Applications: UiPath+
  • Interacting with Applications: UiPath++
  • Debugging and Desktop Automation: Neutral

Coming up next: Working with Files and Integration: Web Services and Emails (which I think will end up together in one article). I cannot promise the post will be there next week, but I can promise that I won’t give up on this series – so, stay tuned.

(Please note: there were some new features added in the latest Service Packs for Kapow, with a new way of building the visual tree of the desktop being the most prominent one. According to Kofax, this feature uses machine learning for identifying applications and components therein, and it’s specifically aimed at Citrix and RDP automation. It’s not included here, mainly due to the amount of time I can allocate to writing this blog, but also because I said I would compare Kapow 10.3.0.0 with UiPath 2018.1.3 Community Edition when starting this series. It’s not fair to include features for just one of the products, so I’d rather try wrapping that up in the last article of this series.)