
March 2023  Vol. 32

Out like a lion! Up here in New England, the weather is changing to spring with weekly nor'easters and snow squalls, but that's not the only thing changing this season. CampTek Software is growing faster than you can reset your clocks. We've adopted and partnered with Indico, the best document reading technology to date. We've reached new heights in data migration, launching massive projects to move information from a legacy system for a large Midwest hospital. We're exploring new possibilities with UiPath Task Mining and so much more! Not to mention, our team continues to grow with the addition of Joyce Zhang, Senior Intelligent Automation Program Manager; Melissa Theriault, Senior Account Executive; and Stephen Devereaux, Account Executive.

We're looking forward to even more changes this spring!

-The CampTek Team



Welcome to CAMP, a weekly podcast by CampTek Software where we'll cover all things automation and robotic process automation (RPA). You'll hear from various members of the CampTek team on topics such as Citizen Development, How to Prepare for a Scope Call, What RPA Can Mean for You, and so much more! Check out our first two episodes on YouTube, LinkedIn, or Transistor!

Hear from our CTO & Founder, Peter Camp, and VP of Operations and Customer Success, Amy Wooldridge, on part one of their two-part series on key words and phrases you should know when starting an RPA initiative.



Is 2023 the Year for Intelligent Document Processing (IDP)?

It’s estimated that 95% of corporate information exists on paper and/or scanned documents. Most agreements, contracts, invoices, reports, etc. that are digitized, whether through scanning or creation (e.g., MS Word, Excel), are built to reflect their respective version as if they were typed on physical paper. There is so much data and information stored on printed, scanned, and fully digitized documents. Imagine for a minute that, in an ideal world, a company were able to meaningfully extract every document, then catalog and search it within a few clicks. Imagine, in addition, that these companies were able to create machine learning models to help with decisioning and provide a comprehensive historical analysis of everything related to the company. This productivity gain could change the way businesses work and hire. A fully digitized company with this profile is nonexistent at this point, but it could be more commonplace in the next 10 years. The main obstacle has always been the reliability of the technology. As I wrote in a previous blog, it’s my belief that IDP will be one of the top use cases in 2023 and beyond.

So, what has changed with this technology that makes it so powerful now?

I think it's worth going through a little bit of history to fully understand where we are now and how exciting it is. Optical Character Recognition (OCR) has been around since the early 1930s, when it was originally called a “Statistical Machine” and used for searching microfilm archives for characters. The approach has since helped visually impaired people, through assistive technologies, to read documents. HP created a widely used OCR engine called Tesseract, which relied on dictionaries of common words and phrases. Google has since adopted it as part of their approach to reading characters in documents and applications. Today, with highly sophisticated AI and OCR in the cloud, extracting text from documents is as close to 100% reliable as it has ever been. This includes reading typed text and even handwriting (within reason).

Reading characters and extracting words are certainly great benchmarks, but they only solve part of the use case most businesses have. Whether it be scanned documents or screens, traditional positional OCR is very brittle. Structure is important for many of the leading OCR engines (e.g., ABBYY, Amazon Textract, and UiPath Document Understanding).

 

It's important to understand the differences between Structured, Semi-Structured, and Unstructured Data.

Structured data follows a very predictable format with minimal, if any, variance in where data is located. A good example might be an intake form at a doctor's office or the customs form one has to fill out to enter a country. Data values like name, date, and nationality are always in the same spot, and the answers are consistently next to those labels. This type of scanning requires OCR and evaluating the key-value pair for each answer, e.g., Name: John Smith. The Name label will always have the name next to it. Extraction is highly positional since these items don’t move.
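
For readers who like to see the idea in code, here is a minimal Python sketch of key-value extraction from a fixed-layout form. It assumes the OCR step has already produced plain text with one "Label: value" pair per line; the sample text and the function name extract_key_values are made up for illustration and are not taken from any particular product.

```python
import re

# Hypothetical OCR output from a fixed-layout intake form (illustrative only).
ocr_text = """Name: John Smith
Date: 03/01/2023
Nationality: American"""


def extract_key_values(text):
    """Return a dict mapping each label to the value that follows it."""
    pairs = {}
    for line in text.splitlines():
        # Split on the first colon: label on the left, answer on the right.
        match = re.match(r"^\s*([^:]+):\s*(.*)$", line)
        if match:
            pairs[match.group(1).strip()] = match.group(2).strip()
    return pairs


fields = extract_key_values(ocr_text)
print(fields["Name"])
```

Because the layout never changes, a rule this simple can be enough. The same rule stops working as soon as labels move or change wording, which is the problem the next category runs into.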

That can’t be said for semi-structured data. While some of the data is predictable, there is frequently some variance. Invoices, receipts, and purchase orders typically fall in this category. The challenge with these documents is the variance within each document type. An example could be an invoice from Staples. One could order 1 item or 100 items, so the positional nature is lost because the end of the invoice falls in a different place from order to order. It gets even more complex when trying to process not just Staples invoices but invoices from every other company that sends them. One company may label the invoice number as “Invoice #” while another uses “Invoice No.”. This category of semi-structured data is fertile ground for solutions from various document understanding companies, and those solutions continue to improve enormously. UiPath, for instance, has one of the top Document Understanding products in this very competitive space.
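
As a rough illustration of why position alone is not enough here, the Python sketch below looks for the invoice number under several label variants and collects however many line items appear. The label variants, the line-item layout (quantity, description, price), and the function names are assumptions made for this example; real document understanding products use trained models rather than hand-written patterns like these.

```python
import re

# Label variants are illustrative assumptions, not a complete list.
INVOICE_NUMBER_PATTERN = re.compile(
    r"Invoice\s*(?:#|No\b\.?|Number\b)\s*:?\s*([A-Za-z0-9-]+)",
    re.IGNORECASE,
)

# Assumed line-item layout: quantity, description, price with two decimals.
LINE_ITEM_PATTERN = re.compile(r"^(\d+)\s+(.+?)\s+(\d+\.\d{2})$")


def find_invoice_number(text):
    """Return the invoice number regardless of which label variant is used."""
    match = INVOICE_NUMBER_PATTERN.search(text)
    return match.group(1) if match else None


def find_line_items(text):
    """Return (quantity, description, price) for every line item found."""
    items = []
    for line in text.splitlines():
        match = LINE_ITEM_PATTERN.match(line.strip())
        if match:
            items.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return items


invoice_a = "Invoice # 12345\n1 Stapler 12.99\nTotal 12.99"
invoice_b = "Invoice No. A-778\n2 Paper Clips Box 3.50\n100 Pens 0.25\nTotal 32.00"

print(find_invoice_number(invoice_a), len(find_line_items(invoice_a)))
print(find_invoice_number(invoice_b), len(find_line_items(invoice_b)))
```

The lookup keys on what a field is called rather than where it sits, so it tolerates one line item or one hundred. Every new vendor format still needs another hand-written variant, which is the maintenance burden that trained models are meant to remove.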

Finally, my favorite category of the bunch and the biggest area of opportunity in 2023 and beyond: Unstructured Data. There is essentially no predictability as to where data is located from one document to the next. Contracts, legal documents, doctors' orders, handwritten documents, and insurance claims all fall in this category. Since nothing is positional, these types of models are trained either by establishing commonality in the type of data you are trying to extract or by using trigger phrases. Since OCR engines, at this point, are very sophisticated and can read almost anything, the challenge is to find a solution that can perform this type of task. Indico Data is the top vendor in this space at the moment.

It's my thought that the unstructured data space is where most of the 95% of corporate information exists and where the long-term value add is. The first step will be to capture this data with close to 100% reliability, then store it and create machine learning models from it that can help with AI decisioning. Beyond that, having sufficient search capabilities can only provide companies with more readily available and useful data.

-Peter Camp, CTO & Founder


Legacy System Data Migration Case Study

Real RPA Case Studies, Real Verifiable Solutions

 
Industries: Healthcare (applicable to all industries)
Systems: CPSI, OnBase, Epic

Challenge:

A very large Midwest healthcare provider approached CampTek Software to help identify areas within the organization to save money.

During an enterprise analysis, it was determined that migrating legacy patient data to a new system could save enormous amounts of money in the long run by eliminating the need to maintain the legacy system in perpetuity, as well as by reducing costs through greater efficiency.

Benefits:

  • Compliance with new healthcare regulations, such as the HIPAA Privacy Rule
  • Single source for all patient data
  • No need for HL7, FHIR, or API connections or maintenance
  • No legacy software support or maintenance

Solution:

CampTek Software was able to access and open 10 million files and convert 4 million files and 100,000 encounters from CPSI to Epic/OnBase. The whole process, from migration to hosting and ongoing support, is managed by CampTek Software’s experienced managed services team. Manually, the process is estimated to take an FTE roughly one minute per record. The automation takes about half that, roughly 30 seconds per file, with no human errors. If an FTE were to complete all this work, the total time working 24/7 would be 5,208 days, or 125K hours. The FTE capacity gained is valued at $3,125,000.
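
The relationships between these figures can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated above. The hourly rate it prints is derived by dividing the stated dollar value by the stated hours; it is an inference for illustration, not a figure reported in the case study.

```python
HOURS_PER_DAY = 24

# Figures stated in the case study.
total_hours = 125_000
fte_capacity_dollars = 3_125_000
manual_seconds_per_record = 60
automated_seconds_per_file = 30

# 125K hours worked around the clock, expressed in days.
days_around_the_clock = total_hours / HOURS_PER_DAY

# Derived, not stated: the hourly value implied by the two figures above.
implied_hourly_rate = fte_capacity_dollars / total_hours

# How many times faster the automation is per file than the manual estimate.
speedup = manual_seconds_per_record / automated_seconds_per_file

print(int(days_around_the_clock), implied_hourly_rate, speedup)
```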

In addition, CPSI, the legacy system, can now be sunset, and the provider will incur no further ongoing costs for it.