Is 2023 the Year for Intelligent Document Processing (IDP)?
It’s estimated that 95% of corporate information exists on paper and/or scanned documents. Most agreements, contracts, invoices, reports, etc. that are digitized, either through scanning or through creation in tools such as MS Word or Excel, are built to look as if they were typed on physical paper. A great deal of data and information is stored in printed, scanned, and fully digitized documents. Imagine for a minute that, in an ideal world, a company could meaningfully extract the contents of every document, catalog them, and find any of them within a few clicks. Imagine, in addition, that the company could build machine learning models on that data to help with decision-making and to provide a comprehensive historical analysis of everything related to the business. This productivity gain could change the way businesses work and hire. A fully digitized company with this profile does not exist today, but it could become more commonplace in the next 10 years. The main obstacle has always been the reliability of the technology. As I wrote in a previous blog, it’s my belief that IDP will be one of the top use cases in 2023 and beyond.
So, what has changed with this technology that makes it so powerful now?
I think it's worth going through a little bit of history to fully understand where we are now and why it is exciting. Optical Character Recognition (OCR) has been around since the early 1930s, when it took the form of a “Statistical Machine” for searching microfilm archives for characters. The approach has since helped visually impaired people, by way of assistive technologies, to read documents. HP created a widely used OCR engine called Tesseract, which relied on dictionaries of common words and phrases. Google has since adopted it as part of its approach to reading characters in documents and applications. Today, with highly sophisticated AI and cloud-based OCR, extracting text from documents is closer to 100% reliable than it has ever been. This includes reading typed text and even handwriting (within reason).
Reading characters and extracting words are certainly great milestones, but they solve only part of the use case most businesses have. Whether the source is scanned documents or screens, traditional positional OCR is very brittle: it expects each field to appear at a fixed location, so a shifted scan or a changed layout can break extraction. Structure is important for many of the leading OCR engines (e.g., ABBYY, Amazon Textract, and UiPath Document Understanding).
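
To make the brittleness concrete, here is a minimal Python sketch. It does not use any real OCR engine or vendor API. The `Word` class, the pixel coordinates, the `TOTAL_ZONE` box, and both helper functions are hypothetical, invented only for illustration. It assumes an OCR step has already produced words with positions, and it contrasts a fixed-zone lookup with a lookup anchored on a label.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word with the top-left corner of its bounding box (hypothetical format)."""
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


def extract_by_zone(words, x_min, y_min, x_max, y_max):
    """Positional approach: keep words whose corner falls inside a fixed box."""
    hits = [w for w in words if x_min <= w.x <= x_max and y_min <= w.y <= y_max]
    return " ".join(w.text for w in sorted(hits, key=lambda w: w.x))


def extract_after_label(words, label, y_tolerance=5):
    """Anchor approach: find the label, then take the words to its right on the same line."""
    anchor = next((w for w in words if w.text == label), None)
    if anchor is None:
        return ""
    hits = [w for w in words
            if w.x > anchor.x and abs(w.y - anchor.y) <= y_tolerance]
    return " ".join(w.text for w in sorted(hits, key=lambda w: w.x))


# The same made-up invoice scanned twice; the second scan sits 40 px lower on the page.
scan_a = [
    Word("Total:", 400, 700),
    Word("$1,250.00", 480, 700),
    Word("Thank", 50, 760),
    Word("you", 110, 760),
]
scan_b = [Word(w.text, w.x, w.y + 40) for w in scan_a]

# Fixed box drawn around the total on the first scan (assumed coordinates).
TOTAL_ZONE = (470, 690, 600, 710)

zone_a = extract_by_zone(scan_a, *TOTAL_ZONE)
zone_b = extract_by_zone(scan_b, *TOTAL_ZONE)
anchor_a = extract_after_label(scan_a, "Total:")
anchor_b = extract_after_label(scan_b, "Total:")
```

On the first scan both lookups find the total. On the shifted scan the fixed zone comes back empty, while the label-anchored lookup still finds the value. Real documents are messier than this toy data, and the anchor lookup would also fail if the label text were misread, but the contrast is the reason structure matters so much to these engines.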