How the Healthcare Documents You Use Affect the Potential for Automation

In virtually every industry, critical processes depend on information contained in documents, and healthcare is no different. Even with significant advances in EHR/EMR systems and interoperability, document use continues to grow, largely because documents are easy to create and share and difficult to convert to other data formats. Many automation solutions forgo any transition period and expect clients to operate without documents from day one. In reality, documents remain widely used precisely because they are so convenient.

Fortunately, the organizations that make up the healthcare ecosystem do not have to abandon documents to reap the benefits of automation. Instead, implementing intelligent document processing (IDP) software can pave the way for both short- and long-term digital transformation, complementing current and future plans to implement new types of automation via machine learning.

One critical factor in determining the potential for automation in the healthcare industry is understanding the types of documents used in key areas. Claims submission and adjudication, payment reconciliation, and even audits all involve using similar but distinct documents, the characteristics of which can significantly impact the path you take. Consider the following three common types of documents:

The Structured Form

Structured forms are used extensively in healthcare, from benefit enrollment forms to claim submission forms. However, they are among the most misunderstood. A structured form is defined by the way its data is organized: each piece of information lives in a fixed, labeled area called a “field.” For example, a CMS-1500 form always contains the patient’s information, such as name, address, and birthdate, in the same location and format. Such standardization simplifies the process of inspecting and entering data into a medical claims system; it also simplifies the process of configuring an IDP system to know where to look. You may hear the term “template” used for the field layout that the software is configured with.
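To make the idea of a template concrete, here is a minimal sketch in Python. It assumes an OCR step has already produced words with page coordinates. The Word and Zone types, the field names, and every coordinate are hypothetical and chosen only for illustration; they are not real CMS-1500 positions or any particular product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal position of the word's center on the page
    y: float  # vertical position of the word's center on the page


@dataclass(frozen=True)
class Zone:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


# Hypothetical template: one zone per field. Coordinates are invented.
TEMPLATE = {
    "patient_name": Zone(50, 100, 300, 120),
    "patient_birthdate": Zone(320, 100, 450, 120),
}


def extract_fields(words, template):
    """Return the text found inside each zone, ordered top-to-bottom, then left-to-right."""
    fields = {}
    for name, zone in template.items():
        hits = sorted((w for w in words if zone.contains(w)), key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in hits)
    return fields


# Simulated OCR output for one page.
words = [
    Word("DOE,", 60, 110),
    Word("JANE", 110, 110),
    Word("01/02/1980", 380, 110),
    Word("MEDICARE", 60, 40),  # falls outside every zone, so it is ignored
]
result = extract_fields(words, TEMPLATE)
print(result)  # {'patient_name': 'DOE, JANE', 'patient_birthdate': '01/02/1980'}
```

Because every copy of the form shares the same layout, one template of this kind can serve every incoming page, which is why structured forms are the easiest type to configure.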

Because structured forms are highly standardized, their automation potential is arguably the greatest of the three document types, as there is less room for error when locating information. However, there is a hidden challenge, particularly in healthcare: poor image quality. It is one thing to standardize a form; it is quite another to standardize scanning quality, particularly when documents are shared via fax machines. A poor scan or fax can shift the expected location of critical data up, down, left, or right. Additionally, unwanted noise can muddle data that was perfectly legible on the original. So even though structured forms are the simplest and most automatable, there are still challenges to overcome.
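One common way to cope with shifted pages is to locate a landmark that is pre-printed on every copy of the form, measure how far it has moved, and move the template zones by the same amount. The sketch below assumes OCR words arrive as (text, x, y) tuples; the anchor word, its expected position, and the zone coordinates are all made up for illustration. It corrects only a simple shift, not rotation, skew, or scaling, which real systems also have to handle.

```python
# Illustrative values only: none of these are real form coordinates.
NAME_ZONE = (50, 100, 300, 120)  # left, top, right, bottom on a clean scan
ANCHOR_TEXT = "HEALTH"           # a word assumed to be pre-printed on every copy
ANCHOR_EXPECTED = (40, 30)       # where that word sits on a clean scan


def estimate_offset(words, anchor_text, expected_xy):
    """Return (dx, dy): how far the anchor moved from its expected position."""
    for text, x, y in words:
        if text == anchor_text:
            return x - expected_xy[0], y - expected_xy[1]
    return 0, 0  # anchor not found: fall back to the unshifted template


def shift_zone(zone, offset):
    """Move a zone by the measured offset."""
    left, top, right, bottom = zone
    dx, dy = offset
    return (left + dx, top + dy, right + dx, bottom + dy)


def words_in_zone(words, zone):
    """Join the words whose position falls inside the zone, left to right."""
    left, top, right, bottom = zone
    hits = [(x, text) for text, x, y in words if left <= x <= right and top <= y <= bottom]
    return " ".join(text for _, text in sorted(hits))


# A faxed page where everything has moved 15 units right and 25 units down.
faxed_page = [("HEALTH", 55, 55), ("DOE,", 75, 135), ("JANE", 125, 135)]

offset = estimate_offset(faxed_page, ANCHOR_TEXT, ANCHOR_EXPECTED)
uncorrected = words_in_zone(faxed_page, NAME_ZONE)
corrected = words_in_zone(faxed_page, shift_zone(NAME_ZONE, offset))

print(offset)             # (15, 25)
print(repr(uncorrected))  # '' because the name now sits below the original zone
print(corrected)          # DOE, JANE
```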

Semi-structured Documents

The semi-structured document is the next document type and the next level of difficulty. As an illustration, consider the health remittance (a.k.a. explanation of benefits, or EOB, and explanation of payment, or EOP). This document is similar to a structured form in that required data is labeled to make it easier for staff to locate and enter. If your organization deals with only a few payers, each payer’s remittance data will generally appear in the same place from one document to the next. What truly differentiates the two types is the number of possible layouts for a remittance document. Because each payer typically uses its own format, it is more difficult and time-consuming for data entry staff to locate each required field. While data such as the covered amount may always be present, it may appear in a different location for each payer, requiring even an experienced data entry clerk to spend time hunting.

Additionally, the more payers a provider organization deals with, the greater the variance. And the more difficult it is for humans to locate data, the more difficult it is for IDP software to locate it as well. Most IDP software uses techniques such as keyword anchors, which treat a known label as a “clue” to the location of the data. However, because both the wording of labels and the position of data relative to its label and on the page vary more, the potential for error increases.
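A keyword anchor can be sketched in a few lines of Python. The example assumes OCR has produced plain lines of text. The list of label variants and the two sample remittances are invented for illustration, and the rule of "look to the right of the label, then on the line below" is one simple heuristic rather than how any specific product works.

```python
import re

# Hypothetical label variants that different payers might print for the same field.
LABEL_VARIANTS = {
    "covered_amount": ["covered amount", "amount covered", "allowed amt"],
}

# A dollar amount such as 1,250.00, with an optional leading $ sign.
AMOUNT = re.compile(r"\$?\s*(\d[\d,]*\.\d{2})")


def find_by_anchor(lines, labels):
    """Return the first amount found next to any known label, or None."""
    for i, line in enumerate(lines):
        lowered = line.lower()
        for label in labels:
            pos = lowered.find(label)
            if pos == -1:
                continue
            # First look to the right of the label on the same line...
            match = AMOUNT.search(line, pos + len(label))
            # ...then fall back to the line directly below the label.
            if match is None and i + 1 < len(lines):
                match = AMOUNT.search(lines[i + 1])
            if match:
                return match.group(1)
    return None


payer_a = ["Claim 123", "Covered Amount: $1,250.00  Paid: $1,000.00"]
payer_b = ["Allowed Amt", "  980.50"]

amount_a = find_by_anchor(payer_a, LABEL_VARIANTS["covered_amount"])
amount_b = find_by_anchor(payer_b, LABEL_VARIANTS["covered_amount"])
print(amount_a)  # 1,250.00
print(amount_b)  # 980.50
```

Every new payer layout may call for another label variant or another search direction, which is where the added configuration effort and the added potential for error come from.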

Unstructured Documents

The third document type is the most difficult to automate, and few processes have achieved a high degree of automation with it. Progress notes, provider-payer contracts, and lab reports are examples of documents containing a significant amount of unstructured data. Unstructured data lacks labels and other cues that enable staff or software to locate the required information quickly. A collection of symptoms or a diagnosis can be buried within paragraphs of text, requiring staff to skim each page extensively. To automate the process of locating information within unstructured documents, a separate set of techniques based on grammatical hints is needed, a collection of capabilities frequently referred to as natural language processing (NLP). Even at lower levels of automation, significant efficiencies can be realized simply by helping employees do their jobs faster. If the exact data cannot be located, the system can display the page most likely to contain it or present several candidates from the document, reducing the user’s work to approving what the system finds.
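The assisted-review idea, showing the reviewer the pages most likely to contain the data, can be illustrated with a deliberately simple sketch. It scores each page by counting cue terms, which is a crude stand-in for real NLP rather than an example of it. The cue terms and the sample progress note are invented for illustration.

```python
import re
from collections import Counter


def rank_pages(pages, cue_terms, top_n=2):
    """Return (page_number, score) pairs, best first, for pages that mention any cue term."""
    scored = []
    for number, text in enumerate(pages, start=1):
        tokens = Counter(re.findall(r"[a-z]+", text.lower()))
        score = sum(tokens[term] for term in cue_terms)
        if score > 0:
            scored.append((number, score))
    # Highest score first; ties are broken by the earlier page.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical three-page progress note.
pages = [
    "Patient seen for follow-up. Vitals stable.",
    "Assessment: diagnosis of type 2 diabetes. Patient reports symptoms of fatigue; "
    "diagnosis confirmed by labs.",
    "Plan: continue medication. Symptoms to be reviewed at next visit.",
]
cue_terms = ["diagnosis", "symptoms", "assessment"]

suggestions = rank_pages(pages, cue_terms)
print(suggestions)  # [(2, 4), (3, 1)]
```

Here the reviewer would be shown page 2 first and page 3 as an alternative, instead of having to skim all three pages.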

The Promise of Machine Learning

Historically, most automation projects utilizing IDP software have focused on processes that rely heavily on structured forms or a small number of semi-structured documents. This was mainly due to the relative complexity of configuring and optimizing a system to deliver large amounts of data accurately. Unstructured documents were rarely included in an automation project. However, with the addition of machine learning to the IDP world, manual configuration and optimization become significantly easier because the system configures itself using training data sets. And once implemented, the system continues to evaluate data and adjust accordingly. Even structured forms benefit: automated image enhancement helps with the high degree of variance introduced by low-quality scans.

As a result, obtaining automation benefits for all three document types is becoming easier. Still, the nature of each document within your project and its relative difficulty will continue to influence automation potential. You just won’t need to put in hundreds of hours to reap the benefits.
