
How OCR can miss quality cases and why A.I. can do better

Written by Matt Francis | Nov 3, 2024 11:11:02 PM

When it comes to extracting critical medical data from dense medical record PDFs, many mass tort lawyers believe that OCR (optical character recognition) is the "state of the art." And indeed, OCR has been around for decades and is widely used for digitizing paper documents and converting scanned images into editable text.

However, OCR has its limitations. It simply recognizes words and numbers without understanding the context of the query, the document, or the litigation requirements. For example, if you search for the term "lymphoma" using OCR, you will get every document that contains the word "lymphoma," and you will then have to manually check whether each result involves the specific type of lymphoma (e.g., B-cell, Hodgkin vs. non-Hodgkin) that pertains to the injury identified in that litigation. Similarly, if you search for a drug's commercial name using OCR, you will get every document that contains the exact spelling of that name, but not the documents that mention the drug’s generic (medical) name or shorthand versions of it.

This is where AI-based contextual search comes in. Advances in AI technology have made it possible to build software that is far more capable than OCR alone. It can search for complex synonyms of disease or drug names, determine the type of document being analyzed to provide better search results, and analyze the kind of data being extracted.

Let's take a closer look at some of the key differences between OCR and AI-based contextual search.

Accuracy

OCR can have a high error rate, especially on handwritten or low-quality text. When you search for a specific term, this can produce false positives as well as missed matches. AI-based contextual search, on the other hand, uses machine learning techniques to improve accuracy and reduce errors. The software can recognize patterns and context that OCR cannot, which leads to more accurate search results and data extraction.

Contextual Understanding

OCR simply recognizes words and numbers, without any understanding of the context in which they appear. AI-based contextual search, however, can interpret the meaning of the words and phrases in both the document and the query, and can return results that are relevant to the specific context of the search. For example, if the software is trained to search for all injuries related to a toxic tort litigation such as Camp Lejeune, you can find all the relevant injuries at once instead of searching for them individually.
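
To make the "all at once" idea concrete, here is a minimal Python sketch. It only illustrates scanning a record for a whole set of injuries in one pass; it uses plain substring matching and is not how any particular product works. The `LITIGATION_INJURIES` table, the `example-toxic-tort` key, and the `find_injuries` function are hypothetical names invented for this example, and the injury list is a placeholder, not the qualifying list for any real litigation.

```python
# Illustrative placeholder only: NOT the qualifying-injury list for any
# real litigation. All names here are hypothetical.
LITIGATION_INJURIES = {
    "example-toxic-tort": ["kidney cancer", "bladder cancer", "leukemia"],
}


def find_injuries(pages, litigation):
    """Map each injury term to the 1-based page numbers where it appears."""
    found = {}
    for page_number, page_text in enumerate(pages, start=1):
        lowered = page_text.lower()
        for injury in LITIGATION_INJURIES[litigation]:
            if injury in lowered:
                found.setdefault(injury, []).append(page_number)
    return found


pages = [
    "Dx: Bladder cancer, stage II.",
    "No acute findings.",
    "History of leukemia; bladder cancer in remission.",
]
results = find_injuries(pages, "example-toxic-tort")
```

A single call covers every injury in the profile, so the reviewer does not have to run one search per term. A contextual system would go further, for instance by telling a current diagnosis apart from a family history mention, which this sketch does not attempt.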

Synonym Recognition

OCR can only recognize words and numbers as they are spelled out in the document, which means that a search may miss documents that use synonyms or abbreviations for certain terms. AI-based contextual search, however, can recognize synonyms and abbreviations and return all relevant documents. For example, if you search for the drug name "ibuprofen" using AI-based contextual search, the software can recognize documents that mention the drug using the abbreviation "ibupr" or a brand name like "Advil," and may also catch misspellings.
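
As a rough illustration of the ibuprofen example, the Python sketch below matches a drug against a small alias table and tolerates minor misspellings. It is a simplified stand-in for what a trained model would do, not a description of any vendor's method. The `DRUG_ALIASES` table, the `find_drug_mentions` function, and the `cutoff` value are assumptions made for this example; `difflib.get_close_matches` is part of the Python standard library.

```python
import difflib
import re

# Hypothetical alias table; a real system would draw on a curated drug
# vocabulary rather than a hand-written dictionary.
DRUG_ALIASES = {
    "ibuprofen": ["ibuprofen", "ibupr", "advil"],
}


def find_drug_mentions(text, drug, cutoff=0.85):
    """Return the words in `text` that match `drug` or one of its aliases."""
    aliases = DRUG_ALIASES.get(drug.lower(), [drug.lower()])
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = []
    for token in tokens:
        # Fuzzy comparison tolerates small misspellings such as "ibuprofin".
        if difflib.get_close_matches(token, aliases, n=1, cutoff=cutoff):
            hits.append(token)
    return hits


note = "Pt given Advil 200mg, later switched to ibuprofin."
exact_hit = "ibuprofen" in note.lower()          # literal keyword search
alias_hits = find_drug_mentions(note, "ibuprofen")
```

The literal keyword check finds nothing in this note because "ibuprofen" never appears with that exact spelling, while the alias-aware lookup picks up both the brand name and the misspelling.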

Document Analysis

OCR treats all documents the same, regardless of their type or format. AI-based contextual search, however, can analyze the type of document being searched and provide more targeted results. For example, AI-based software like Pattern Data can determine the type of document, such as an office visit note, a prescription, or a pathology or operative report. Additionally, if the document is a pathology report, it can determine the diagnosis and disease.
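
The document-typing step can be pictured with a small Python sketch. This one uses hand-written cue phrases purely for illustration; a production system would more likely rely on a trained classifier, and nothing here reflects how Pattern Data is actually implemented. The `DOC_TYPE_CUES` table and the `classify_document` and `extract_final_diagnosis` functions are hypothetical, as is the assumption that a pathology report labels its result with "FINAL DIAGNOSIS:".

```python
import re

# Hypothetical cue phrases per document type, chosen for illustration only.
DOC_TYPE_CUES = {
    "pathology report": ["specimen", "microscopic description", "final diagnosis"],
    "operative report": ["preoperative diagnosis", "procedure performed", "estimated blood loss"],
    "prescription": ["refills", "dispense", "sig:"],
    "office visit": ["chief complaint", "history of present illness", "follow up"],
}


def classify_document(text):
    """Pick the document type whose cue phrases appear most often in the text."""
    lowered = text.lower()
    scores = {
        doc_type: sum(cue in lowered for cue in cues)
        for doc_type, cues in DOC_TYPE_CUES.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    return best_type if best_score > 0 else "unknown"


def extract_final_diagnosis(text):
    """Return the text after 'FINAL DIAGNOSIS:' on the same line, if present."""
    match = re.search(r"final diagnosis:\s*(.+)", text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


report = (
    "SPECIMEN: Left axillary lymph node.\n"
    "FINAL DIAGNOSIS: Diffuse large B-cell lymphoma."
)
doc_type = classify_document(report)
diagnosis = extract_final_diagnosis(report) if doc_type == "pathology report" else None
```

Typing the document first means the diagnosis extraction only runs on reports where a diagnosis line is expected, which is the kind of targeting that a type-blind text search cannot provide.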