In the previous chapter, we covered how AWS AI services can be used to set up a chatbot with your document workflows using Amazon Lex and Amazon Kendra. In this chapter, we will talk about how Amazon Textract and Amazon Comprehend Medical can help digitize medical claims in healthcare. We will talk about the healthcare industry's claims processing system and why it's important to automate medical claims. Then, we will walk you through how you can use Amazon Textract to digitize these claims in paper form and use postprocessing to validate them. Then, we will show you how you can extract NLP insights from these claims, such as whether the person was diabetic or not, using Amazon Comprehend Medical APIs.
For invalid claims, we will show you how to easily set up notifications that ask the person who submitted the claims to resubmit them with the right data, such as the ZIP code or claim ID. Lastly, we will show you some architecture patterns that help automate everything using AWS Lambda functions. By doing so, you will spin up an end-to-end serverless solution that reduces the time to market for the claims processing workflow, because you do not have to set up, manage, or scale servers to process millions of such claims.
We will cover the following topics in this chapter:
For this chapter, you will need access to an AWS account:
The Python code and sample datasets for our solution can be found at https://github.com/PacktPublishing/Natural-Language-Processing-with-AWS-AI-Services/tree/main/Chapter%2012. Please use the instructions in the following sections, along with the code in the preceding repository, to build the solution.
Check out the following video to see the Code in Action at https://bit.ly/3GrClSK.
In the healthcare industry, approximately 6.1 billion medical claims were submitted in 2018 according to the 2018 CAQH Index report (https://www.caqh.org/sites/default/files/explorations/index/report/2018-index-report.pdf), and this number is expected to continue rising in the upcoming years.
Healthcare payer companies are constantly looking for efficient and cost-effective ways to process such volumes of claims in a scalable manner. With the current manual process, claims take too long to process. So, healthcare companies are looking at AI and ML approaches to automate and digitize these claims. Once the claims are digitized, it becomes much easier to derive insights from them, such as ways to improve the population's overall health. Moreover, analyzing these claim documents might help you identify behaviors that can help prevent a medical condition from developing. Healthcare payers are also looking for a solution that meets regulatory requirements, such as HIPAA compliance. For those of you outside the US, HIPAA (the Health Insurance Portability and Accountability Act) is a US law that governs how protected health information is handled.
So, we now understand why automating claims is so important. Now, we will talk about how we can help you automate this pipeline using AWS AI services such as Amazon Textract to digitize the claim process. You can do this by extracting text from these scanned claim documents and verifying them using NLP, along with Amazon Comprehend Medical, to get some patient health insights from these claims.
In this use case, our fictitious company, LiveRight Holdings Private Limited, is working with a healthcare insurance provider known as AwakenLife to process the claims that have been submitted by their insurance holders. These claims are mostly scanned images, and much of AwakenLife's time and effort is spent on processing them, even though some of them turn out to be invalid. This leads to a loss for the organization. Since LiveRight has already been using Amazon Textract to automate, digitize, and further innovate their document processing workflows in the preceding chapters, they have recommended that AwakenLife use some of these AWS AI services to improve and automate their overall claims process. In this chapter, we will set up a simple AI-based workflow to validate the claims for AwakenLife, which can further reduce their overall processing time.
This solution is highly cost-effective and scalable as these services are serverless and scale to process documents based on your need. In this chapter, we will walk you through the following architecture:
In the preceding diagram, we can see the following:
We will walk through the previous architecture using a Jupyter notebook. Once we've done this, we will cover an architecture that automates this implementation using event-based Lambda functions.
In the next section, we will talk about how you can use Amazon Textract to extract data from medical intake forms.
In this section, we will show you how to use Amazon Textract to extract key-value pairs or form data from a medical intake form. Then, using simple logic, you will verify whether the extracted values are valid or invalid.
If you have not done so in the previous chapters, you will have to create an Amazon SageMaker Jupyter notebook and set up Identity and Access Management (IAM) permissions for that notebook role. By doing so, you will be able to access the AWS services we will use in this notebook. After that, you will need to clone this book's GitHub repository (https://github.com/PacktPublishing/Natural-Language-Processing-with-AWS-AI-Services), go to the Chapter 12 folder, and open the ch 12 automating claims processing.ipynb notebook.
Note:
Make sure that the IAM role in the notebook has AmazonSNSFullAccess, AmazonComprehendMedicalFullAccess, and AmazonTextractFullAccess.
Now, using this notebook, we will learn how to extract data using Textract APIs and validate them using some postprocessing logic via a sample medical intake form:
from IPython.display import Image, display

documentName = "validmedicalform.png"
display(Image(filename=documentName))
You will see that the following medical intake form has been loaded:
Now, we will extract the medical intake form's data by calling the Amazon Textract Analyze Document API with the Form feature enabled. This is a sync API and we covered it in detail in Chapter 2, Introducing Amazon Textract. We have created a function that takes any document image as input and returns a Textract response. We will talk about how this function can be automated using a Lambda function in the last section of this chapter, Understanding how to create a serverless pipeline for medical claims. Run the following notebook cell to execute this function:
import boto3

def calltextract(documentName):
    # Create a Textract client for the us-east-1 region
    client = boto3.client(service_name='textract',
                          region_name='us-east-1',
                          endpoint_url='https://textract.us-east-1.amazonaws.com')
    # Read the document image from disk as raw bytes
    with open(documentName, 'rb') as file:
        img_test = file.read()
        bytes_test = bytearray(img_test)
        print('Image loaded', documentName)
    # Call the synchronous Analyze Document API with the FORMS feature
    response = client.analyze_document(Document={'Bytes': bytes_test}, FeatureTypes=['FORMS'])
    return response

response = calltextract(documentName)
print(response)
If the response from Textract is a success, you will get the following message, along with a JSON response with extracted data:
The preceding response contains a lot of information, such as geometry, pages, and text in document metadata.
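To get a feel for what the response holds before parsing it, you can summarize it by block type. The following is a minimal sketch, not part of the book's notebook: `summarize_blocks` is a hypothetical helper, and `sample_response` is a hand-made stand-in with illustrative values that mimics the `DocumentMetadata` and `Blocks` keys of a real Textract response.

```python
from collections import Counter

def summarize_blocks(response):
    """Return the page count and the number of blocks per BlockType."""
    counts = Counter(block["BlockType"] for block in response.get("Blocks", []))
    pages = response.get("DocumentMetadata", {}).get("Pages", 0)
    return pages, dict(counts)

# Hand-made stand-in for a real Textract response (illustrative values only)
sample_response = {
    "DocumentMetadata": {"Pages": 1},
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "ZIP CODE 12345"},
        {"BlockType": "WORD", "Id": "w1", "Text": "ZIP"},
        {"BlockType": "WORD", "Id": "w2", "Text": "CODE"},
        {"BlockType": "WORD", "Id": "w3", "Text": "12345"},
    ],
}

pages, counts = summarize_blocks(sample_response)
print(pages, counts)
```

In the notebook, you could pass the `response` returned by `calltextract` to the same helper instead of the sample dictionary.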
from trp import Document

def getformkeyvalue(response):
    doc = Document(response)
    key_map = {}
    for page in doc.pages:
        # Collect every form field that has both a key and a value
        for field in page.form.fields:
            if field is None or field.key is None or field.value is None:
                continue
            key_map[field.key.text] = field.value.text
    return key_map

get_form_keys = getformkeyvalue(response)
print(get_form_keys)
You will get the following output:
All the form entries are extracted as key-value pairs.
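If you are curious how keys and values are linked inside the raw response, the pairing can also be rebuilt without a parser library. The sketch below assumes the documented Textract block layout (`KEY_VALUE_SET` blocks whose `Relationships` point to `VALUE` and `CHILD` blocks). The helper names and the tiny `sample_response` are hypothetical and only illustrate the idea.

```python
def get_text(block, blocks_by_id):
    """Join the WORD children of a KEY or VALUE block into one string."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
    return " ".join(parts)

def key_values_from_blocks(response):
    """Map each KEY block's text to the text of the VALUE block it points to."""
    blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
    result = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    get_text(blocks_by_id[value_id], blocks_by_id)
                    for value_id in rel["Ids"]
                )
        result[get_text(block, blocks_by_id)] = value_text
    return result

# Hand-made stand-in for a Textract FORMS response (illustrative values only)
sample_response = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "ZIP"},
        {"Id": "w2", "BlockType": "WORD", "Text": "CODE"},
        {"Id": "w3", "BlockType": "WORD", "Text": "12345"},
    ]
}

form_fields = key_values_from_blocks(sample_response)
print(form_fields)
```

For real documents, the `trp` parser used in this chapter remains the more convenient option because it also handles pages, tables, and geometry.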
import json

def validate(body):
    # Normalize single quotes so that the payload parses as JSON
    json_acceptable_string = body.replace("'", "\"")
    json_data = json.loads(json_acceptable_string)
    print(json_data)
    zip_code = json_data['ZIP CODE']
    claim_id = json_data['ID NUMBER']
    # The ZIP code must contain digits only
    if not zip_code.strip().isdigit():
        return False, claim_id, "Zip code invalid"
    # The claim ID must be exactly 12 characters long
    length = len(claim_id.strip())
    if length != 12:
        return False, claim_id, "Invalid claim Id"
    return True, claim_id, "Ok"

textract_json = json.dumps(get_form_keys, indent=2)
res, formid, result = validate(textract_json)
print(result)
print(formid)
As you can see, we get an Ok response, along with the valid claim ID.
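Your own business rules may need more than these two checks. As one possible variation, the sketch below validates the extracted dictionary directly and collects every failure instead of stopping at the first one. It is an assumption-laden example rather than the notebook's logic: `validate_fields` is a hypothetical name, and the ZIP pattern (five digits, optionally followed by a four-digit extension) is stricter than the digits-only check used in `validate`.

```python
import re

# Assumption: US ZIP or ZIP+4 format; stricter than the notebook's isdigit() check
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")
CLAIM_ID_LENGTH = 12  # same length rule as validate()

def validate_fields(form):
    """Return (is_valid, claim_id, errors) for a dict of extracted form fields."""
    errors = []
    zip_code = form.get("ZIP CODE", "").strip()
    claim_id = form.get("ID NUMBER", "").strip()
    if not ZIP_PATTERN.fullmatch(zip_code):
        errors.append("Zip code invalid")
    if len(claim_id) != CLAIM_ID_LENGTH:
        errors.append("Invalid claim Id")
    return (not errors), claim_id, errors

# Hypothetical field values, used only to exercise the rules
print(validate_fields({"ZIP CODE": " 12345 ", "ID NUMBER": "123456789012"}))
print(validate_fields({"ZIP CODE": "12A45", "ID NUMBER": "123"}))
```

Returning all the reasons at once lets a single notification tell the submitter everything that needs to be corrected.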
Now, going back to the architecture, two things can happen:
In this section, we covered how you can use the Amazon Textract Analyze Document API to extract the form values from a medical intake form. We also covered how you can validate a Textract response.
Since the response is valid for the medical intake form, in the next section, we will show you how you can use Amazon Comprehend Medical to extract medical insights.
In this section, we will talk about how you can use Amazon Comprehend Medical to gain insights from a valid medical intake form. We covered Amazon Comprehend's features in Chapter 3, Introducing Amazon Comprehend. In this chapter, we will learn how to use the Amazon Comprehend Medical Entity API to extract entities such as patient diagnosis and PHI data types such as claim ID from the medical intake form. Let's get started:
comprehend = boto3.client(service_name='comprehendmedical')
cm_json_data = comprehend.detect_entities_v2(Text=textract_json)
print(" Medical Entities ========")
for entity in cm_json_data["Entities"]:
    print("- {}".format(entity["Text"]))
    print (" Type: {}".format(entity["Type"]))
    print (" Category: {}".format(entity["Category"]))
    if(entity["Traits"]):
        print(" Traits:")
        for trait in entity["Traits"]:
            print (" - {}".format(trait["Name"]))
    print(" ")
You will get the following medical insights by using this API:
It was able to determine the phone number and medical ID as PHI. If you have regulatory requirements, you can mask or redact these entity types easily as they have been correctly identified by this API.
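One way to mask such values is to use the character offsets that Comprehend Medical returns with each entity. The following is a minimal sketch under a few assumptions: `redact_phi` is a hypothetical helper, the sample text and entities are hand-made, and `EndOffset` is treated as exclusive so that `text[BeginOffset:EndOffset]` equals the entity text.

```python
def redact_phi(text, entities, mask="*"):
    """Mask every character span that belongs to a PHI entity."""
    chars = list(text)
    for entity in entities:
        if entity.get("Category") != "PROTECTED_HEALTH_INFORMATION":
            continue
        for index in range(entity["BeginOffset"], entity["EndOffset"]):
            chars[index] = mask
    return "".join(chars)

# Hand-made stand-in for the "Entities" list of a detect_entities_v2 response
sample_text = "Phone: 555-0100, diagnosis: diabetes"
phone_start = sample_text.index("555-0100")
dx_start = sample_text.index("diabetes")
sample_entities = [
    {"Text": "555-0100", "Category": "PROTECTED_HEALTH_INFORMATION",
     "Type": "PHONE_OR_FAX",
     "BeginOffset": phone_start, "EndOffset": phone_start + len("555-0100")},
    {"Text": "diabetes", "Category": "MEDICAL_CONDITION", "Type": "DX_NAME",
     "BeginOffset": dx_start, "EndOffset": dx_start + len("diabetes")},
]

redacted = redact_phi(sample_text, sample_entities)
print(redacted)
```

In the notebook, the same helper could be applied to `textract_json` and `cm_json_data["Entities"]`, since the offsets refer to the text that was sent to the API.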
import csv

# TEMP_FILE, bucket, prefix, and S3Uploader (from sagemaker.s3) are expected
# to be defined or imported earlier in the notebook
def printtocsv(cm_json_data,formid):
    entities = cm_json_data['Entities']
    with open(TEMP_FILE, 'w') as csvfile: # 'w' will truncate the file
        filewriter = csv.writer(csvfile, delimiter=',',
                                quotechar='|', quoting=csv.QUOTE_MINIMAL)
        filewriter.writerow([ 'ID','Category', 'Type', 'Text'])
        for entity in entities:
            filewriter.writerow([formid, entity['Category'], entity['Type'], entity['Text']])
    filename = "procedureresult/" + formid + ".csv"
    S3Uploader.upload(TEMP_FILE, 's3://{}/{}'.format(bucket, prefix))

printtocsv(cm_json_data,formid)
You will get the following response:
"successfully parsed:procedureresult/a-184054-6661.csv"
In this section, we covered how you can extract medical insights or entities using the Amazon Comprehend Medical API for valid claims from Amazon Textract. In the next section, we will take an invalid claim form, extract its data using Amazon Textract, and postprocess this data to validate it. If the claim is not valid, we will show you how you can set up Amazon SNS to notify the stakeholder via email.
In this section, we will walk through the architecture specified in Figure 12.1 when the claim is identified as invalid by Textract postprocessing. We will send the message to the stakeholder via Amazon SNS. Let's go back to the notebook:
InvalidDocument = "invalidmedicalform.png"
display(Image(filename=InvalidDocument))
You will get the following sample medical form, which we will check for invalid use cases:
In this form, ZIP CODE and ID NUMBER have been entered incorrectly.
response = calltextract(InvalidDocument)
get_form_keys = getformkeyvalue(response)
print(get_form_keys)
You will get the following output:
Here, we get all the key-value pairs or form data.
textract_json= json.dumps(get_form_keys,indent=2)
res, formid, result = validate(textract_json)
print(result)
print(formid)
print(res)
You will get the following response:
The validate method flags the claim as invalid: it returns a failure reason and False, along with the invalid claim ID.
Make sure you choose the Standard topic type.
Scroll down and click on Create Subscription.
Note:
It's important to confirm the subscription; otherwise, you will not be notified.
sns = boto3.client('sns')
topicARN="<Enter your topic arn>"
snsbody = "Content:" + str(textract_json) + "Reason:" + str(result)
print(snsbody)
try:
    response = sns.publish(
        TargetArn = topicARN,
        Message= snsbody
    )
    print(response)
except Exception as e:
    print("Failed while doing validation")
    # Python 3 exceptions have no .message attribute, so print the exception itself
    print(e)
You will get the following response:
You can always opt out of the topic you have created.
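The notification body used above is a single concatenated string. If you prefer an email that is easier to read, the message can be assembled separately before publishing. The sketch below is only one possible layout: `build_notification` is a hypothetical helper, and the 100-character cap reflects the SNS limit on the `Subject` field.

```python
import json

def build_notification(form_data, reason, claim_id):
    """Build the Subject and Message arguments for an SNS publish call."""
    subject = "Claim {} needs resubmission".format(claim_id)[:100]
    message = "\n".join([
        "Claim ID: {}".format(claim_id),
        "Reason: {}".format(reason),
        "Submitted content:",
        json.dumps(form_data, indent=2),
    ])
    return {"Subject": subject, "Message": message}

# Hypothetical values, used only to show the resulting structure
notification = build_notification({"ZIP CODE": "12A45"}, "Zip code invalid", "123")
print(notification["Subject"])
print(notification["Message"])
```

The resulting dictionary can then be passed along with the topic ARN, for example `sns.publish(TopicArn=topicARN, **notification)`.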
In this section, we covered how to process a medical claim using Amazon Textract, check for invalid medical claims, and notify the stakeholders about this. Next, we'll learn how to create a serverless pipeline for medical claims.
In the previous sections, we covered the building blocks of the architecture by using the Amazon Textract Sync API, the Amazon Comprehend Medical Detect Entities Sync API, and Amazon SNS to send notifications about invalid claims. We defined functions for this workflow and called the text extraction and validation functions to showcase the workflow with both a valid and an invalid medical claim form. These functions can be moved into Lambda code and, along with S3 event notifications, can be invoked to create a scalable pipeline for medical claims processing. We can do this by using the following architecture:
We walked through a Jupyter notebook showing individual code components for processing medical claims using a single intake form. We created Python functions to extract data, validate data, gather insights, and convert those insights into CSV files. To process millions of such documents, we will learn how to turn these functions into AWS Lambda functions and create an end-to-end automated serverless architecture based on the preceding diagram. The medical claims form we'll be using has been dropped into an Amazon S3 bucket by payers of AwakenLife Ltd:
With just a few lines of code, we can process medical claims documents at a large scale. This architecture can be quickly automated and deployed in the form of a CloudFormation template, which lets you set up Infrastructure as Code (IaC). We have provided a similar scalable implementation in the form of a blog in the Further reading section if you're interested.
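The event-driven flow described in this section can be sketched as a Lambda handler that reacts to S3 event notifications. This is a minimal outline, not the deployed solution: the function names are hypothetical, the handler assumes that the Lambda execution role can read the bucket and call Textract, and the validation, Comprehend Medical, and SNS steps from the notebook are only indicated in a comment.

```python
import urllib.parse

def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects

def lambda_handler(event, context):
    # Imported here so that the parsing helper can be tested without AWS access
    import boto3
    textract = boto3.client("textract")
    results = []
    for bucket, key in extract_s3_objects(event):
        # Read the claim form directly from S3 instead of passing raw bytes
        response = textract.analyze_document(
            Document={"S3Object": {"Bucket": bucket, "Name": key}},
            FeatureTypes=["FORMS"],
        )
        # Next steps (not shown): reuse getformkeyvalue() and validate(), then
        # call Comprehend Medical for valid claims or publish to SNS otherwise
        results.append({"bucket": bucket, "key": key,
                        "blocks": len(response["Blocks"])})
    return {"processed": results}

# Hypothetical event with the shape S3 sends when an object is created
sample_event = {"Records": [{"s3": {
    "bucket": {"name": "claims-bucket"},
    "object": {"key": "incoming/valid+medical+form.png"},
}}]}
print(extract_s3_objects(sample_event))
```

Keeping the event parsing separate from the AWS calls makes the handler easier to test locally before it is wired to the bucket.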
In this section, we covered how to take the code we defined in the previous sections and move it into AWS Lambda functions to architect an end-to-end automated workflow, using a walkthrough architecture. Now, let's summarize this chapter.
In this chapter, we introduced the medical claim processing use case. We then covered how you can use AWS AI services such as Amazon Textract to extract form data from these scanned medical forms. Then, we spoke about how you can perform some postprocessing on the extracted text based on your business rules to validate their form data. Once the form data had been validated, we showed you how to use Amazon Comprehend Medical, as covered in Chapter 3, Introducing Amazon Comprehend, to extract medical insights. Once you have medical insights, this data can be converted into a CSV file and saved in Amazon S3. Once you've done this, you can analyze this data for population health analytics by using Amazon Athena or Amazon QuickSight. We also discussed how to handle invalid claims processing by showing how to quickly configure Amazon SNS through the AWS console and add subscribers. You can notify your subscribers by email regarding the medical claims that have been submitted as invalid.
Lastly, we showed you how to architect a serverless scalable solution using AWS Lambda to call these Textract Sync and Amazon Comprehend Medical Sync APIs. This ensures that you have an end-to-end working automated architecture with the claim documents you uploaded in Amazon S3.
In the next chapter, we will cover how to improve the accuracy of your existing document processing workflows using Amazon Augmented AI with the human-in-the-loop process. We will also deep dive into aspects of why you need a human-in-the-loop process and how it helps improve the accuracy of your existing AI predictions.
To learn more about the topics that were covered in this chapter, take a look at the following resource: