In the previous chapters, we learned how to build Intelligent Document Processing (IDP) pipelines using Amazon Textract, Amazon Comprehend, and Amazon A2I. The advantage of setting up such pipelines is that you introduce automation into your operational processes and unlock insights that were previously not evident. Speaking of insights, what exactly are they, why is everyone so interested in mining text, and of what use can they be?
To answer this, let's summon Doc Brown and Marty McFly's time-traveling car, the DeLorean from the movie Back to the Future, and travel back to Chapter 1, NLP in the Business Context and Introduction to AWS AI Services, to re-read the Understanding why NLP is becoming mainstream section. Remember now? Maybe this will help: according to Webster's dictionary (https://www.merriam-webster.com/), the word "insight" is defined as "the act or result of apprehending the inner nature of things or of seeing intuitively." You got it – it is all about uncovering useful information from seemingly vague or even mundane data. Simply put, it means to "see with clarity."
This chapter is all about how to visualize insights from text – that is, handwritten text – and make use of it to drive decision-making. According to Wikipedia, the earliest known handwritten script was Cuneiform (https://en.wikipedia.org/wiki/Cuneiform), which was prevalent almost 5,500 years ago. Equally old in spoken and written form is the native language of one of the authors, the Tamil language. That said, let's now head back to our favorite fictional organization, LiveRight Holdings, to solve a new challenge they seem to be having.
You have been given the task of running the Founder's Day for the firm, which is touted to be a spectacular gala, considering how popular LiveRight has become. To keep up with LiveRight's culture of benefiting the community, you will have to work with several local vendors to source what you need, such as furniture, food, and other items, for the event. You have been told that the management needs aggregated reports of all expenditure, so you decide to use your existing Document Processing pipeline to process their receipts. However, to your chagrin, you discover that the local vendors only provide handwritten receipts. You remember from a previous solution you built that Amazon Textract supports handwritten content, so you start thinking about how best to design for the situation.
In this chapter, we will cover the following topics:
For this chapter, you will need access to an AWS account, which you can create at https://aws.amazon.com/console/. Please refer to the Signing up for an AWS account sub-section within the Setting up your AWS environment section of Chapter 2, Introducing Amazon Textract, for detailed instructions on how you can sign up for an AWS account and sign into the AWS Management Console.
The Python code and sample datasets for the solution discussed in this chapter can be found at https://github.com/PacktPublishing/Natural-Language-Processing-with-AWS-AI-Services/tree/main/Chapter%2017.
Check out the following video to see the Code in Action at https://bit.ly/3vLX5j0.
At this point, you are ready to start designing and building the approach. You realize that what you build for this use case will become an extension of the existing Document Processing solution, so it will have long-term usage within the organization, and you need to design for future scalability. With this in mind, you decide to use Amazon S3 (https://aws.amazon.com/s3/) for object storage, Amazon Textract (https://aws.amazon.com/textract/) for handwriting detection, and Amazon QuickSight (https://aws.amazon.com/quicksight/), a serverless ML-powered business intelligence service, for visualizing the insights from the handwritten content. We will be using an Amazon SageMaker Jupyter notebook for text extraction, followed by the AWS Management Console to set up the QuickSight visualizations. Let's get started.
If you have not done so in the previous chapters, you will have to create an Amazon SageMaker Jupyter notebook and set up Identity and Access Management (IAM) permissions for that Notebook Role to access the AWS services we will use in this notebook. After that, you will need to clone this book's GitHub repository (https://github.com/PacktPublishing/Natural-Language-Processing-with-AWS-AI-Services), create an Amazon S3 bucket (https://aws.amazon.com/s3/), and provide the bucket name in the notebook to start execution.
Note
Please ensure you have completed the tasks mentioned in the Technical requirements section.
Follow these steps to complete the tasks before we execute the cells from our notebook:
IAM Role Permissions While Creating Amazon SageMaker Jupyter Notebooks
Accept the default for the IAM Role at notebook creation time to allow access to an S3 bucket.
Next, we'll cover some additional IAM prerequisites.
We have to enable additional policies for our SageMaker notebook role. Please refer to the Changing IAM permissions and trust relationships for the Amazon SageMaker notebook execution role sub-section in the Setting up your AWS environment section of Chapter 2, Introducing Amazon Textract, for detailed instructions for executing the following steps:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "iam:PassRole"
            ],
            "Effect": "Allow",
            "Resource": "<your sagemaker notebook execution role ARN>"
        }
    ]
}
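If you prefer to build this policy document programmatically before attaching it to the role, the following sketch shows one way to construct it. The `build_passrole_policy` helper and the example ARN are illustrative, not part of the original notebook; attaching the policy would additionally require an IAM call such as `put_role_policy`, which needs appropriate credentials:

```python
import json

def build_passrole_policy(role_arn: str) -> dict:
    # Inline policy document granting iam:PassRole on the
    # SageMaker notebook execution role itself.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["iam:PassRole"],
                "Effect": "Allow",
                "Resource": role_arn
            }
        ]
    }

# Example ARN is a placeholder; substitute your own role's ARN.
policy = build_passrole_policy("arn:aws:iam::111122223333:role/my-notebook-role")
print(json.dumps(policy, indent=2))
# To attach it, you would call:
# boto3.client('iam').put_role_policy(RoleName=..., PolicyName=...,
#                                     PolicyDocument=json.dumps(policy))
```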
Now that we have set up our notebook and configured an IAM role to run the walkthrough notebook, in the next section, we will create an Amazon S3 bucket.
Follow the instructions documented in the Creating an Amazon S3 bucket, a folder, and uploading objects sub-section in the Setting up your AWS environment section of Chapter 2, Introducing Amazon Textract, to create your Amazon S3 bucket. If you created an S3 bucket in the previous sections, please reuse that bucket. For this chapter, you just need to create the S3 bucket; we will create the folders and upload the necessary objects directly from the notebook. Let's get started:
bucket = "<enter-S3-bucket-name>"
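Before running the rest of the notebook, it can help to sanity-check the bucket name you paste in, since a typo here will cause every upload step later to fail. The helper below is a rough, illustrative check (the regex approximates S3's documented naming rules and is not exhaustive):

```python
import re

def looks_like_valid_bucket_name(name: str) -> bool:
    """Rough check against S3 bucket naming rules: 3-63 characters,
    lowercase letters, digits, hyphens, and dots; must start and end
    with a letter or digit. Approximate, not exhaustive."""
    return bool(re.fullmatch(r'[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]', name))

print(looks_like_valid_bucket_name('my-chapter17-bucket'))  # → True
print(looks_like_valid_bucket_name('My_Bucket'))            # → False
```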
Now that we have created the S3 bucket and imported the libraries we need, let's extract the contents using Amazon Textract.
We will now continue executing the rest of the cells in the notebook to update the QuickSight manifest file with our bucket and prefix entries. The manifest file provides metadata for the QuickSight dataset to correctly import the content for visualization. Please see the documentation (https://docs.aws.amazon.com/quicksight/latest/user/create-a-data-set-s3.html) for more details. Let's get started:
s3 = boto3.client('s3')
s3.upload_file(outfile, bucket, prefix + '/' + outfile)
Manifest file uploaded to: s3://<your-bucket-name>/chapter17/qsmani-formatted.json
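For reference, the manifest the notebook formats can be sketched as a small Python dictionary. The structure below follows the S3 manifest format described in the QuickSight documentation linked above; the bucket and prefix values are placeholders, and the `build_manifest` helper is illustrative rather than the notebook's actual code:

```python
import json

def build_manifest(bucket: str, prefix: str) -> dict:
    # QuickSight S3 manifest: URIPrefixes points at the folder holding
    # the CSV files, and globalUploadSettings tells QuickSight how to
    # parse them (CSV, comma-delimited, first row is a header).
    return {
        "fileLocations": [
            {"URIPrefixes": [f"s3://{bucket}/{prefix}/dashboard/"]}
        ],
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": ",",
            "containsHeader": "true"
        }
    }

manifest = build_manifest("<your-bucket-name>", "chapter17")
print(json.dumps(manifest, indent=2))
```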
!python -m pip install amazon-textract-response-parser
import os
import csv
import boto3
from trp import Document

textract = boto3.client('textract')
s3 = boto3.client('s3')

for docs in os.listdir('.'):
    if docs.endswith('jpg'):
        # Read the receipt image as bytes for the synchronous Textract API
        with open(docs, 'rb') as img:
            img_test = img.read()
        bytes_test = bytearray(img_test)
        response = textract.analyze_document(
            Document={'Bytes': bytes_test},
            FeatureTypes=['TABLES', 'FORMS'])
        text = Document(response)
        for page in text.pages:
            for table in page.tables:
                # Write each detected table to a CSV named after the image
                csvout = docs.replace('jpg', 'csv')
                with open(csvout, 'w', newline='') as csvf:
                    tab = csv.writer(csvf, delimiter=',')
                    for r, row in enumerate(table.rows):
                        csvrow = []
                        for c, cell in enumerate(row.cells):
                            if cell.text:
                                # Strip dollar signs and trailing whitespace
                                csvrow.append(cell.text.replace('$', '').rstrip())
                        tab.writerow(csvrow)
                s3.upload_file(csvout, bucket, prefix + '/dashboard/' + csvout)
Extracted text from hw-receipt2.jpg
CSV file for document hw-receipt2.jpg uploaded to: s3://<s3-bucket-name>/chapter17/dashboard/hw-receipt2.csv
Extracted text from hw-receipt1.jpg
CSV file for document hw-receipt1.jpg uploaded to: s3://<s3-bucket-name>/chapter17/dashboard/hw-receipt1.csv
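Handwritten amounts sometimes come through with thousands separators or stray spaces in addition to the dollar sign. If you find that your CSV values do not load cleanly into QuickSight, a slightly more defensive cell cleaner than the inline `replace('$', '')` may help. This is purely an illustrative refinement, not part of the original notebook:

```python
def clean_cell(text: str) -> str:
    """Strip currency symbols, thousands separators, and surrounding
    whitespace from a Textract table cell so numeric values load
    cleanly into QuickSight."""
    return text.replace('$', '').replace(',', '').strip()

print(clean_cell(' $1,250.00 '))  # → 1250.00
print(clean_cell('Chairs'))       # → Chairs
```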
Note
You can also use Amazon A2I in this solution to set up a human loop to review the Textract outputs, as well as to make changes to the content as required, before creating the CSV files. For more details, please refer to Chapter 13, Improving the Accuracy of Document Processing Workflows.
This concludes the steps from the notebook. Now, we will log into the AWS Management Console to set up QuickSight for visualization.
First, we need to enable QuickSight for your AWS account before we can import the data and run the visualizations. Please execute the following steps to proceed:
It is as simple as that. Feel free to try out the other visual types, as well as ML-powered forecasting and insights. For more details, please refer to the following documentation: https://docs.aws.amazon.com/quicksight/latest/user/making-data-driven-decisions-with-ml-in-quicksight.html. You can set up, share, publish, or export your dashboard for consumption by your management and other stakeholders. And that concludes the solution build for this use case.
We have just scratched the surface of what we can do with handwritten text in this use case – the possibilities are truly endless! With just a few steps, by leveraging the advanced AI capabilities offered by services such as Amazon Textract, and the serverless scalable visualization offered by Amazon QuickSight, we were able to create powerful visuals from content scribbled on a piece of paper.
We began by creating the SageMaker Jupyter notebook instance we needed for this solution, cloned the GitHub repository for this chapter, created an S3 bucket, and executed the steps in the notebook to format the QuickSight S3 manifest file. Then, we used Amazon Textract and the Textract Response Parser library to read the contents of the handwritten receipts before creating CSV files that were uploaded to the S3 bucket. We concluded the notebook after executing these steps and then logged into the AWS Management Console and registered to use Amazon QuickSight.
In QuickSight, we imported the S3 dataset, which comprised our CSV files, and created two visuals and an insight. The first visual was a pie chart that showed the items that have been ordered against their quantities, while the second visual was a donut chart that showed the total cost of the two receipts, along with the cost per item. Finally, we displayed the insights that QuickSight had automatically generated, giving us a summary of what it was able to read from our content. We briefly discussed how we can export or share the dashboard and QuickSight's ML-based insights. And that concluded our solution build for this chapter.
Based on the myriad of use cases we have covered in this book so far, you know how to solve mainstream challenges in NLP for you and your customers, and we did all this without the need to tune a hyperparameter or train a model from the ground up. Granted, we trained a few custom Comprehend models, but that was without the overhead of a traditional ML workflow.
In the next chapter, we will conclude this book, so we thought we would leave you with some best practices, techniques, and guidelines to keep in your back pocket as you navigate your career as an NLP and AI expert. We will talk about document pre-processing, post-processing, and other items to consider during solution design. We are almost there!