At the start of the chapter, we looked at textual content analysis. Currently, Watson Analytics does not offer a lot of help with textual data. Let's see what you can do with Watson Analytics and textual data.
Within Watson Analytics Refine, you can click on Data Metrics to learn more about the textual data within your file. In our example (shown in the next screenshot), we see that Watson Analytics scores the Comments field as Low Quality and reports a missing values percentage of 56 percent:
Of course, the field scores as low quality only because 56 percent of the records in the file have no comments, which is a legitimate situation.
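If you want to check a figure like this yourself before loading a file, the following is a minimal sketch of how a missing values percentage could be computed in Python. It is not how Watson Analytics calculates its score; the `missing_percentage` helper, the `Student` column, and the sample rows are all hypothetical, and only the `Comments` and `GPA Average` column names come from our example.

```python
import csv
import io


def missing_percentage(rows, field):
    """Return the percentage of rows whose `field` is empty or only whitespace."""
    rows = list(rows)
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if not (row.get(field) or "").strip())
    return 100.0 * missing / len(rows)


# Hypothetical sample data, invented for illustration (not the chapter's file).
SAMPLE_CSV = """Student,GPA Average,Comments
A,3.1,demonstrates leadership
B,3.6,
C,2.9,
D,3.4,outstanding leadership abilities
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(f"Comments missing: {missing_percentage(rows, 'Comments'):.0f}%")
```

A high percentage here only tells you that many records have no comment; whether that is a problem depends on whether comments were expected for every record.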
One approach to using Watson Analytics on textual data is to look for relationships between certain words or phrases found within the data and the other fields in the file. For example, it might be interesting to see whether the presence of the word leadership in the Comments field has any effect on the GPA Average for the university. We can start by formulating a question: How do the values of GPA Average compare by Comments?
When Watson Analytics visualizes the answer, we can click on applied filter in the lower-left corner of the page:
Then, we can use search to find any comments that contain the word we are interested in (leadership), select them, and set them as our filter:
Finally, Watson Analytics shows us our filtered visualization:
The comments in our file that contain the word leadership are "demonstrates leadership" and "outstanding leadership abilities" and, according to Watson Analytics, it seems that these students do not have very exciting average GPAs.
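The same search-and-filter comparison can be approximated outside the tool. The sketch below is only an illustration of the idea: the `mean_gpa_by_keyword` function is a hypothetical helper, and the GPA values are invented sample numbers, not the results shown in the screenshots.

```python
from statistics import mean


def mean_gpa_by_keyword(records, keyword):
    """Average GPA for comments that mention `keyword` and for those that do not.

    `records` is an iterable of (comment, gpa) pairs. Matching is a simple
    case-insensitive substring test. A group with no records yields None.
    """
    with_kw, without_kw = [], []
    for comment, gpa in records:
        mentions = keyword.lower() in (comment or "").lower()
        (with_kw if mentions else without_kw).append(gpa)
    return (
        mean(with_kw) if with_kw else None,
        mean(without_kw) if without_kw else None,
    )


# Hypothetical (comment, GPA) pairs, invented for illustration only.
records = [
    ("demonstrates leadership", 2.5),
    ("outstanding leadership abilities", 3.0),
    ("strong writer", 3.5),
    ("", 4.0),
]

with_kw, without_kw = mean_gpa_by_keyword(records, "leadership")
print("Mentions leadership:", with_kw)
print("Does not mention leadership:", without_kw)
```

A substring test like this would also match words such as "leaderships", so you may prefer whole-word matching if your comments are less uniform than the two phrases in our file.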
The net result is that, when it comes to textual data, you will probably need to supplement your analysis with preprocessing outside of Watson Analytics.
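One possible form of such preprocessing is to turn a word of interest into its own Yes/No column before uploading the file, so that the question can be asked against a simple categorical field instead of free text. The sketch below assumes a CSV file with a `Comments` column; the `add_keyword_flag` function and the `Mentions Leadership` column name are hypothetical choices made for this illustration.

```python
import csv
import io


def add_keyword_flag(in_file, out_file, text_field, keyword, flag_field):
    """Copy a CSV, appending a Yes/No column marking rows whose text mentions `keyword`."""
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames) + [flag_field]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        text = (row.get(text_field) or "").lower()
        row[flag_field] = "Yes" if keyword.lower() in text else "No"
        writer.writerow(row)


# In-memory stand-ins for real files; the rows are invented sample data.
source = io.StringIO(
    "GPA Average,Comments\n"
    "3.1,demonstrates leadership\n"
    "3.6,\n"
)
result = io.StringIO()
add_keyword_flag(source, result, "Comments", "leadership", "Mentions Leadership")
print(result.getvalue())
```

With real data, you would pass file objects opened with `open(path, newline="")` in place of the `io.StringIO` buffers and then load the resulting file as usual.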