Based on your knowledge of the data, you will decide either to remove the blanks or to fill them in. In some situations, usually because of time constraints, you'll simply remove them. Keep in mind, however, that those missing values may represent new business opportunities or additional insights. To choose the best approach, you should consider the following:
There are two approaches that you can take: remove the records that contain missing values, or fill the missing values in.
With IBM SPSS Modeler, both approaches can largely be automated using what is called a node. Modeler nodes are classified as source, process, output, and modeling nodes, depending on their function. Examples include the Data Audit node, the Select node, and the Reclassify node, as well as source and output nodes.
With nodes, you can create logic that excludes records with fields that contain too many missing values, or that fills in missing values for any or all of the fields. This is a powerful feature, allowing you not only to assess the quality of your data but also to take action based on that assessment. In the next section, we will examine a sample use case that demonstrates the use of IBM SPSS Modeler to extend Watson by automating the assessment and cleaning of a data file.
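To make the two approaches concrete outside of Modeler, here is a minimal sketch in plain Python. It is not how SPSS Modeler nodes are implemented. The function names, the sample fields (`age`, `income`), the `max_missing` threshold, and the choice of the mean as the fill value are all assumptions made for illustration. Blanks are represented as `None`.

```python
from statistics import mean

def drop_sparse_records(records, max_missing=1):
    """Keep only records with at most max_missing blank (None) fields."""
    return [r for r in records
            if sum(v is None for v in r.values()) <= max_missing]

def fill_missing_with_mean(records, field):
    """Return copies of the records with blanks in `field` replaced by the mean
    of the observed values. If no value is observed, records are copied as-is."""
    observed = [r[field] for r in records if r[field] is not None]
    if not observed:
        return [dict(r) for r in records]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]

# Hypothetical sample data: None marks a blank.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": None, "income": None},
]

# Approach 1: exclude records with too many missing values.
kept = drop_sparse_records(records, max_missing=1)

# Approach 2: fill in the remaining blanks for a chosen field.
filled = fill_missing_with_mean(kept, "age")
```

Here the last record is excluded because both of its fields are blank, and the remaining blank `age` is filled with the mean of the observed ages. Whether a mean, a constant, or some other value is the right fill depends on what the blanks represent in your data.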