Handling missing values

Based on what you know about the data, you decide whether to remove the blanks or fill them in. In some situations, usually because of time constraints, you will simply remove them. Be aware, though, that those missing values may represent new business opportunities or additional insights. To choose the best approach, consider the following:

  • How big is your data file?
  • What is the total number of fields that contain blanks?
  • How much information is missing overall?
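
The questions above can be answered with a quick profile of the data. The following is a minimal Python sketch and not SPSS Modeler functionality. The function names `is_blank` and `profile_missing`, the sample records, and the convention that `None` or an empty string counts as a blank are all assumptions made for illustration.

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing (an assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def profile_missing(records):
    """Summarize how much of a list of dict records is missing."""
    # Collect every field name that appears in any record.
    fields = sorted({name for record in records for name in record})
    # Count blanks per field; a field absent from a record counts as blank.
    blanks_per_field = {
        name: sum(1 for record in records if is_blank(record.get(name)))
        for name in fields
    }
    total_cells = len(records) * len(fields)
    total_blanks = sum(blanks_per_field.values())
    return {
        "record_count": len(records),
        "fields_with_blanks": [n for n in fields if blanks_per_field[n] > 0],
        "blanks_per_field": blanks_per_field,
        "fraction_missing": total_blanks / total_cells if total_cells else 0.0,
    }


# Hypothetical sample data, invented for this sketch.
sample = [
    {"age": 34, "income": 52000, "region": "East"},
    {"age": None, "income": 61000, "region": ""},
    {"age": 45, "income": None, "region": "West"},
    {"age": 29, "income": 48000, "region": "North"},
]

summary = profile_missing(sample)
print(summary)
```

A small file with a high fraction of blanks usually argues for filling values in, whereas a large file with only a few blanks can often tolerate dropping the affected records.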

There are two approaches that you can take:

  • You can exclude the fields or records that contain missing values
  • You can impute, replace, or coerce the missing values
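
To make the difference between the two approaches concrete, here is a minimal Python sketch. It does not reproduce how SPSS Modeler works internally. The helper names `drop_incomplete` and `impute_mean`, the sample rows, and the choice of the mean as the replacement value are assumptions chosen for illustration.

```python
from statistics import mean


def drop_incomplete(records):
    """Approach 1: exclude every record that has at least one missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def impute_mean(records, field):
    """Approach 2: replace missing values in one numeric field with the field mean."""
    observed = [r[field] for r in records if r[field] is not None]
    if not observed:
        # Nothing to compute a mean from; return copies unchanged.
        return [dict(r) for r in records]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


# Hypothetical sample data; None marks a missing value.
rows = [
    {"age": 30, "income": 50000},
    {"age": None, "income": 60000},
    {"age": 50, "income": 70000},
]

kept = drop_incomplete(rows)        # loses the second record entirely
filled = impute_mean(rows, "age")   # keeps all records, fills the missing age

print(len(kept), len(filled))
print(filled[1])
```

Excluding is simpler but discards the income value that the second record did contain. Imputing keeps that record at the cost of introducing an estimated age.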

With IBM SPSS Modeler, both approaches can largely be automated using what is called a node. Modeler nodes are classified as source, process, output, and modeling nodes, depending on their function. Examples include the Data Audit node, the Select node, and the Reclassify node, as well as various source and output nodes.

Note

There are many more nodes provided in SPSS Modeler; these are but a few.

With nodes, you can create logic that excludes records containing too many missing values, or that fills in missing values for any or all of the fields. This is a powerful feature, allowing you not only to assess the quality of your data but also to take action based on that assessment. In the next section, we will examine a sample use case that demonstrates the use of IBM SPSS Modeler to extend Watson by automating the assessment and cleaning of a data file.
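
The kind of rule described in this section, dropping records that are too incomplete and filling in the rest, can be sketched in a few lines of Python. This is an illustration of the logic only, not of Modeler's nodes. The function name `clean`, the `max_missing` threshold, and the per-field `defaults` are hypothetical.

```python
def clean(records, max_missing, defaults):
    """Drop records with more than max_missing blanks; fill the rest from defaults."""
    cleaned = []
    for record in records:
        missing = [name for name, value in record.items() if value is None]
        if len(missing) > max_missing:
            continue  # too incomplete to keep
        # Fields without an entry in defaults stay None.
        cleaned.append({
            name: defaults.get(name) if value is None else value
            for name, value in record.items()
        })
    return cleaned


# Hypothetical sample data and replacement values.
rows = [
    {"id": 1, "age": 30, "region": "East"},
    {"id": 2, "age": None, "region": None},
    {"id": 3, "age": None, "region": "West"},
]

result = clean(rows, max_missing=1, defaults={"age": 40, "region": "Unknown"})
print(result)
```

Here the second record is excluded because two of its fields are missing, while the third is kept and its missing age is replaced with the assumed default.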
