How it works...

In this recipe, we discussed handling missing data using the mice, randomForest, and VIM packages. We started by creating a data frame with two columns, x and y, each holding the values 1 to 100. We could also have generated 100 random numbers, but for simplicity we used a sequence. Using the sample function, we replaced 10 values in x and 20 values in y with NA (missing values). The VIM package is used for Visualization and Imputation of Missing Values. The aggr function plots the proportion of missing values in each variable, along with the combinations of variables in which they occur. The matrixplot function draws every cell of the data: red marks missing values, and darker shades mark higher values. From the mice package, md.pattern tabulates the patterns of missing values. To impute the missing values, we call the mice function with random forest as the imputation method (method = "rf"). Finally, the complete function of the mice package returns a data set with the imputed values filled in.
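The steps above can be condensed into a short R script. This is a minimal sketch assuming mice, randomForest, and VIM are installed. It is not the recipe's own code: the seed, the object names (df, imp, completed), and the noise added to y are illustrative assumptions. y is built as x plus noise rather than as an exact copy of x because mice's collinearity check can drop a variable that is perfectly correlated with another, which would leave that variable unimputed. Depending on the mice version, method = "rf" may use the ranger package instead of randomForest as its backend.

```r
# Sketch only: assumes the mice, randomForest and VIM packages are installed.
library(mice)
library(randomForest)
library(VIM)

set.seed(42)  # illustrative seed so the NA positions are reproducible

# Two numeric columns. Assumption: y is x plus noise, not an exact copy,
# so mice does not drop it as collinear with x.
df <- data.frame(x = 1:100)
df$y <- df$x + rnorm(100, sd = 5)

# Replace 10 values of x and 20 values of y with NA at random rows
df$x[sample(1:100, 10)] <- NA
df$y[sample(1:100, 20)] <- NA

# Visualize missingness with VIM
aggr(df)        # share of missing values per variable and their combinations
matrixplot(df)  # missing cells in red, darker shades for higher values

# Tabulate the missing-data patterns with mice
md.pattern(df)

# Impute with random forests; mice() runs the imputation itself
imp <- mice(df, method = "rf", printFlag = FALSE)

# Extract the first completed data set (namespaced to avoid tidyr::complete)
completed <- mice::complete(imp)
summary(completed)
```

By default mice creates m = 5 imputed data sets; complete(imp) returns the first, and complete(imp, action = 2) and so on return the others.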
