Why data inspection matters

When you're preparing your data, it helps to know exactly what you're dealing with, for example by listing statistics that show how many elements a dataset contains and whether any values are missing. Data usually has to be cleaned up before analysis. Because GeoPandas data objects are subclasses of pandas data objects, you can use the pandas methods for data inspection and cleaning. Take, for instance, the wildfire shapefile we used earlier. Listing our dataframe object not only prints all of the attribute data, but also reports the total number of rows and columns, in this case 20340 rows and 30 columns. The total number of rows can also be printed this way:

In:        len(fires.index)

Out: 20340
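
Other standard pandas inspection calls work on the GeoDataFrame as well. As a minimal sketch, assuming the same fires dataframe, the shape attribute reports the row and column counts in one call, while fires.info() and fires.isna().sum() list the column dtypes and the number of missing values per column:

In:        fires.shape

Out: (20340, 30)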

This means there are 20340 individual wildfire cases in our input dataset. Now, compare this row count to the sum of the counts per state after we've performed the spatial join:

In:        counts_per_state.sum()

Out: 20266
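
How counts_per_state was built depends on the join performed earlier in the chapter. As a hedged sketch for tracking down the difference, assuming a states GeoDataFrame holding the state polygons, a left spatial join keeps every fire and leaves the join columns empty for fires that didn't match any state, so the unmatched rows can be counted and inspected directly:

In:        import geopandas as gpd
           joined = gpd.sjoin(fires, states, how="left")
           joined['index_right'].isna().sum()

If each fire falls within at most one state, this count should correspond to the 74 cases that disappeared from the summed counts.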

We notice that there are 74 fewer wildfires in our dataset after the spatial join. While a left join like the one sketched above can surface the unmatched rows, it's good practice to check datasets before and after performing geometric operations in general, for example for empty fields, missing values, or null values:

In:        fires.empty   # checks whether the dataframe contains any data at all

Out: False

The same check can also be applied to a single column by specifying its name:

In:        fires['geometry'].empty

Out: False

Be aware that a GeoPandas geometry column contains geometry objects rather than plain numbers or text, so checking it for NaN or zero values, the way you would for a numeric column, doesn't make much sense.
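
Instead, GeoPandas offers dedicated checks for geometry columns. As a minimal sketch on the same fires dataframe, isna() counts geometries that are missing altogether, while the is_empty attribute flags geometries that exist but contain no coordinates:

In:        fires['geometry'].isna().sum()

In:        fires['geometry'].is_empty.sum()

Either count being larger than zero is a sign that rows may be dropped in subsequent geometric operations such as the spatial join above.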
