Joining a shapefile attribute table to a CSV file

Joining attribute tables to other database tables allows you to use a spatial dataset in order to reference a dataset without any geometry, using a common key between the data tables. A very common use case for this is to join a vector dataset of census attributes to a more detailed census attribute dataset. The use case we will demonstrate here links a US census track file to a detailed CSV file that contains more in-depth information.

Getting ready

For this recipe, you will need a census tract shapefile and a CSV file containing the appropriate census data for the shapefile. You can download the sample data set from https://geospatialpython.googlecode.com/svn/census.zip.

Extract this data to a directory named /qgis_data/census.

How to do it...

The join operation is quite involved. We'll perform this operation and save the layer as a new shapefile with the joined attributes. Then we'll load the new layer and compare the field count to the original layer to ensure that the join occurred. We'll use the terms target layer and join layer. The target layer will be the shapefile, and the join layer will be a CSV with some additional fields we want to add to the shapefile. To do this, perform the following steps:

  1. Start QGIS.
  2. From the Plugins menu, select Python Console.
  3. First, load the county's census track layer and validate it:
    vectorLyr =  QgsVectorLayer('/qgis_data/census/hancock_tracts.shp', 'Hancock' , "ogr")
    vectorLyr.isValid()
    
  4. Now, load the CSV file as a layer and validate it as well:
    infoLyr = QgsVectorLayer('/qgis_data/census/ACS_12_5YR_S1901_with_ann.csv', 'Census' , "ogr")
    infoLyr.isValid()
    
  5. Once this is done, you must add both the layers to the map registry for the two layers to interact for the join. However, set the visibility to False, so the layers do not appear on the map:
    QgsMapLayerRegistry.instance().addMapLayers([vectorLyr,infoLyr], False)
    
  6. Next, you must create a special join object:
    info = QgsVectorJoinInfo()
    
  7. The join object needs the layer ID of the CSV file:
    info.joinLayerId = infoLyr.id()
    
  8. Next, specify the key field from the CSV file whose values correspond to the values in the shapefile:
    info.joinFieldName = "GEOid2"
    
  9. Then, specify the corresponding field in the shapefile:
    info.targetFieldName = "GEOID"
    
  10. Set the memoryCache property to True in order to speed up access to the joined data:
    info.memoryCache = True
    
  11. Add the join to the layer now:
    vectorLyr.addJoin(info)
    
  12. Next, write out the joined shapefile to a new file on disk:
    QgsVectorFileWriter.writeAsVectorFormat(vectorLyr, "/qgis_data/census/joined.shp", "CP120", None, "ESRI Shapefile")
    
  13. Now, load the new shapefile back in as a layer for verification:
    joinedLyr =  QgsVectorLayer('/qgis_data/census/joined.shp', 'Joined' , "ogr")
    
  14. Verify that the field count in the original layer is 12:
    vectorLyr.dataProvider().fields().count()
    
  15. Finally, verify that the new layer has a field count of 142 from the join:
    joinedLyr.dataProvider().fields().count()
    

How it works...

This recipe reaches out to the very edge of the PyQGIS API, forcing you to use some workarounds. Most recipes for data manipulation can be performed programmatically without writing data to disk or loading layers onto the map, but joins are different. Because the QgsVectorJoinInfo object needs the layer ID of the CSV layer, we must add both the layers to the map layer registry. Fortunately, we can do this without making them visible, if we are just trying to write a data manipulation script. A join is designed to be a temporary operation to query a dataset. Oddly, PyQGIS lets you create the join, but you cannot query it. This limitation is the reason why if you want to work with the joined data, you must write it to a new shapefile and reload it. Fortunately, PyQGIS allows you to do that.

There's more...

You can find an alternate method that works around the PyQGIS limitation in a Processing Toolbox script, which manually matches the joined data in Python, at https://github.com/rldhont/Quantum-GIS/blob/master/python/plugins/processing/algs/qgis/JoinAttributes.py.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset