Now we have everything we need to process the data and get the coordinates in bulk. In a Jupyter notebook, this can be as short as three lines of code once the path_in and path_out variables are defined (note that here we don't actually do anything with the errors):
path_in = './cities.csv'
path_out = './geocoded.csv'
data = read_csv(path_in)
result, errors = geocode_bulk(data, column='address', verbose=True)
write_csv(result, path_out)
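If you do want to keep the failed rows, one option is to write them to a separate file so that they can be inspected or retried later. The following is only a minimal sketch: the save_errors function is a hypothetical helper, and it assumes that errors is a list of (row, exception) pairs in which each row is a dictionary with an address key. The actual structure returned by geocode_bulk may differ, so adjust the code accordingly:

```python
import csv


def save_errors(errors, path):
    """Write failed rows to a CSV file and return how many were written.

    Assumption: `errors` is a list of (row, exception) pairs, where each
    row is a dict with an 'address' key. The real output of geocode_bulk
    may be shaped differently.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["address", "error"])
        for row, exc in errors:
            # Store the exception type and message as plain text
            message = f"{type(exc).__name__}: {exc}"
            writer.writerow([row.get("address", ""), message])
    return len(errors)


# Hypothetical sample data standing in for the errors from geocode_bulk
sample_errors = [
    ({"address": "Nowhere Street 1"}, ValueError("no results")),
    ({"address": "???"}, TimeoutError("request timed out")),
]
```

With this in place, a call such as save_errors(errors, './errors.csv') right after geocode_bulk would keep a record of every address that could not be geocoded.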
It is not very convenient, however, to fire up Jupyter and run through all the cells every time just to load the functions we write. Instead, we can store our functions in a separate module—a text file with the .py extension—and import the functions from there.
Let's create a new text file; we recommend using Visual Studio Code for this. Here is what you should do:
- Create a new file called geocode.py in the same folder as the notebooks we run.
- Once the file is open, copy and paste all the functions we have created so far into it, along with the import statements they rely on. Visual Studio Code will highlight likely typos and list code issues in the PROBLEMS section for you.
- Once the file is ready, return to Jupyter and import the code from this file (without the .py extension), just as if it were a library:
from geocode import nominatim_geocode
result = nominatim_geocode('Eiffel Tower')
Of course, you can import any other function from the file in the same way, or even variables, if you want. This ability to use Python files as modules is very useful. Any generic code that can be used for a broad range of applications, and any code that is too long or not directly relevant to the notebook, should be moved into Python files and imported. This will improve notebook readability and help you reuse existing code in other projects.
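The different ways of importing from a module can be sketched as follows. To keep the example self-contained, it writes a tiny stand-in module to a temporary folder instead of using geocode.py; the geocode_demo module, its describe function, and the DEFAULT_COUNTRY variable are all hypothetical and exist only for illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny stand-in module to a temporary folder. In practice, you would
# create the file by hand, next to your notebooks.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "geocode_demo.py").write_text(
    'DEFAULT_COUNTRY = "France"\n'
    "\n"
    "def describe(place):\n"
    '    """Hypothetical helper; it does not call any geocoding service."""\n'
    '    return f"{place}, {DEFAULT_COUNTRY}"\n',
    encoding="utf-8",
)

# Python looks for modules in the folders listed in sys.path. The notebook's
# working folder is normally included; the temporary one has to be added.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

# 1. Import the whole module and reach its contents through the module name
import geocode_demo
whole = geocode_demo.describe("Eiffel Tower")

# 2. Import specific names: here, a variable and a function
from geocode_demo import DEFAULT_COUNTRY, describe
specific = describe("Louvre")

# 3. Import the module under a shorter alias
import geocode_demo as gd
aliased = gd.DEFAULT_COUNTRY
```

Importing the whole module keeps it obvious where each function comes from, while importing specific names makes for shorter calls; which form to use is mostly a matter of readability.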
Now that we know how to move the code, let's see how to collect the data in the next section.