Verifying the correctness of the matches

There is no way to objectively quantify how correct the matches are without already having a better pattern-matching tool. It is a good idea, however, to get a qualitative sense of whether the regular expression is working properly. This can be done by printing out the identified street addresses. Because calls to print() can slow the program down noticeably when it processes a large file, it is good practice to limit how many times print() is called. In the following continuation of extract_street_names.py, the street address is extracted from each address string that produces a match. The match_count variable is used to limit the print() calls so that only the first 200 street addresses are printed to the output.

....
street_address_match = street_address_regex.search(address.lower())
if street_address_match:
    match_count += 1

    street_address = street_address_match.group()
    ## print out the matched items
    ## as a sanity check to make sure
    ## the regular expression is correct
    ## (only the first 200 matches are printed)
    if match_count <= 200:
        print(street_address)
....

If all goes well, you should see a series of street addresses consisting strictly of a street number, an initial street name, and a street suffix. Here is the output on my machine:

From the result, the regular expression appears to be effective at identifying street addresses in address strings. The last step is to identify and extract the street name from the string containing the street address. This is done in the next section.
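
If you want to try the sanity check on its own, outside extract_street_names.py, the following is a minimal, self-contained sketch of the same idea. Note that the pattern assigned to street_address_regex here is a simplified stand-in and not the regular expression built earlier in the chapter, the addresses list is made-up sample data, and the sample_matches() helper and its limit parameter are hypothetical names introduced only for this illustration.

```python
import re

# ASSUMPTION: a simplified stand-in pattern, not the chapter's actual regex.
# It looks for a street number, one or more name words, and a suffix
# taken from a short, illustrative list.
street_address_regex = re.compile(
    r"\d+\s+(?:[a-z0-9.]+\s+)+?"
    r"(?:st|street|ave|avenue|rd|road|blvd|boulevard|dr|drive)\b"
)

# ASSUMPTION: made-up address strings standing in for the real data file.
addresses = [
    "123 Main St, Springfield",
    "PO Box 44",
    "4500 North Oak Avenue Apt 2",
    "77 Sunset Blvd",
]


def sample_matches(address_strings, limit=200):
    """Collect at most `limit` matched street addresses.

    Returns the collected samples together with the total number of
    matches, so the caller can see how many were left unprinted.
    """
    match_count = 0
    samples = []
    for address in address_strings:
        street_address_match = street_address_regex.search(address.lower())
        if street_address_match:
            match_count += 1
            # keep only the first `limit` matches for inspection
            if match_count <= limit:
                samples.append(street_address_match.group())
    return samples, match_count


# A small limit is used here so the cap is visible on four sample strings.
samples, total = sample_matches(addresses, limit=2)
for street_address in samples:
    print(street_address)
print(f"printed {len(samples)} of {total} matches")
```

Collecting the samples in a list before printing keeps the counting logic separate from the output, which makes it easy to change the limit or to write the samples to a file instead.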
