If you look through the addresses output from the previous step, you may notice that not all of them follow the street address pattern outlined earlier.
A common deviation from the street address pattern is a directional letter, such as N or S, placed before or after the street name. Another deviation is a street name that contains more than one word:
4022 N Mozart St Irving Park
260-300 Osceola Ave S St Paul, MN 55102, USA
1656 Mount Eagle Place Alexandria, Virginia
Yet another deviation is the omission of street numbers:
Crown St New Haven, CT, USA
Depending on the project, you will need to decide how far to go in capturing the variations in the data. The more variations a pattern has to cover, the more complex it becomes and the more work it takes to write.
Because of this trade-off, it is helpful to quantify how much of the data a particular pattern captures. In the next few subsections, I will walk through the process of creating a pattern string to capture the street address.
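As a rough illustration of what quantifying coverage might look like, here is a minimal Python sketch. The two patterns, the list of street suffixes, and the `coverage` helper are assumptions made for this example, not the pattern developed in this chapter. The first four addresses are the ones shown above, and the last one is made up so that the baseline pattern has something to match.

```python
import re

# Hypothetical baseline (an assumption for illustration): a street number,
# a one-word street name, and a street-type suffix.
BASIC = re.compile(r"^\d+\s+[A-Za-z]+\s+(?:St|Ave|Place|Rd|Blvd)\b")

# Hypothetical variant: same as BASIC, but allows an optional one-letter
# direction (N, S, E, or W) between the number and the street name.
WITH_DIRECTION = re.compile(
    r"^\d+\s+(?:[NSEW]\s+)?[A-Za-z]+\s+(?:St|Ave|Place|Rd|Blvd)\b"
)

addresses = [
    "4022 N Mozart St Irving Park",
    "260-300 Osceola Ave S St Paul, MN 55102, USA",
    "1656 Mount Eagle Place Alexandria, Virginia",
    "Crown St New Haven, CT, USA",
    "123 Main St Springfield, IL",  # made-up address for illustration
]


def coverage(pattern, items):
    """Return the fraction of items in which the pattern finds a match."""
    if not items:
        return 0.0
    matched = sum(1 for item in items if pattern.search(item))
    return matched / len(items)


def unmatched(pattern, items):
    """Return the items the pattern fails to match, for manual review."""
    return [item for item in items if not pattern.search(item)]


basic_share = coverage(BASIC, addresses)
direction_share = coverage(WITH_DIRECTION, addresses)
missed = unmatched(WITH_DIRECTION, addresses)

print(f"basic pattern:   {basic_share:.0%}")
print(f"with direction:  {direction_share:.0%}")
for address in missed:
    print("not captured:", address)
```

Comparing the two shares shows how much each extra piece of pattern complexity buys, and the list of unmatched addresses points to which deviation to handle next, if the project warrants it.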