Python provides some special forms of groups that let us modify a regular expression, or match a pattern only when a previous group took part in the match, much like an if statement.
There is a way to apply the flags we've seen in Chapter 2, Regular Expressions with Python, using a special form of grouping: (?iLmsux).
| Letter | Flag |
|---|---|
| i | re.IGNORECASE |
| L | re.LOCALE |
| m | re.MULTILINE |
| s | re.DOTALL |
| u | re.UNICODE |
| x | re.VERBOSE |
For example:
>>> re.findall(r"(?u)\w+", ur"ñ")
[u'\xf1']
The above example is the same as:
>>> re.findall(r"\w+", ur"ñ", re.U)
[u'\xf1']
We've seen what these examples do several times in the previous chapter.
Remember that a flag is applied to the whole expression.
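The equivalence between the inline form and the flag argument can be checked directly. The following is a minimal sketch, assuming Python 3 (where the `ur""` prefix used above is not available); the sample text is made up for illustration. Recent Python 3 releases only accept these global inline flags at the very start of the pattern, which is where the sketch places them.

```python
import re

# Hypothetical sample text: three lines that differ only in case.
text = "Hello\nhello\nHELLO"

# Inline form: (?im) at the start turns on IGNORECASE and MULTILINE.
inline = re.findall(r"(?im)^hello$", text)

# Equivalent call with the flags passed as an argument.
explicit = re.findall(r"^hello$", text, re.IGNORECASE | re.MULTILINE)

print(inline)              # ['Hello', 'hello', 'HELLO']
print(inline == explicit)  # True
```

Either way, the flags cover the whole expression, not just the part that follows the group.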
Another very useful case of groups is the yes-pattern|no-pattern form. It tries to match a pattern only if a previous group was found; if that group was not found, it doesn't try to match that pattern. In short, it's like an if-else statement. The syntax for this operation is as follows:
(?(id/name)yes-pattern|no-pattern)
This expression means: if the group with this ID or name has already been matched, then at this point of the string, the yes-pattern has to match. If the group hasn't been matched, then the no-pattern has to match.
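As a quick illustration of the syntax, here is a minimal sketch assuming Python 3. The rule it checks is hypothetical and not part of the product example: a number may be wrapped in parentheses, but only if both parentheses are present. It shows the condition written once with a group ID and once with a group name; `accepts` is a helper introduced only for this sketch.

```python
import re

# Hypothetical rule: an optional "(" must be closed by a ")".
# The condition refers to the group by its ID...
by_id = re.compile(r"^(\()?\d+(?(1)\))$")
# ...or by its name.
by_name = re.compile(r"^(?P<open>\()?\d+(?(open)\))$")

def accepts(pattern, text):
    """Return True when the pattern matches the whole text."""
    return pattern.match(text) is not None

samples = ["42", "(42)", "(42", "42)"]
for sample in samples:
    print(sample, accepts(by_id, sample), accepts(by_name, sample))
```

Both patterns accept `42` and `(42)` and reject `(42` and `42)`. No no-pattern is given here, so when the opening parenthesis is absent the conditional group simply matches nothing.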
Let's see how it works, continuing with our trite example. We have a list of products, but in this case the ID can be made in two different ways:
- Two digits for the country code, a dash, three or four alphanumeric characters, a dash, and two digits for the area code, for example: 34-adrl-01
- Three or four alphanumeric characters only, for example: adrl
So, when there is a country code, we need to match the area code:
>>> pattern = re.compile(r"(\d\d-)?(\w{3,4})(?(1)(-\d\d))")
>>> pattern.match("34-erte-22")
<_sre.SRE_Match at 0x10f68b7a0>
>>> pattern.search("erte")
<_sre.SRE_Match at 0x10f68b828>
As you can see in the previous example, there is a match when we have both a country code and an area code, and also when we have neither. Note that when there is a country code but no area code, there is no match:
>>> print(pattern.match("34-erte"))
None
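To see which groups take part in each case, the product-ID pattern discussed above can be run as a small script. This is a minimal sketch assuming Python 3, with the regex escapes written out; `groups_or_none` is a helper introduced only for this illustration.

```python
import re

# Optional country code, three or four word characters, and an area
# code that is required only when the country code (group 1) matched.
pattern = re.compile(r"(\d\d-)?(\w{3,4})(?(1)(-\d\d))")

def groups_or_none(text):
    """Return the captured groups, or None when match() fails."""
    found = pattern.match(text)
    return found.groups() if found else None

print(groups_or_none("34-erte-22"))  # ('34-', 'erte', '-22')
print(groups_or_none("erte"))        # (None, 'erte', None)
print(groups_or_none("34-erte"))     # None
```

Groups that did not take part in the match are reported as `None`, which is exactly the condition that `(?(1)...)` tests.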
And what's the no-pattern for? Let's add another constraint to the previous example: if there is no country code, there has to be a name at the end of the string:
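- Two digits for the country code, a dash, three or four alphanumeric characters, a dash, and two digits for the area code, for example: 34-adrl-01
- Three or four alphanumeric characters, a dash, and a name of three or four lowercase letters, for example: adrl-sala
Let's see it in action:
>>> pattern = re.compile(r"(\d\d-)?(\w{3,4})-(?(1)(\d\d)|[a-z]{3,4})$")
>>> pattern.match("34-erte-22")
<_sre.SRE_Match at 0x10f6ee750>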
As expected, if there is a country code and an area code, there is a match.
>>> print(pattern.match("34-erte"))
None
In the preceding example, we do have a country code, but there is no area code, so there is no match.
>>> pattern.match("erte-abcd")
<_sre.SRE_Match at 0x10f6ee880>
And finally, when there is no country code, there must be a name, so we have a match.
Note that the no-pattern is optional, so in the first example, we've omitted it.
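The cases above, plus one more that mixes the two forms, can be checked together. This is a minimal sketch assuming Python 3; `is_valid_id` is a hypothetical helper name, and the last sample ID is an extra case not taken from the text.

```python
import re

# With a country code (group 1) the ID must end in two digits;
# without it, the ID must end in a name of 3-4 lowercase letters.
pattern = re.compile(r"(\d\d-)?(\w{3,4})-(?(1)(\d\d)|[a-z]{3,4})$")

def is_valid_id(text):
    """Return True when the text is an ID in one of the two forms."""
    return pattern.match(text) is not None

for product_id in ["34-erte-22", "erte-abcd", "34-erte", "erte-22"]:
    print(product_id, is_valid_id(product_id))
```

The first two IDs are accepted and the last two are rejected: `erte-22` has no country code, so the no-pattern applies and two digits do not satisfy `[a-z]{3,4}`.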