Named groups

Remember how, in the previous chapter, we accessed a group through its index?

>>> import re
>>> pattern = re.compile(r"(\w+) (\w+)")
>>> match = pattern.search("Hello world")
>>> match.group(1)
'Hello'
>>> match.group(2)
'world'

We have just learned how to access groups by index, both to extract information and to use them as backreferences. Referring to groups by number can be tedious and confusing, and, worse, a number gives no meaning or context to the group. That's why we have named groups.

Imagine a regex with several backreferences, let's say 10, and you find out that the third one is invalid, so you remove it from the regex. You now have to change the index of every backreference from that one onwards. To solve this problem, Guido van Rossum designed named groups for Python 1.5 in 1997, and the feature was offered to Perl for cross-pollination.

Nowadays, named groups can be found in almost every regex flavor. They let us give names to groups, so we can refer to them by name in any operation that involves groups.

To use them, we write (?P<name>pattern), where the P comes from "Python-specific extension" (as you can read in the e-mail Guido sent to the Perl developers at http://markmail.org/message/oyezhwvefvotacc3).

Let's see how it works with the previous example in the following code snippet:

>>> pattern = re.compile(r"(?P<first>\w+) (?P<second>\w+)")
>>> match = pattern.search("Hello world")
>>> match.group("first")
'Hello'
>>> match.group("second")
'world'
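Naming a group does not take its number away; the name is added on top of it. The sketch below reuses the pattern above and also calls Match.groupdict() and Pattern.groupindex from the standard re module (neither appears in the text above), to show that both ways of referring to a group keep working:

```python
import re

pattern = re.compile(r"(?P<first>\w+) (?P<second>\w+)")
match = pattern.search("Hello world")

# A named group is still numbered, so the name and the index reach the same text.
print(match.group("first"), match.group(1))  # Hello Hello

# groupdict() gathers every named group into a dict keyed by name.
print(match.groupdict())  # {'first': 'Hello', 'second': 'world'}

# The compiled pattern keeps the name-to-number mapping in groupindex.
print(dict(pattern.groupindex))  # {'first': 1, 'second': 2}
```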

Backreferences also become much simpler to use and maintain, as the following example shows:

>>> pattern = re.compile(r"(?P<country>\d+)-(?P<id>\w+)")
>>> pattern.sub(r"\g<id>-\g<country>", "1-a\n20-baer\n34-afcr")
'a-1\nbaer-20\nafcr-34'

As the previous example shows, to refer to a group by name in the replacement string of the sub operation, we use \g<name>.
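The following sketch reruns the same substitution as a standalone script and compares it with two alternatives the re module also accepts: \g<number>, which refers to the same groups by position, and a replacement function, which receives each match object and can read the groups by name. The helper name swap is only illustrative:

```python
import re

pattern = re.compile(r"(?P<country>\d+)-(?P<id>\w+)")
text = "1-a\n20-baer\n34-afcr"

# \g<name> in the replacement string inserts the text captured by that named group.
swapped = pattern.sub(r"\g<id>-\g<country>", text)
print(swapped)

# \g<number> refers to the same groups by position (country is 1, id is 2).
by_number = pattern.sub(r"\g<2>-\g<1>", text)

# A callable replacement gets the match object, so match.group("name") works too.
def swap(match):
    return f"{match.group('id')}-{match.group('country')}"

by_function = pattern.sub(swap, text)
```

All three produce the same result; the named form is the one that keeps working if groups are later added or removed in front of these two.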

We can also use named groups inside the pattern itself, as seen in the following example:

>>> pattern = re.compile(r"(?P<word>\w+) (?P=word)")
>>> match = pattern.search(r"hello hello world")
>>> match.groups()
('hello',)

This is simpler and more readable than using numbers.
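Going back to the maintenance problem of removing a group, here is a small sketch with made-up patterns and sample strings (illustrative only). Deleting the first group forces the numbered backreference \3 to be renumbered to \2, while the named backreference (?P=tag) stays exactly as it was:

```python
import re

sample = "ab-12-xy xy"

# Numbered version: three groups, and \3 repeats whatever the third group captured.
numbered = re.compile(r"(\w+)-(\d+)-(\w+) \3")
print(numbered.search(sample).groups())

# Drop the first group: the old third group is now group 2,
# so the backreference has to be renumbered by hand from \3 to \2.
numbered_edited = re.compile(r"(\d+)-(\w+) \2")
print(numbered_edited.search("12-xy xy").groups())

# Forgetting to renumber leaves a reference to a group that no longer exists,
# which the re module rejects when the pattern is compiled.
try:
    re.compile(r"(\d+)-(\w+) \3")
except re.error as exc:
    print("stale backreference:", exc)

# Named version: the backreference uses the name, so the edit does not touch it.
named = re.compile(r"(?P<prefix>\w+)-(?P<num>\d+)-(?P<tag>\w+) (?P=tag)")
named_edited = re.compile(r"(?P<num>\d+)-(?P<tag>\w+) (?P=tag)")
print(named.search(sample).group("tag"), named_edited.search("12-xy xy").group("tag"))
```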

Through these examples, we've used the following three different ways to refer to named groups:

Use                                            Syntax
-------------------------------------------    -------------------
Inside a pattern                               (?P=name)
In the repl string of the sub operation        \g<name>
In any of the operations of the MatchObject    match.group('name')
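To see the three forms side by side, here is a sketch of a small doubled-word cleaner. The pattern and the sample sentence are made up for illustration, and the \b word boundaries are an addition so that only whole repeated words are matched:

```python
import re

# Inside the pattern: (?P=word) must match the same text the "word" group captured.
doubled = re.compile(r"\b(?P<word>\w+) (?P=word)\b")

text = "this is is a test test"

# In the MatchObject: read each captured group by name.
repeated = [match.group("word") for match in doubled.finditer(text)]
print(repeated)

# In the repl string of sub: \g<word> writes the captured word back once,
# collapsing each doubled word into a single occurrence.
cleaned = doubled.sub(r"\g<word>", text)
print(cleaned)
```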
