In this section, we are going to learn about regular expressions in Python. A regular expression is a pattern written in a small, specialized language that is embedded in Python and made available through the re module. With it, we define rules for the set of strings that we want to match. Using regular expressions, we can extract specific information from files, code, documents, spreadsheets, and so on.
In Python, regular expression support is provided by the re module, which we bring in with import re. In this section, regular expression syntax is grouped into four categories:
- Identifiers
- Modifiers
- Whitespace characters
- Flags
The following table lists the identifiers, and there's a description for each one:
Identifier | Description |
\w | Matches an alphanumeric character or an underscore (_) |
\W | Matches any character that is not alphanumeric and not an underscore (_) |
\d | Matches a digit |
\D | Matches a non-digit |
\s | Matches a whitespace character (space, tab, new line, and so on) |
\S | Matches anything but a whitespace character |
\. | Matches a literal period (.) |
. | Matches any character except a new line |
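In a pattern, the letter identifiers are written with a leading backslash, and a raw string (r"...") keeps Python from interpreting that backslash itself. The following is a minimal sketch of the identifiers in use; the sample text is made up for illustration.

```python
import re

# Hypothetical sample text, chosen only to exercise each identifier.
text = "Order_42 shipped to 10 Main St."

# \d matches one digit; adding + collects runs of consecutive digits.
digits = re.findall(r"\d+", text)

# \w matches letters, digits, and the underscore, so "Order_42" stays whole.
words = re.findall(r"\w+", text)

# \s matches each whitespace character (here, the five spaces).
spaces = re.findall(r"\s", text)

# \. matches only a literal period; a bare . matches anything but a new line.
literal_dots = re.findall(r"\.", text)
any_chars = re.findall(r".", "a.\nb")

print(digits)        # ['42', '10']
print(words)         # ['Order_42', 'shipped', 'to', '10', 'Main', 'St']
print(len(spaces))   # 5
print(literal_dots)  # ['.']
print(any_chars)     # ['a', '.', 'b']
```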
The following table lists the modifiers, and there's a description for each one:
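Modifier | Description |
^ | Matches the start of the string |
$ | Matches the end of the string |
? | Matches 0 or 1 occurrence of the preceding element |
* | Matches 0 or more occurrences of the preceding element |
+ | Matches 1 or more occurrences of the preceding element |
| (vertical bar) | Matches either the expression on its left or the one on its right |
[ ] | Matches any one character from the set or range inside the brackets, such as [a-z] |
{x} | Matches exactly x occurrences of the preceding element |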
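Modifiers change where a pattern may match or how many times the preceding element may repeat. The following sketch tries each modifier on short, made-up strings; the variable names are chosen only for illustration.

```python
import re

# ^ and $ anchor the pattern to the whole string; {4} means exactly four digits.
year_ok = bool(re.search(r"^\d{4}$", "2024"))
year_bad = bool(re.search(r"^\d{4}$", "20245"))

# ? makes the preceding element (the letter u) optional.
optional = re.findall(r"colou?r", "color colour")

# * allows zero or more of the preceding element, + requires at least one.
zero_or_more = re.findall(r"ab*", "a ab abbb")
one_or_more = re.findall(r"ab+", "a ab abbb")

# | matches either alternative.
either = re.findall(r"cat|dog", "cat, dog, bird")

# [ ] matches any one character from the set inside the brackets.
vowels = re.findall(r"[aeiou]", "regex")

print(year_ok, year_bad)  # True False
print(optional)           # ['color', 'colour']
print(zero_or_more)       # ['a', 'ab', 'abbb']
print(one_or_more)        # ['ab', 'abbb']
print(either)             # ['cat', 'dog']
print(vowels)             # ['e', 'e']
```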
The following table lists the whitespace characters, and there's a description for each one:
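Character | Description |
\s | Space (in a pattern, \s matches any whitespace character) |
\t | Tab |
\n | New line |
\e | Escape (recognized by some regex engines, but not by Python's re module; use \x1b there) |
\f | Form feed |
\r | Carriage return |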
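Whitespace escapes are also written with a leading backslash inside a pattern. The sketch below uses invented sample strings to show a few of them with Python's re module. Note that re has no \e escape, so the escape character is written as \x1b in this example.

```python
import re

# Hypothetical tab-separated data with Windows-style line endings.
raw = "name\tage\r\nAda\t36\r\n"

# Split on a tab, or on a carriage return followed by a new line.
fields = re.split(r"\t|\r\n", raw.strip())

# \s+ matches any run of whitespace, so mixed spaces, tabs, and
# new lines collapse into a single space.
collapsed = re.sub(r"\s+", " ", "a \t\n b")

# \f matches a form feed character.
form_feeds = re.findall(r"\f", "page1\fpage2")

# \x1b is the escape character; here it introduces terminal color codes,
# which the pattern strips out.
plain = re.sub(r"\x1b\[[0-9;]*m", "", "\x1b[31mred\x1b[0m")

print(fields)           # ['name', 'age', 'Ada', '36']
print(collapsed)        # a b
print(len(form_feeds))  # 1
print(plain)            # red
```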
The following table lists the flags, and there's a description for each one:
Flag | Description |
re.IGNORECASE | Case-insensitive matching |
re.DOTALL | Makes . match any character, including a new line |
re.MULTILINE | Makes ^ and $ match at the start and end of each line, not only of the whole string |
re.ASCII | Makes escapes such as \w, \d, and \s match only ASCII characters |
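Flags are passed as an extra argument to the re functions and can be combined with the | operator. The following is a small sketch of how each flag changes a result; the sample text is invented for the example.

```python
import re

# Hypothetical three-line log excerpt.
text = "Error: disk full\nerror: retry\nOK"

# re.IGNORECASE matches both "Error" and "error".
any_case = re.findall(r"error", text, re.IGNORECASE)

# Without re.MULTILINE, ^ matches only at the start of the whole string.
first_only = re.findall(r"^\w+", text)
each_line = re.findall(r"^\w+", text, re.MULTILINE)

# Without re.DOTALL, . stops at a new line, so this pattern cannot span lines.
no_span = re.search(r"disk.*retry", text)
span = re.search(r"disk.*retry", text, re.DOTALL)

# re.ASCII restricts \w to ASCII letters, digits, and the underscore.
unicode_words = re.findall(r"\w+", "caf\u00e9")
ascii_words = re.findall(r"\w+", "caf\u00e9", re.ASCII)

# Flags can be combined with |.
combined = re.findall(r"^error", text, re.IGNORECASE | re.MULTILINE)

print(any_case)          # ['Error', 'error']
print(first_only)        # ['Error']
print(each_line)         # ['Error', 'error', 'OK']
print(no_span is None)   # True
print(span is not None)  # True
print(ascii_words)       # ['caf']
print(combined)          # ['Error', 'error']
```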
Now we are going to see some examples of regular expressions. We will learn about the match(), search(), findall(), and sub() functions one by one in the following sections.
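As a brief preview, the sketch below calls each of the four functions on one made-up string, to show how they differ before they are covered in detail.

```python
import re

# Hypothetical sample string.
text = "cat bat cat"

# match() succeeds only if the pattern matches at the start of the string.
at_start = re.match(r"cat", text)
not_at_start = re.match(r"bat", text)

# search() scans the string and returns the first match found anywhere.
found = re.search(r"bat", text)

# findall() returns every non-overlapping match as a list of strings.
all_matches = re.findall(r"\wat", text)

# sub() returns a new string with each match replaced.
replaced = re.sub(r"cat", "dog", text)

print(at_start.group())      # cat
print(not_at_start is None)  # True
print(found.start())         # 4
print(all_matches)           # ['cat', 'bat', 'cat']
print(replaced)              # dog bat dog
```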