The regular expressions in this recipe are very similar to the
ones in the previous recipe. This discussion assumes you’ve already read
and understood the discussion of the previous recipe.
We’ve made only one change to the regular expressions
for drive letter paths, compared to the ones in the previous recipe.
We’ve added three capturing groups that you can use to retrieve the
various parts of the path: ‹drive›, ‹folder›, and ‹file›. You can use these names if your regex
flavor supports named capture (Recipe 2.11). If
not, you’ll have to reference the capturing groups by their numbers:
1, 2, and 3. See Recipe 3.9 to learn
how to get the text matched by named and/or numbered groups in your
favorite programming language.
Drive letter, UNC, and relative paths
Things get a bit more complicated if we also want to
allow relative paths. In the previous recipe, we could just add a
third alternative to the drive part of the regex to match the start of
the relative path. We can’t do that here. In the case of a relative path,
the capturing group for the drive should remain empty.
Instead, the literal backslash that was after the capturing
group for the drives in the regex in the “drive letter and UNC paths”
section is now moved into that capturing group. We add it to the end
of the alternatives for the drive letter and the network share. We add
a third alternative with an optional backslash for relative paths that
may or may not begin with a backslash. Because the third alternative
is optional, the whole group for the drive is essentially
optional.
The resulting regular expression correctly matches all Windows
paths. The problem is that by making the drive part optional, we now
have a regex in which everything is optional. The folder and file
parts were already optional in the regexes that support absolute paths
only. In other words: our regular expression will match the empty
string.
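The effect is easy to check. The following is a minimal Python sketch, not the recipe's own listing: the pattern is reconstructed from the description above (trailing backslash moved inside the drive group, plus a third alternative that is just an optional backslash), and the name `ALL_OPTIONAL` is made up for illustration. The space in the character class is written as `\x20` so that free-spacing mode cannot swallow it.

```python
import re

# Assumed reconstruction of the "drive letter, UNC, and relative paths" regex
# described in the text. Every part of it can match zero characters.
ALL_OPTIONAL = re.compile(r"""
    \A
    (?P<drive>[a-z]:\\|\\\\[a-z0-9_.$\x20-]+\\[a-z0-9_.$\x20-]+\\|\\?)
    (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
    (?P<file>[^\\/:*?"<>|\r\n]*)
    \Z
""", re.VERBOSE | re.IGNORECASE)

# A relative path matches, with an empty drive group.
relative = ALL_OPTIONAL.match(r"folder\file.txt")
print(relative.groupdict())

# The empty string matches too, with all three groups empty.
empty = ALL_OPTIONAL.match("")
print(empty is not None, empty.groupdict())
```
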
If we want to make sure the regex doesn’t match empty strings,
we’d have to add additional alternatives to deal with relative paths
that specify a folder (in which case the filename is optional), and
relative paths that don’t specify a folder (in which case the filename
is mandatory):
\A
(?:
  (?<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\
  (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
  (?<file>[^\\/:*?"<>|\r\n]*)
| (?<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
  (?<file2>[^\\/:*?"<>|\r\n]*)
| (?<relativefile>[^\\/:*?"<>|\r\n]+)
)
\Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9
\A
(?:
  (?P<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\
  (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
  (?P<file>[^\\/:*?"<>|\r\n]*)
| (?P<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
  (?P<file2>[^\\/:*?"<>|\r\n]*)
| (?P<relativefile>[^\\/:*?"<>|\r\n]+)
)
\Z
Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python
\A
(?:
  ([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\
  ((?:[^\\/:*?"<>|\r\n]+\\)*)
  ([^\\/:*?"<>|\r\n]*)
| (\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
  ([^\\/:*?"<>|\r\n]*)
| ([^\\/:*?"<>|\r\n]+)
)
\Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^(?:([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\↵
((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)|(\\?(?:[^\\/:*?"<>|↵
\r\n]+\\)+)([^\\/:*?"<>|\r\n]*)|([^\\/:*?"<>|\r\n]+))$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
The price we pay for excluding zero-length strings is that we
now have six capturing groups to capture the three different parts of
the path. You’ll have to look at the scenario in which you want to use
these regular expressions to determine whether it’s easier to do an
extra check for empty strings before using the regex or to spend more
effort in dealing with multiple capturing groups after a match has
been found.
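To give a feel for the second option, here is a hedged Python sketch of the extra work six groups require. It uses the `(?P<name>...)` variant of the regex with the space written as `\x20`. The names `PATH` and `split_path`, and the choice to report a missing part as an empty string, are assumptions made for this illustration only.

```python
import re

PATH = re.compile(r"""
    \A
    (?:
        (?P<drive>[a-z]:|\\\\[a-z0-9_.$\x20-]+\\[a-z0-9_.$\x20-]+)\\
        (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
        (?P<file>[^\\/:*?"<>|\r\n]*)
    |   (?P<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
        (?P<file2>[^\\/:*?"<>|\r\n]*)
    |   (?P<relativefile>[^\\/:*?"<>|\r\n]+)
    )
    \Z
""", re.VERBOSE | re.IGNORECASE)


def split_path(path):
    """Return (drive, folder, file), or None if the path does not match."""
    match = PATH.match(path)
    if match is None:
        return None
    groups = match.groupdict()

    def first(*names):
        # Only one alternative participates in a match, so at most one of
        # the named groups holds text; the others are None.
        for name in names:
            if groups[name] is not None:
                return groups[name]
        return ""

    return (
        first("drive"),
        first("folder", "relativefolder"),
        first("file", "file2", "relativefile"),
    )


for sample in (r"c:\folder\file.txt", r"folder\sub\file.txt", "file.txt", ""):
    print(repr(sample), "->", split_path(sample))
```

The alternative is to keep the simpler three-group regex and reject empty input up front, for instance with a plain `if not path:` test before calling the regex.
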
When using Perl 5.10, Ruby 1.9, or .NET, we can give multiple
named groups the same name. See the section “Groups with the same name”
in Recipe 2.11 for details. This way we can simply get
the match of the folder or file group, without worrying about which of
the two folder groups or three file groups actually participated in
the regex match:
\A
(?:
  (?<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\
  (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
  (?<file>[^\\/:*?"<>|\r\n]*)
| (?<folder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
  (?<file>[^\\/:*?"<>|\r\n]*)
| (?<file>[^\\/:*?"<>|\r\n]+)
)
\Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Perl 5.10, Ruby 1.9