You want to check whether a string looks like a valid path to a folder or file on the Microsoft Windows operating system.
\A
[a-z]:\\                    # Drive
(?:[^\\/:*?"<>|\r\n]+\\)*   # Folder
[^\\/:*?"<>|\r\n]*          # File
\Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^[a-z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
\A
(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\   # Drive
(?:[^\\/:*?"<>|\r\n]+\\)*                          # Folder
[^\\/:*?"<>|\r\n]*                                 # File
\Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
\A
(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|   # Drive
   \\?[^\\/:*?"<>|\r\n]+\\?)                           # Relative path
(?:[^\\/:*?"<>|\r\n]+\\)*                              # Folder
[^\\/:*?"<>|\r\n]*                                     # File
\Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^(?:(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\|\\?[^\\/:*?"<>|\r\n]+\\?)(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python
Matching a full path to a file or folder on a drive that has a drive letter is straightforward. The drive is indicated with a single letter, followed by a colon and a backslash. We match this with ‹[a-z]:\\›. The backslash is a metacharacter in regular expressions, so we need to escape it with another backslash to match it literally.
Folder and file names on Windows can contain any character except these: \/:*?"<>|. Line breaks aren’t allowed either. We can match a sequence of all characters except these with the negated character class ‹[^\\/:*?"<>|\r\n]+›. The backslash is a metacharacter in character classes too, so we escape it. ‹\r› and ‹\n› are the two line break characters. See Recipe 2.3 to learn more about (negated) character classes. The plus quantifier (Recipe 2.12) specifies that we want one or more such characters.
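As a minimal sketch of the negated character class on its own, the following Python snippet tests whether a single folder or file name consists only of allowed characters. The names `NAME_CHARS` and `is_valid_name` are hypothetical helpers introduced for illustration. The check covers only the character set described above and nothing else about Windows naming.

```python
import re

# Negated character class from the text: one or more characters other than
# \ / : * ? " < > | and the two line break characters.
NAME_CHARS = re.compile(r'[^\\/:*?"<>|\r\n]+')

def is_valid_name(name: str) -> bool:
    """Return True if name is non-empty and uses only allowed characters."""
    return NAME_CHARS.fullmatch(name) is not None

for candidate in ['report.txt', 'my folder', 'a:b', 'what?', '']:
    print(repr(candidate), is_valid_name(candidate))
```

Using `fullmatch` forces the class to account for the entire string, so one forbidden character anywhere in the name makes the check fail.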
Folders are delimited with backslashes. We can match a sequence of zero or more folders with ‹(?:[^\\/:*?"<>|\r\n]+\\)*›, which puts the regex for the folder name and a literal backslash inside a noncapturing group (Recipe 2.9) that is repeated zero or more times with the asterisk (Recipe 2.12).
To match the filename, we use ‹[^\\/:*?"<>|\r\n]*›. The asterisk makes the filename optional, to allow paths that end with a backslash. If you don’t want to allow paths that end with a backslash, change the last ‹*› in the regex into a ‹+›.
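The drive letter regex can be tried out in Python roughly as follows. This is a sketch, assuming Python's `re` module with the `re.VERBOSE` and `re.IGNORECASE` flags standing in for the free-spacing and case insensitive options. The names `DRIVE_PATH` and `DRIVE_FILE_PATH` are hypothetical. The second pattern applies the suggested change of the last ‹*› into ‹+›.

```python
import re

# Free-spacing version: drive, zero or more folders, optional file name.
DRIVE_PATH = re.compile(r'''
    \A
    [a-z]:\\                      # Drive
    (?:[^\\/:*?"<>|\r\n]+\\)*     # Folder
    [^\\/:*?"<>|\r\n]*            # File
    \Z
''', re.VERBOSE | re.IGNORECASE)

# Variant with + instead of the final *: the path must end in a name,
# so a trailing backslash is rejected.
DRIVE_FILE_PATH = re.compile(
    r'\A[a-z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]+\Z',
    re.IGNORECASE,
)

for path in ['c:\\folder\\file.txt', 'C:\\', 'c:\\folder\\', 'c:folder']:
    print(repr(path),
          DRIVE_PATH.match(path) is not None,
          DRIVE_FILE_PATH.match(path) is not None)
```

In Python, `\Z` matches only at the very end of the string, whereas `$` also matches just before a trailing line break. Because the character classes exclude line breaks, a path followed by a newline is rejected by both patterns here.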
Files on network drives that aren’t mapped to drive letters can be accessed using Universal Naming Convention (UNC) paths. UNC paths have the form \\server\share\folder\file.
We can easily adapt the regex for drive letter paths to support UNC paths as well. All we have to do is replace the ‹[a-z]:› part that matches the drive letter with something that matches a drive letter or server name. ‹(?:[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)› does that. The vertical bar is the alternation operator (Recipe 2.8). It gives the choice between a drive letter matched with ‹[a-z]:› or a server and share name matched with ‹\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+›. The alternation operator has the lowest precedence of all regex operators. To group the two alternatives together, we use a noncapturing group. As Recipe 2.9 explains, the characters ‹(?:› form the somewhat complicated opening bracket of a noncapturing group. The question mark does not have its usual meaning after a parenthesis.
The rest of the regular expression can remain the same. Because the server and share names are both matched by the “drive” part of the regex, the part that matches folder names only has to handle the folders below the share.
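The combined drive letter and UNC regex might be exercised in Python like this. It is a sketch under two assumptions: that ● in the printed regex stands for a literal space character, and that `re.IGNORECASE` plays the role of the case insensitive option. `UNC_OR_DRIVE_PATH` is a hypothetical name.

```python
import re

# Drive letter or \\server\share, then a backslash, folders, optional file.
# The space inside the character classes is the assumed meaning of "●".
UNC_OR_DRIVE_PATH = re.compile(
    r'\A(?:[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\'
    r'(?:[^\\/:*?"<>|\r\n]+\\)*'
    r'[^\\/:*?"<>|\r\n]*\Z',
    re.IGNORECASE,
)

samples = [
    '\\\\server\\share\\folder\\file.txt',  # UNC path with folder and file
    '\\\\server\\share\\',                  # share root with trailing backslash
    '\\\\server\\share',                    # no backslash after the share name
    'c:\\file.txt',                         # drive letter path still works
]
for path in samples:
    print(repr(path), UNC_OR_DRIVE_PATH.match(path) is not None)
```

Note that the regex requires a backslash after the share name, just as it requires one after the drive letter's colon, so `\\server\share` without a trailing backslash is not accepted.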
A relative path is one that begins with a folder name (perhaps the special folder .. to select the parent folder) or consists of just a filename. To support relative paths, we add a third alternative to the “drive” portion of our regex. This alternative matches the start of a relative path rather than a drive letter or server name.
‹\\?[^\\/:*?"<>|\r\n]+\\?› matches the start of the relative path. The path can begin with a backslash, but it doesn’t have to. ‹\\?› matches the backslash if present, or nothing otherwise. ‹[^\\/:*?"<>|\r\n]+› matches a folder or filename. If the relative path consists of just a filename, the final ‹\\?› won’t match anything, and neither will the “folder” and “file” parts of the regex, which are both optional. If the relative path specifies a folder, the final ‹\\?› will match the backslash that delimits the first folder in the relative path from the rest of the path. The “folder” part then matches the remaining folders in the path, if any, and the “file” part matches the filename.
The regular expression for matching relative paths no longer neatly uses distinct parts of the regex to match distinct parts of the subject text. The regex part labeled “relative path” will actually match a folder or filename if the path is relative. If the relative path specifies one or more folders, the “relative path” part matches the first folder, and the “folder” and “file” parts match what’s left. If the relative path is just a filename, it will be matched by the “relative path” part, leaving nothing for the “folder” and “file” parts. Since we’re only interested in validating the path, this doesn’t matter. The comments in the regex are just labels to help us understand it.
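To see which text the first part of the regex actually consumes, the sketch below wraps the drive and relative path alternatives in a named capturing group. That group (`start`) and the name `ANY_PATH` are additions made purely for illustration. The recipe's own regex uses a noncapturing group there. As before, ● is assumed to stand for a literal space.

```python
import re

# Same regex as the recipe, except the outer noncapturing group has been
# replaced by the named group "start" so we can inspect what it matched.
ANY_PATH = re.compile(
    r'\A(?P<start>(?:[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\|'
    r'\\?[^\\/:*?"<>|\r\n]+\\?)'
    r'(?:[^\\/:*?"<>|\r\n]+\\)*'
    r'[^\\/:*?"<>|\r\n]*\Z',
    re.IGNORECASE,
)

samples = ['file.txt', 'folder\\sub\\file.txt', '..\\file.txt',
           '\\folder\\file.txt', 'c:\\folder\\file.txt', 'c:folder']
for path in samples:
    m = ANY_PATH.match(path)
    print(repr(path), repr(m.group('start')) if m else 'no match')
```

For `file.txt` the `start` group holds the whole filename, and for `folder\sub\file.txt` it holds the first folder plus its backslash, which is the overlap between the labeled parts described above.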
If we wanted to extract parts of the path into capturing groups, we’d have to be more careful to match the drive, folder, and filename separately. The next recipe handles that problem.
Recipe 8.19 also validates a Windows path but adds capturing groups for the drive, folder, and file, allowing you to extract those separately.
Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.1 explains which special characters need to be escaped. Recipe 2.2 explains how to match nonprinting characters. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition. Recipe 2.18 explains how to add comments.