8.13. Extracting the Query from a URL

Problem

You want to extract the query from a string that holds a URL. For example, you want to extract param=value from http://www.regexcookbook.com?param=value or from /index.html?param=value.

Solution

^[^?#]+\?([^#]+)

Regex options: Case insensitive

Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Discussion

Extracting the query from a URL is trivial if you know that your subject text is a valid URL. The query is delimited from the part of the URL before it with a question mark. That is the first question mark allowed anywhere in URLs. Thus, we can easily skip ahead to the first question mark with ‹^[^?#]+\?›. The question mark is a metacharacter only outside character classes, so we escape the literal question mark that follows the character class but not the one inside it. The first ‹^› is an anchor (Recipe 2.5), whereas the second ‹^› negates the character class (Recipe 2.3).

Question marks can also appear in the (optional) fragment after the query. So we do need to use ‹^[^?#]+\?›, rather than just ‹\?›, to make sure we match the first question mark in the URL, and to make sure that it isn’t part of the fragment in a URL without a query.
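To illustrate why the anchored prefix matters, here is a minimal Python sketch comparing the two approaches. The sample URL (with a question mark inside the fragment) and the names NAIVE and ANCHORED are assumptions made up for this illustration.

```python
import re

# Regex from this recipe: skip to the first "?" that precedes any "#".
ANCHORED = re.compile(r"^[^?#]+\?([^#]+)")
# Naive variant for comparison: grabs whatever follows any "?".
NAIVE = re.compile(r"\?([^#]+)")

# Hypothetical URL with no query; the "?" belongs to the fragment.
url = "http://www.regexcookbook.com/index.html#top?notaquery"

naive = NAIVE.search(url)
anchored = ANCHORED.search(url)

print(naive.group(1))  # notaquery (fragment text mistaken for a query)
print(anchored)        # None (correctly reports that there is no query)
```

The anchored regex fails because ‹[^?#]+› stops at the hash sign, where the next character cannot be a question mark.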

The query runs until the start of the fragment, or the end of the URL if there is no fragment. The fragment is delimited from the rest of the URL with a hash sign. Since hash signs are not permitted anywhere except in the fragment, ‹[^#]+› is all we need to match the query. The negated character class matches everything up to the first hash sign, or everything until the end of the subject if it doesn’t contain any hash signs.

This regular expression will find a match only for URLs that actually contain a query. When it matches a URL, the match includes everything from the start of the URL, so we put the ‹[^#]+› part of the regex that matches the query inside a capturing group. When the regex finds a match, you can retrieve the text matched by the first (and only) capturing group to get the query without any delimiters or other URL parts. Recipe 2.9 tells you all about capturing groups. See Recipe 3.9 to learn how to retrieve text matched by capturing groups in your favorite programming language.
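As a rough sketch of retrieving the first capturing group, the following Python snippet wraps the regex in a small helper. The function name extract_query and the sample URLs beyond those in the Problem statement are assumptions for illustration only.

```python
import re

# The regex from the Solution; group 1 holds the query without delimiters.
QUERY_RE = re.compile(r"^[^?#]+\?([^#]+)")

def extract_query(url):
    """Return the query part of url, or None if the URL has no query."""
    match = QUERY_RE.search(url)
    return match.group(1) if match else None

print(extract_query("http://www.regexcookbook.com?param=value"))  # param=value
print(extract_query("/index.html?param=value"))                   # param=value
print(extract_query("/index.html?param=value#section"))           # param=value
print(extract_query("/index.html"))                               # None
```

Returning None for URLs without a query mirrors the regex's behavior of finding no match at all in that case.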

If you don’t already know that your subject text is a valid URL, you can use one of the regexes from Recipe 8.7. The first regex in that recipe captures the query, if one is present in the URL, into capturing group number 12.
