You need a regular expression that matches each line in the log files produced by a web server that uses the Combined Log Format.[14] For example:
127.0.0.1 - jg
[27/Apr/2012:11:27:36 +0700] "GET /regexcookbook.html HTTP/1.1" 200 2326
"http://www.regexcookbook.com/" "Mozilla/5.0 (compatible; MSIE 9.0;
Windows NT 6.1; Trident/5.0)"
^(?<client>\S+)●\S+●(?<userid>\S+)●\[(?<datetime>[^\]]+)\]↵
●"(?<method>[A-Z]+)●(?<request>[^●"]+)?●HTTP/[0-9.]+"↵
●(?<status>[0-9]{3})●(?<size>[0-9]+|-)●"(?<referrer>[^"]*)"↵
●"(?<useragent>[^"]*)"
Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java 7, XRegExp, PCRE 7, Perl 5.10, Ruby 1.9
^(?P<client>\S+)●\S+●(?P<userid>\S+)●\[(?P<datetime>[^\]]+)\]↵
●"(?P<method>[A-Z]+)●(?P<request>[^●"]+)?●HTTP/[0-9.]+"↵
●(?P<status>[0-9]{3})●(?P<size>[0-9]+|-)●"(?P<referrer>[^"]*)"↵
●"(?P<useragent>[^"]*)"
Regex options: ^ and $ match at line breaks
Regex flavors: PCRE 4, Perl 5.10, Python
^(\S+)●\S+●(\S+)●\[([^\]]+)\]●"([A-Z]+)●([^●"]+)?●HTTP/[0-9.]+"↵
●([0-9]{3})●([0-9]+|-)●"([^"]*)"●"([^"]*)"
Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
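As a rough illustration of how the named-capture variant might be used, the following Python sketch applies it to the sample entry shown earlier. The names `COMBINED_LOG`, `SAMPLE`, and `parse_entry` are assumptions made for this example only. The ● markers are written as ordinary spaces, and the sample entry is joined back into the single line a server would actually write.

```python
import re

# Named-capture variant of the pattern, with literal spaces written out.
# re.MULTILINE corresponds to the "^ and $ match at line breaks" option.
COMBINED_LOG = re.compile(
    r'^(?P<client>\S+) \S+ (?P<userid>\S+) \[(?P<datetime>[^\]]+)\]'
    r' "(?P<method>[A-Z]+) (?P<request>[^ "]+)? HTTP/[0-9.]+"'
    r' (?P<status>[0-9]{3}) (?P<size>[0-9]+|-) "(?P<referrer>[^"]*)"'
    r' "(?P<useragent>[^"]*)"',
    re.MULTILINE,
)

# The sample entry from the text, rejoined into one line.
SAMPLE = (
    '127.0.0.1 - jg [27/Apr/2012:11:27:36 +0700] '
    '"GET /regexcookbook.html HTTP/1.1" 200 2326 '
    '"http://www.regexcookbook.com/" '
    '"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"'
)


def parse_entry(line):
    """Return a dict of the named fields, or None if the line does not match."""
    match = COMBINED_LOG.search(line)
    return match.groupdict() if match else None


fields = parse_entry(SAMPLE)
print(fields["client"], fields["status"], fields["referrer"])
```

Every captured value comes back as a string, so a caller that needs numbers would still have to convert `status` and `size`, keeping in mind that `size` may be a lone hyphen.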
The Combined Log Format is the same as the Common Log Format, but
with two extra fields added at the end of each entry. The first
extra field is the referring URL, and the second is the user agent.
Both appear as double-quoted strings. We can easily match those
strings with ‹"[^"]*"›. We put a capturing group around the ‹[^"]*›
so that we can easily retrieve the referrer or user agent without
the enclosing quotes.
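The effect of placing the group inside the quotes can be seen in isolation with a small Python sketch. The name `QUOTED_FIELD` and the shortened `tail` string are assumptions for illustration, not part of the recipe.

```python
import re

# A double-quoted field: the group captures only the text between the quotes.
QUOTED_FIELD = re.compile(r'"([^"]*)"')

# Hypothetical tail of a log entry: the referrer followed by the user agent.
tail = '"http://www.regexcookbook.com/" "Mozilla/5.0 (compatible; MSIE 9.0)"'

# With a single group, findall() returns the group contents, not the full matches.
referrer, useragent = QUOTED_FIELD.findall(tail)
print(referrer)
print(useragent)
```

Because the pattern uses `*` rather than `+`, an empty field written as `""` still matches and yields an empty string.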