IN THIS CHAPTER
• Summary
This chapter explains how PowerShell works with strings, using examples to illustrate each concept.
We examine how strings are handled in the .NET Framework and look at the [string] type accelerator.
We then move on to the Select-String cmdlet and give a brief overview of the basic operations that can be performed on strings.
We return to the .NET Framework and look at the members of a string object, and then we discuss wildcards, regular expressions, and the operators PowerShell provides for working with them.
Finally, we cover two new operators in the 2.0 CTP and the format operator.
A string is typically a series of alphabetical characters, but it can also include numbers and non-alphanumeric characters, such as a space or a tab. A simple string that most people encounter when they are new to a scripting or programming language is “Hello World!” Because PowerShell is based on the .NET Framework, even a simple string is actually still an object.
PowerShell does a lot of things automatically when dealing with strings. Let’s show the “long” way to create a string from within PowerShell by calling the appropriate .NET Framework class.
We use the New-Object cmdlet with the System.String class to define our object, and we pass it the argument “testing”. Technically, the argument goes to the class’s constructor, which builds the new object from it.
If we were using a programming language such as C#, we would typically have to declare the variable’s type as well.
It’s easier with PowerShell because we can let PowerShell do all the work for us. With PowerShell, the following code produces the same kind of object as the previous code.
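As a rough sketch of the two approaches just described (the variable names are ours, and we assume Windows PowerShell converts the quoted argument for the System.String constructor):

```powershell
# The "long" way: ask .NET for a System.String object explicitly.
$long = New-Object System.String "testing"

# The short way: PowerShell creates the string object for us.
$short = "testing"

# Both variables now hold the same text.
$long -eq $short
```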
Just to prove both are actually string objects, we could use Get-Member to list the class and members of the object, but we are going to use a little shortcut to get to the object type more quickly.
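One possible shortcut, assuming a variable named $a that holds our string, is the GetType method that every .NET object inherits:

```powershell
$a = "testing"

# FullName reports the .NET class behind the variable.
$a.GetType().FullName   # System.String
```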
The previous command is just another way to get the type of the object we are dealing with.
We showed how easy it is to create a string object in PowerShell. However, there are some occasions when we want to create a string, but we might end up with an integer or an array.
Something similar can be done if we need to force an integer to be a string:
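A minimal sketch of both situations, using the [string] type accelerator mentioned earlier (the variable names are only illustrative):

```powershell
# An unquoted number is stored as an integer, not a string.
$number = 123
$number.GetType().FullName        # System.Int32

# Cast the value, or constrain the variable, to force a string.
$text = [string]123
[string]$alsoText = 123
$text.GetType().FullName          # System.String

# A comma-separated list becomes an array unless we constrain it.
$list = "a", "b"
$list.GetType().FullName          # System.Object[]
[string]$joined = "a", "b"        # elements are joined with a space: "a b"
```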
PowerShell provides the Select-String cmdlet to search strings. One big bonus of this cmdlet is that its pattern is interpreted as a regular expression (and its Path parameter accepts wildcards), both of which we look at in more detail later. This cmdlet comes with many more features in the 2.0 CTP, and we’ll look at a few of them.
Let’s provide some examples of how we can pipe a list of elements to the cmdlet.
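The input strings and variable names below are our own; this is only a sketch of the kind of search being described, not the original listing:

```powershell
# Searching for "test" matches every element that contains it.
$loose = "test", "testing", "foo" | Select-String "test"
$loose | ForEach-Object { $_.Line }    # test, testing

# Anchoring the pattern with ^ and $ matches the exact string "test" only.
$exact = "test", "testing", "foo" | Select-String '^test$'
$exact | ForEach-Object { $_.Line }    # test
```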
The previous example uses special characters in the pattern to make sure we match on the exact string “test” only, in contrast to the example just before it, where searching for “test” provided two matches.
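Select-String can also invert a search. The following sketch assumes the NotMatch parameter from the 2.0 CTP and reuses our illustrative input strings:

```powershell
# Every element contains "test", so nothing is returned.
$none = "test", "testing" | Select-String "test" -NotMatch

# Only "foo" lacks the string "test", so it is the single result.
$left = "test", "testing", "foo" | Select-String "test" -NotMatch
$left | ForEach-Object { $_.Line }    # foo
```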
We use the NotMatch parameter to get any entries where the string does not exist. In the first example, NotMatch returns nothing, because our string “test” matched all the entries passed via the pipeline. In the second example, however, we match on “foo.”
The NotMatch parameter used above is available only with the 2.0 CTP. The CTP also comes with an AllMatches parameter, which we look at next.
Let’s provide another example of a new feature included with the CTP: the AllMatches parameter. Select-String returns a MatchInfo object with a special Matches property, which provides us with additional information on the matches found.
In PowerShell 1.0, the Select-String cmdlet returns only the first match in a string. In contrast, with 2.0, Select-String provides a new AllMatches parameter that can return all the matches found within a string. This is helpful when the string contains multiple occurrences of the substring or pattern we are searching for.
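A short sketch of the difference, assuming PowerShell 2.0 behavior and a sample sentence of our own:

```powershell
$text = "test testing tested"

# Default behavior: only the first match on the line is recorded.
$first = $text | Select-String "test"
$first.Matches.Count      # 1

# With AllMatches, every occurrence on the line is recorded.
$all = $text | Select-String "test" -AllMatches
$all.Matches.Count        # 3
$all.Matches | ForEach-Object { $_.Value }
```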
We see in the previous example that when we use the AllMatches parameter, the resulting object has a Matches property that lists all the matching strings contained within the original input string.
We can do a few simple things when dealing with strings: We can add them together and multiply them by an integer. Although we won’t provide examples here, you can also combine these operators together when working with strings.
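For illustration, here is what those two basic operations look like with sample values of our own:

```powershell
# Adding strings concatenates them.
$sum = "foo" + "bar"      # foobar

# Multiplying a string by an integer repeats it.
$repeat = "ab" * 3        # ababab
```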
Let’s provide a simple example of multiplying a string. Here’s an example that might help you pick your lottery numbers the next time.
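The following is a sketch rather than the original listing. It assumes the Minimum and Maximum parameters of the released Get-Random cmdlet (the CTP build may differ), and the draw count of 50 is arbitrary:

```powershell
$counts = @{}

# Draw 50 random numbers from 1 to 10 (Maximum is exclusive) and tally them.
1..50 | ForEach-Object {
    $n = Get-Random -Minimum 1 -Maximum 11
    $counts[$n] = 1 + [int]$counts[$n]
}

# Multiply "*" by each tally to draw one bar per number.
1..10 | ForEach-Object {
    ([string]$_).PadLeft(2) + " " + ("*" * [int]$counts[$_])
}
```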
We use Get-Random, a new cmdlet from the 2.0 CTP, to provide a graphical representation of the number of times that each number from 1 to 10 has been generated randomly.
Remember that we are dealing with .NET objects, so even our simple string has many intrinsic properties and methods provided by the .NET Framework. Let’s list the supported members and then give a simple example of each of the more commonly used ones.
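A minimal way to produce such a listing (the sample string is arbitrary):

```powershell
# List every member (methods and properties) of a string object.
"testing" | Get-Member

# Narrow the list to methods only.
"testing" | Get-Member -MemberType Method
```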
In the previous example, we pipe a string to the Get-Member cmdlet. Using a feature named .NET Reflection, it lists all the members of the object passed along the pipeline (in this case, the members are properties and methods). There are other types of members, such as events, but these are not supported in PowerShell version 1.
Let’s look at the more commonly used methods and properties provided by a string object.
The Contains method provides us a way to determine whether a particular string is contained within the original string.
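A small sketch with a sample string of our own; note that the .NET comparison is case-sensitive:

```powershell
"PowerShell".Contains("Shell")   # True
"PowerShell".Contains("shell")   # False: the comparison is case-sensitive
```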
Note that in the previous example, we could have also used variables to determine whether a string contains another string. We are going to provide this example once here, but these examples of using variables apply to all the upcoming examples where we demonstrate other supported methods.
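The variable form might look like this (the variable names are illustrative):

```powershell
$original = "PowerShell"
$fragment = "Shell"

# The method is called on one variable and receives the other as its argument.
$original.Contains($fragment)    # True
```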
The EndsWith method provides us with a way to determine whether a string ends with a particular string.
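For instance, with a sample string of our own:

```powershell
"PowerShell".EndsWith("Shell")   # True
"PowerShell".EndsWith("Power")   # False
```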
The Insert method provides us with a way to insert a string into another string at a given position. In the following example, we insert the string “foo” after the fourth character of the original string.
Using zero as the integer in the first argument passed would result in the string of the second argument being prepended to the original string.
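Both cases can be sketched as follows, assuming “testing” as the original string:

```powershell
# Insert "foo" after the fourth character (index 4).
"testing".Insert(4, "foo")   # testfooing

# Index 0 prepends the new string.
"testing".Insert(0, "foo")   # footesting
```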
The Remove method provides us with a way to remove part of an original string based on a starting position and length.
The length of the string we want to remove is actually optional. If we include only one integer in the argument passed to the method, everything from that point, up to the end of the string, is removed.
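A sketch of both forms, using a sample string of our own:

```powershell
# Remove three characters starting at index 4.
"testfooing".Remove(4, 3)   # testing

# With one argument, everything from index 4 onward is removed.
"testfooing".Remove(4)      # test
```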
The Replace method provides us with some basic functionality to replace a particular string with a new string.
Later in this chapter, we see other methods that can be used to replace strings with a new string.
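A basic use of the Replace method, with sample text of our own, might look like this:

```powershell
# Every occurrence of the first string is replaced by the second.
"Hello World".Replace("World", "PowerShell")   # Hello PowerShell
```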
The Split method provides us a way to split up a string into smaller pieces.
One useful feature of Split is that the results can be referenced in the same way that individual elements of an array are referenced.
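A sketch with a comma-separated sample string (the variable name is ours):

```powershell
$parts = "one,two,three".Split(",")

$parts.Count    # 3
$parts[0]       # one
$parts[2]       # three
```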
The StartsWith method provides us a way to determine whether a string begins with a particular string.
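For instance, with a sample string of our own:

```powershell
"PowerShell".StartsWith("Power")   # True
"PowerShell".StartsWith("Shell")   # False
```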
The Substring method provides us a way to extract part of the original string based on a starting position and a length. Positions are counted from zero. If only one argument is provided, it is taken to be the starting position, and the remainder of the string is returned.
Another useful application of the Substring method is extracting just the last few characters of a string.
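The three uses can be sketched as follows (the sample string and variable name are ours):

```powershell
$s = "PowerShell"

$s.Substring(0, 5)               # Power
$s.Substring(5)                  # Shell

# Subtract from Length to take the last three characters.
$s.Substring($s.Length - 3)      # ell
```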
The ToLower method provides us a way to convert an entire string to lowercase letters.
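For instance:

```powershell
"PowerShell".ToLower()   # powershell
```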
The ToString method returns a string representation of the current object. When dealing with an actual string, this method is of little use, but there are cases where it is genuinely helpful.
We use a string and a DateTime object (which is what Get-Date returns) to demonstrate when ToString can be useful. When an expression starts with a string, PowerShell tries to convert whatever we add to it into a string as well.
When we reverse the order and put the DateTime object first, PowerShell can’t work out how to add a string to a date and reports an error. To get around this, we call the ToString method of the DateTime object, and then we can use the regular addition symbol to join the strings together.
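A sketch of the three cases; the wording of the strings is ours, and the failing line is left commented out so the block runs cleanly:

```powershell
$now = Get-Date

# String first: the DateTime is converted to a string automatically.
"The time is: " + $now

# DateTime first: uncommenting the next line produces an error.
# $now + " is the current time"

# Calling ToString first gives us two strings to join.
$joined = $now.ToString() + " is the current time"
$joined
```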
The ToUpper method provides us a way to convert an entire string to uppercase letters.
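For instance:

```powershell
"PowerShell".ToUpper()   # POWERSHELL
```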
Often, when dealing with strings, we might need to match a certain part of the string to get the desired results. We have seen some tricks earlier in this chapter, but let’s introduce wildcards.
Three basic wildcards are supported by PowerShell: *, ?, and []. The following table outlines each of them and explains how the last wildcard ([]) can be used in two different scenarios: matching a range or matching a specific set.
Table 7.1 provides several examples of how wildcards can be used. Each type of wildcard is listed with a description, example usage, and examples of strings that the expression will or will not match.
Wildcards are useful in everyday use. For example, we might want to find all the files with the string “file” in the name. We don’t just want to match on “file” alone, so here’s an example of some possibilities:
• We want to match on any extension: “file.*”.
• We want to match on anything that may be before or after the string: “*file*.*”.
• We want to match on anything that contains one character after the string: “*file?.*”.
• We want to match on anything that contains only a single digit before the string: “[0-9]file.*”.
We could go on with different possibilities.
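The patterns in the list above can be tried against sample file names with the Like operator, which is covered later in this chapter. The file names here are hypothetical:

```powershell
"file.txt"    -like "file.*"        # True
"myfile2.txt" -like "*file*.*"      # True
"myfile2.txt" -like "*file?.*"      # True
"myfile.txt"  -like "*file?.*"      # False: no character between "file" and "."
"1file.txt"   -like "[0-9]file.*"   # True
"afile.txt"   -like "[0-9]file.*"   # False: "a" is not a digit
```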
Let’s provide another example of how wildcards can be used as a simple way to locate commands. We are going to use wildcards to help us find the Get-EventLog cmdlet.
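The searches might look like the sketch below. What Get-Command returns depends on the commands installed, so the Like comparisons (an operator covered later in this chapter) apply the same patterns to the cmdlet name directly, where the result is predictable:

```powershell
Get-Command *eventlog
Get-Command *eventlog*
Get-Command *eventlo?
Get-Command *eventlog?     # expected to find nothing

# The same patterns applied to the name itself.
"Get-EventLog" -like "*eventlog"     # True
"Get-EventLog" -like "*eventlog*"    # True
"Get-EventLog" -like "*eventlo?"     # True
"Get-EventLog" -like "*eventlog?"    # False
```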
In the previous box, we look at the * and ? wildcards. In the first command, we search for *eventlog, which can be read as “anything is allowed before the string eventlog.” We get the result we want. Next, we look for *eventlog*, which can be read as “anything is allowed before the string eventlog, and anything is allowed after.” Those are two examples where we use *.
In our third command, we combine * and ? as *eventlo?, which can be read as “anything before the string eventlo, then one single character.” In this case, ? matches “g.” Finally, we look for *eventlog?. This search fails because we ended our search string with ?, which means one more character must be present after eventlog; because there is none, nothing matches.
Several cmdlets support wildcards like this in the values passed to their parameters. Let’s look at the first part of the full help details of the Get-EventLog cmdlet.
We show only the first parameter listed, but want to highlight the last line, where we see “Accept wildcard characters?”. It is followed by “False”, which means this particular parameter doesn’t accept wildcards. In fact, none of the parameters supported by Get-EventLog accept wildcards. Let’s use another cmdlet: Get-ChildItem.
In the previous box, we have another example of using help to list parameters. For the Get-ChildItem cmdlet, we see that the Path parameter accepts wildcards. We can easily determine this because we see “True” next to “Accept wildcard characters?”. Let’s test it out.
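One way to try this safely is in a scratch folder. The folder and file names below are hypothetical and exist only for this sketch:

```powershell
# To review the parameter help first, run:
#   Get-Help Get-ChildItem -Parameter Path

# Create a scratch folder with three sample files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) "wildcard-demo"
New-Item -ItemType Directory -Path $dir -Force | Out-Null
"file1.txt", "file2.txt", "notes.log" | ForEach-Object {
    Set-Content -Path (Join-Path $dir $_) -Value "demo"
}

# The Path parameter accepts wildcards, so ? stands in for one character.
$found = Get-ChildItem -Path (Join-Path $dir "file?.txt")
$found | ForEach-Object { $_.Name }    # file1.txt, file2.txt

# Clean up the scratch folder.
Remove-Item -Path $dir -Recurse
```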
In the previous box, we show a practical example of how we can use wildcards when passing a value to a parameter that supports them.
Wildcards are also useful in string comparisons, when we need to determine whether a string resembles another string. They let us compare a string against a more general pattern in which only a particular substring has to match, while still keeping strict rules about which matches we accept.
In Table 7.2, we provide examples outlining the most commonly used comparison operators when dealing with strings.
When dealing with these operators, the result is returned as a Boolean value (either true or false).
The eq and ne operators are commonly used with integers, but they apply equally to comparing strings. These two operators don’t support wildcards. Wildcards are supported by the Like/NotLike operators, and regular expressions, which we cover in the next section, are supported by the Match/NotMatch operators. Although the examples look more like wildcard comparisons, Match/NotMatch can work with more complicated regular expression comparisons.
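A brief sketch of the equality operators applied to strings (sample values are ours; in code, each operator is written with a leading dash):

```powershell
"foo" -eq "foo"    # True
"foo" -eq "FOO"    # True: string comparisons ignore case by default
"foo" -ceq "FOO"   # False: the c prefix makes the comparison case-sensitive
"foo" -ne "bar"    # True
"foo" -eq "f*"     # False: -eq treats * as a literal character
```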
Let’s look at the Like/NotLike operators first because we just provided an overview of wildcards in the previous section.
Here are some simple examples of using Like/NotLike with wildcards.
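The comparisons discussed here can be sketched as follows; the final NotLike line is our own addition:

```powershell
"foo" -like "f*"       # True
"foo" -like "f?"       # False
"foo" -like "f??"      # True
"foo" -notlike "b*"    # True
```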
Remember that the * wildcard matches any number of characters. In the previous example, we try to determine if “foo” is like “f?”. The result is false, because ? takes the place of exactly one character. We can confirm this in our next example, where we compare “foo” with “f??”, which returns true. It returns true because each of the two ? wildcards matches an “o.”
The topic of regular expressions could easily fill an entire chapter on its own. Complex regular expressions have baffled many people and have eluded even intermediate users at times. Nonetheless, it is important to have a basic grasp of the concepts.
We briefly talked about the Match and NotMatch comparison operators earlier in this chapter. Now that we are talking about regular expressions, we are going to look at these operators more closely and provide examples.
PowerShell supports a number of the regular expression characters. Table 7.3 provides several examples where regular expressions are used with the Match/NotMatch comparison operators.
PowerShell supports the quantifiers available in .NET regular expressions. Table 7.4 provides some examples of quantifiers.
The string “\w” comes from the .NET Framework’s regular expression syntax. It represents any word character, which includes all alphanumeric characters and the underscore.
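A few illustrative comparisons with anchors, quantifiers, and character classes (the sample strings are ours):

```powershell
"foo" -match "^f"            # True: the string starts with f
"foo" -match "o$"            # True: the string ends with o
"foo" -match "o{2}"          # True: o occurs twice in a row
"foo" -match "^\w+$"         # True: one or more word characters
"foo bar" -match "^\w+$"     # False: the space is not a word character
"foo" -notmatch "\d"         # True: the string contains no digit
```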
When dealing with strings, we might need to replace certain parts of a string with another string. We saw earlier that string objects support a replace method. There is also a replace operator. The difference is that the replace operator supports regular expressions, so it is more powerful.
Table 7.5 provides some simple examples of using the replace operator.
Let’s provide a more complicated example using a real regular expression. Here, we use the string “file000.txt” as an example. We want to replace any place where we find two consecutive digits.
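A sketch of that replacement, with “X” as an arbitrary replacement string of our choosing. The second line shows that the Replace method treats the same pattern as literal text:

```powershell
# The operator replaces the first pair of digits; the third digit remains.
"file000.txt" -replace "\d{2}", "X"     # fileX0.txt

# The method looks for the literal characters \d{2} and finds none.
"file000.txt".Replace("\d{2}", "X")     # file000.txt
```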
The .NET Framework provides a class named System.Text.RegularExpressions.Regex that provides powerful capabilities when dealing with regular expressions. As with other .NET classes, we can call this directly from within PowerShell. To make this even better, PowerShell has a type accelerator, or shortcut, for this class: [RegEx].
Let’s look at an example of using this .NET class. We’re going to put together a long regular expression that should match most commonly used email address formats.
We’re going to use [RegEx] to define the type of our variable, and then use it to check a few strings.
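The sketch below assumes the expression is the one picked apart next, written with the backslash escapes (\w and \.) that regular expression syntax requires. The variable name and test addresses are our own:

```powershell
# Constrain the variable to the Regex class with the [regex] accelerator.
[regex]$email = '^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$'

$email.IsMatch("user.name@example.com")   # True
$email.IsMatch("user@example")            # False: no dot-separated domain
$email.IsMatch(".user@example.com")       # False: starts with a period
```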
Let’s pick the previous expression apart.
• The RegEx starts with “^” and ends with “$”. These represent the beginning and the end of a string, respectively.
• In a few spots, we have the string “[0-9a-zA-Z]”. This represents any alphanumeric character: all digits from 0 to 9 and all alphabetical characters, both uppercase and lowercase. In other words, this will match 0, a, and A and any other alphabetical or numerical character.
• Near the middle, we have the “@” symbol. This simply represents itself. Every email address has the “@” symbol after the username and before the domain name.
• We have this part at the beginning: “^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*”. This guarantees we have at least a single alphanumeric character at the beginning of the string. It also guarantees that if we have a character such as “-”, “.”, or “_”, an alphanumeric character must follow it before we have a valid username.
• We have this part at the end: “([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$”. This guarantees we have at least a format such as “[0-9a-zA-Z][0-9a-zA-Z].[a-zA-Z][a-zA-Z]”. Again, if we start off with an alphanumeric character and then have a character such as “-” or “_”, there must be at least one more alphanumeric character following it. The “+” indicates that the group in parentheses, “([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)”, must occur one or more times to be valid.
As of the 2.0 CTP, two new operators were added for dealing with strings: Join and Split. Let’s look briefly at each and how it can help us work with string objects more efficiently.
The Join operator provides the ability to combine strings into one string. Two different methods can be used with this operator, as shown here:
• -Join <String[]>
• <String[]> -Join <Delimiter>
The first method simply joins a bunch of strings together, whereas the second method enables us to define whether we want to use any kind of delimiter when we join our substrings together. Let’s provide an example of using each method.
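A sketch of each form with sample strings of our own. In the unary form, the parentheses are needed so that the operator receives the whole list rather than only its first element:

```powershell
# Unary form: join with no delimiter.
$plain = -join ("foo", "bar")          # foobar

# Binary form: join with a semicolon as the delimiter.
$delimited = "foo", "bar" -join ";"    # foo;bar
```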
In the first example, we join two strings together. The second example joins the two strings, but places a semicolon in the resulting string.
The Split operator provides the ability to split a string into smaller ones. Three different methods can be used with this operator, as shown here:
• -Split <String>
• <String> -Split <Delimiter>[,<Max-substrings>[,"<Options>"]]
• <String> -Split {<ScriptBlock>} [,<Max-substrings>]
The first method splits a string into smaller pieces at whitespace, the second method enables us to define the delimiter to use when we split the string into smaller parts, and the third method enables us to provide a more complicated scriptblock to determine how our string will be split up. Let’s provide an example of the first two methods. For more examples, please consult the built-in help.
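The first two forms can be sketched as follows, with sample strings of our own. The last two lines go slightly beyond the text to show the optional Max-substrings value and the case-sensitive variant:

```powershell
# First example: the unary form splits on whitespace.
$words = -split "one two three"             # one, two, three

# Second example: the binary form splits on the delimiter we supply.
$fields = "one,two,three" -split ","        # one, two, three

# Additional forms: limit the number of substrings, or match case exactly.
$limited = "one,two,three" -split ",", 2    # one / two,three
$exact = "aXbxc" -csplit "x"                # aXb / c
```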
You can substitute Split with CSplit to make the delimiter rules case-sensitive. ISplit is also valid, but it is case-insensitive, just like Split. However, neither substitution is valid for the unary form used in our first example.
In the first example, we simply split a string into smaller strings at each blank space. The second example splits the string into smaller pieces using the comma as a delimiter.
The Format operator (-f) provides a lot of flexibility when it comes to formatting strings. This operator is based on the .NET Framework’s composite formatting (http://msdn.microsoft.com/en-us/library/txafckwd.aspx).
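A few illustrative uses of the operator follow. The values are our own, and the format strings are standard .NET composite formatting items:

```powershell
# Placeholders are filled by position.
"{0} has {1} items" -f "Cart", 3                  # Cart has 3 items
"{1}, {0}" -f "first", "second"                   # second, first

# An alignment value pads the item to a fixed width.
"[{0,5}]" -f 42                                   # [   42]

# A format string after the colon controls the representation.
"{0:X}" -f 255                                    # FF
"{0:yyyy-MM-dd}" -f ([datetime]"2008-01-15")      # 2008-01-15
```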
The PowerShell team blog offers a tip on how to determine whether an object supports special formatting and also provides an example of using different types of formatting for a DateTime object:
http://blogs.msdn.com/powershell/archive/2006/06/16/634575.aspx
In summary, we looked at how string objects can be created and manipulated within PowerShell. We also spent a bit of time trying to help the reader understand how the .NET Framework can be used within PowerShell.
We looked at wildcards, regular expressions, and the various operators that can be used. We also provided some brief examples on how operators can be useful when dealing with strings.
Near the end of the chapter, we introduced some new operators that are available with the 2.0 CTP.