Use the Split instance method of the string class. For example:
string equation = "1 + 2 - 4 * 5";
string[] equationTokens = equation.Split(new char[1]{' '});

foreach (string tok in equationTokens)
    Console.WriteLine(tok);
This code produces the following output:
1
+
2
-
4
*
5
The Split method may also be used to separate people’s first, middle, and last names. For example:
string fullName1 = "John Doe";
string fullName2 = "Doe,John";
string fullName3 = "John Q. Doe";

string[] nameTokens1 = fullName1.Split(new char[3]{' ', ',', '.'});
string[] nameTokens2 = fullName2.Split(new char[3]{' ', ',', '.'});
string[] nameTokens3 = fullName3.Split(new char[3]{' ', ',', '.'});

foreach (string tok in nameTokens1)
{
    Console.WriteLine(tok);
}
Console.WriteLine("");

foreach (string tok in nameTokens2)
{
    Console.WriteLine(tok);
}
Console.WriteLine("");

foreach (string tok in nameTokens3)
{
    Console.WriteLine(tok);
}
This code produces the following output:
John
Doe

Doe
John

John
Q

Doe
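Notice that an empty string appears between the Q and Doe tokens of the fullName3 name. The '.' and the space that follows it are adjacent delimiters, and Split returns an empty string for the zero-length text between them; this is correct behavior. If you do not want to process this empty token in your code, you can choose to ignore it.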
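One way to ignore such empty tokens is to let Split drop them. The following is a minimal sketch, assuming a .NET version that provides the Split overload taking a StringSplitOptions argument; the NameSplitter class and SplitName method are hypothetical names used only for illustration:

```csharp
using System;

public static class NameSplitter
{
    // Hypothetical helper: splits on space, comma, and period, and
    // drops the empty strings produced by adjacent delimiters.
    public static string[] SplitName(string fullName)
    {
        char[] delimiters = { ' ', ',', '.' };
        return fullName.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // "John Q. Doe" contains ". " (two adjacent delimiters).
        foreach (string tok in SplitName("John Q. Doe"))
            Console.WriteLine(tok);
    }
}
```

With this option, the three tokens John, Q, and Doe are printed with no empty line between Q and Doe.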
If you have a consistent string whose parts, or tokens, are separated by well-defined characters, the Split method can tokenize the string. Tokenizing a string consists of breaking the string down into
well-defined, discrete parts, each of which is considered a token. In
the two previous examples, the tokens were either parts of a
mathematical equation (numbers and operators) or parts of a name
(first, middle, and last).
There are two drawbacks to this approach. First, if the string of tokens is not separated by any well-defined character(s), it will be impossible to use the Split method to break up the string. For example, if the equation string looked like this:
string equation = "1+2-4*5";
we would clearly have to use a more robust method of tokenizing this string (see Recipe 8.7 for a more robust tokenizer).
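As a rough illustration of what a more robust approach might look like, the sketch below uses a regular expression to pull numbers and operators out of an equation that has no separators. It is not the tokenizer from Recipe 8.7; the EquationTokenizer class, the Tokenize method, and the pattern are assumptions made for this example, and the pattern handles only unsigned integers and the four basic operators:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EquationTokenizer
{
    // Hypothetical helper: each match is either a run of digits
    // or a single operator character (+, -, *, /).
    public static List<string> Tokenize(string equation)
    {
        List<string> tokens = new List<string>();
        foreach (Match m in Regex.Matches(equation, @"\d+|[-+*/]"))
            tokens.Add(m.Value);
        return tokens;
    }

    public static void Main()
    {
        // No delimiters separate the tokens in this string.
        foreach (string tok in Tokenize("1+2-4*5"))
            Console.WriteLine(tok);
    }
}
```

Characters that match neither alternative, such as letters or whitespace, are silently skipped by this sketch; a production tokenizer would likely want to report them as errors.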
A second drawback is that a string of tokenized words must be entered consistently in order to gain meaning from the tokens. For example, if we ask users to type in their names, they may enter any of the following:
John Doe
Doe John
John Q Doe
If one user enters a name the first way and another user enters
it the second way, our code will have a difficult time determining
whether the first token in the string array represents the first or
last name. The same problem will exist for all of the other tokens in
the array. However, if all users enter their names in a consistent
style, such as First Name, space, Last Name, we
will have a much easier time tokenizing the name and understanding
what each token represents.
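To show how a consistent entry style makes the tokens meaningful, here is a minimal sketch that assumes every name is entered as first name, optional middle name, and last name, separated by spaces. The NameParser class and its TryParse method are hypothetical; any other input shape is rejected rather than guessed at:

```csharp
using System;

public static class NameParser
{
    // Hypothetical helper: assumes "First Last" or "First Middle Last".
    // Returns false when the input does not have two or three tokens.
    public static bool TryParse(string fullName, out string first,
                                out string middle, out string last)
    {
        first = middle = last = "";
        string[] tokens = fullName.Split(new char[] { ' ' },
                                         StringSplitOptions.RemoveEmptyEntries);

        if (tokens.Length == 2)
        {
            first = tokens[0];
            last = tokens[1];
            return true;
        }
        if (tokens.Length == 3)
        {
            first = tokens[0];
            middle = tokens[1];
            last = tokens[2];
            return true;
        }
        return false;
    }

    public static void Main()
    {
        string first, middle, last;
        if (NameParser.TryParse("John Q Doe", out first, out middle, out last))
            Console.WriteLine("First: " + first + ", Middle: " + middle + ", Last: " + last);
    }
}
```

Note that this sketch cannot tell "Doe John" from "John Doe"; it relies entirely on users following the agreed order.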
See Recipe 8.7; see the “String.Split Method” topic in the MSDN documentation.