Chapter 2. Strings and Characters

String usage abounds in just about all types of applications. The System.String type does not derive from System.ValueType and is therefore a reference type. C# provides the built-in keyword string as an alias for System.String, so the two names can be used interchangeably.
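The sketch below checks both claims with reflection. The class and method names are hypothetical and serve only as illustration. It confirms that string and System.String name the same type and that this type is not a value type.

```csharp
using System;

// Hypothetical helper class: inspects System.String through reflection.
public static class StringTypeFacts
{
    // The C# keyword `string` and System.String resolve to the same Type object.
    public static bool AliasMatchesFullName() => typeof(string) == typeof(System.String);

    // String is a reference type, so IsValueType is false.
    public static bool IsValueType() => typeof(string).IsValueType;

    // Its direct base class is System.Object, not System.ValueType.
    public static Type BaseType() => typeof(string).BaseType;
}
```

For example, Console.WriteLine(StringTypeFacts.AliasMatchesFullName()) prints True.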

The Framework Class Library (FCL) does not stop with just the string class. It also provides the System.Text.StringBuilder class for manipulating strings and the System.Text.RegularExpressions namespace for searching them. This chapter covers the string class and the System.Text.StringBuilder class.

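The System.Text.StringBuilder class offers an easy, performance-friendly way to build and modify strings. Much of its functionality duplicates that of the string class, but it is implemented differently. A string object is immutable, so every operation that appears to modify a string actually creates a new one. A StringBuilder instead modifies its own internal buffer in place. As a result, repeated manipulations are considerably more efficient with StringBuilder than with the string class.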
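The following sketch illustrates that difference. BuilderDemo, JoinNumbers, and ReplaceAndPrefix are hypothetical names. The code appends, replaces, and inserts text inside a single StringBuilder buffer and calls ToString only once at the end. Using repeated string concatenation in the same loop would allocate a new string on every pass.

```csharp
using System;
using System.Text;

// Hypothetical helper class showing typical StringBuilder manipulations.
public static class BuilderDemo
{
    // Builds "0, 1, ..., count-1" by appending into one growable buffer.
    public static string JoinNumbers(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0)
            {
                sb.Append(", ");   // separator before every item except the first
            }
            sb.Append(i);          // Append(int) formats the number into the buffer
        }
        return sb.ToString();      // a string is created only here
    }

    // Edits existing text in place inside the builder before producing a string.
    public static string ReplaceAndPrefix(string text)
    {
        var sb = new StringBuilder(text);
        sb.Replace("cat", "dog");  // replace every occurrence within the buffer
        sb.Insert(0, "> ");        // prepend without building intermediate strings
        return sb.ToString();
    }
}
```

For example, BuilderDemo.JoinNumbers(4) returns "0, 1, 2, 3", and BuilderDemo.ReplaceAndPrefix("the cat sat") returns "> the dog sat".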

2.1. Determining the Kind of Character

Problem

You have a variable of type char and wish to determine the kind of character it contains: a letter, digit, number, punctuation character, control character, separator character, symbol, whitespace, or surrogate character. Similarly, you have a string variable and want to determine the kind of character at one or more positions within this string.

Solution

Use the built-in static methods on the System.Char structure shown here:

Char.IsControl
Char.IsDigit
Char.IsLetter
Char.IsNumber
Char.IsPunctuation
Char.IsSeparator
Char.IsSurrogate
Char.IsSymbol
Char.IsWhiteSpace
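Each method listed above has two overloads. One takes a single char. The other takes a string and a zero-based index. The sketch below calls both forms of Char.IsDigit on the same character. It also shows that the string overload throws ArgumentOutOfRangeException when the index falls outside the string. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper class comparing the (char) and (string, int) overloads.
public static class CharOverloads
{
    // Tests the same character through both overloads of Char.IsDigit.
    public static (bool fromChar, bool fromString) CheckDigit(string text, int index)
    {
        bool fromChar = Char.IsDigit(text[index]);    // char overload
        bool fromString = Char.IsDigit(text, index);  // (string, int) overload
        return (fromChar, fromString);
    }

    // Returns true if the string overload rejects the index with
    // ArgumentOutOfRangeException (index < 0 or index >= text.Length).
    public static bool IndexIsRejected(string text, int index)
    {
        try
        {
            Char.IsDigit(text, index);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```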

Discussion

The following examples show how to combine the methods listed in the Solution section into a function that returns the kind of a character. First, create an enumeration to define the various kinds of characters:

public enum CharKind
{
    Control,
    Digit,
    Letter,
    Number,
    Punctuation,        
    Separator,
    Surrogate,
    Symbol,
    Whitespace,
    Unknown
}

Next, create a method that contains the logic to determine the type of a character and to return a CharKind enumeration value indicating that type:

public static CharKind GetCharKind(char theChar)
{
    if (Char.IsControl(theChar))
    {
        return CharKind.Control;
    }
    else if (Char.IsDigit(theChar))
    {
        return CharKind.Digit;
    }
    else if (Char.IsLetter(theChar))
    {
        return CharKind.Letter;
    }
    else if (Char.IsNumber(theChar))
    {
        return CharKind.Number;
    }
    else if (Char.IsPunctuation(theChar))
    {
        return CharKind.Punctuation;
    }
    else if (Char.IsSeparator(theChar))
    {
        return CharKind.Separator;
    }
    else if (Char.IsSurrogate(theChar))
    {
        return CharKind.Surrogate;
    }
    else if (Char.IsSymbol(theChar))
    {
        return CharKind.Symbol;
    }
    else if (Char.IsWhiteSpace(theChar))
    {
        return CharKind.Whitespace;
    }
    else
    {
        return CharKind.Unknown;
    }
}

To evaluate a character at a particular position in a string, use the overloads of these static Char methods that accept a string and a zero-based index. The following code adapts the GetCharKind method to accept a string and a character position in that string. The position determines which character in the string is evaluated:

public static CharKind GetCharKindInString(string theString, int charPosition)
{
    if (Char.IsControl(theString, charPosition))
    {
        return CharKind.Control;
    }
    else if (Char.IsDigit(theString, charPosition))
    {
        return CharKind.Digit;
    }
    else if (Char.IsLetter(theString, charPosition))
    {
        return CharKind.Letter;
    }
    else if (Char.IsNumber(theString, charPosition))
    {
        return CharKind.Number;
    }
    else if (Char.IsPunctuation(theString, charPosition))
    {
        return CharKind.Punctuation;
    }
    else if (Char.IsSeparator(theString, charPosition))
    {
        return CharKind.Separator;
    }
    else if (Char.IsSurrogate(theString, charPosition))
    {
        return CharKind.Surrogate;
    }
    else if (Char.IsSymbol(theString, charPosition))
    {
        return CharKind.Symbol;
    }
    else if (Char.IsWhiteSpace(theString, charPosition))
    {
        return CharKind.Whitespace;
    }
    else
    {
        return CharKind.Unknown;
    }
}
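Because the (string, int) overloads take a position, walking every character of a string is a simple loop. The sketch below tallies letters, digits, and everything else in a string. PositionScan and Count are hypothetical names, and the three-way split is an illustrative choice, not part of the recipe.

```csharp
using System;

// Hypothetical helper: classifies each position with the (string, int) overloads.
public static class PositionScan
{
    public static (int letters, int digits, int other) Count(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        int letters = 0, digits = 0, other = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (Char.IsLetter(text, i)) letters++;      // character at position i is a letter
            else if (Char.IsDigit(text, i)) digits++;   // ...or a decimal digit
            else other++;                               // ...or anything else
        }
        return (letters, digits, other);
    }
}
```

For example, PositionScan.Count("R2-D2") returns (2, 2, 1).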

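The GetCharKind method accepts a character and tests it with the Char type's built-in static methods. It returns the CharKind value for the first test that succeeds, or CharKind.Unknown if none succeeds. Because the tests run in a fixed order and the first match wins, a character that satisfies several tests is reported under only one kind. A space, for example, is classified as Separator and a tab as Control, even though Char.IsWhiteSpace returns true for both. GetCharKindInString works the same way, except that it tests the character at charPosition.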
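A single character can satisfy more than one of these tests. The sketch below runs every test in the same order as GetCharKind and records each one that returns true, not just the first. This makes overlaps such as IsDigit and IsNumber visible. CharTestMatcher and MatchingTests are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: lists every Char test that returns true for a character.
public static class CharTestMatcher
{
    public static List<string> MatchingTests(char c)
    {
        var matches = new List<string>();
        if (Char.IsControl(c)) matches.Add("IsControl");
        if (Char.IsDigit(c)) matches.Add("IsDigit");
        if (Char.IsLetter(c)) matches.Add("IsLetter");
        if (Char.IsNumber(c)) matches.Add("IsNumber");
        if (Char.IsPunctuation(c)) matches.Add("IsPunctuation");
        if (Char.IsSeparator(c)) matches.Add("IsSeparator");
        if (Char.IsSurrogate(c)) matches.Add("IsSurrogate");
        if (Char.IsSymbol(c)) matches.Add("IsSymbol");
        if (Char.IsWhiteSpace(c)) matches.Add("IsWhiteSpace");
        return matches;
    }
}
```

For example, string.Join(", ", CharTestMatcher.MatchingTests(' ')) returns "IsSeparator, IsWhiteSpace", and for '7' it returns "IsDigit, IsNumber".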

Table 2-1 describes each of the static Char methods.

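Table 2-1. Char methods

Char method     Description
IsControl       A control character: U+0000-U+001F, U+007F, or U+0080-U+009F.
IsDigit         Any Unicode decimal digit. This includes 0-9 and the decimal digits of other scripts, such as the Arabic-Indic digits.
IsLetter        Any Unicode letter: uppercase, lowercase, titlecase, modifier, or other letter.
IsNumber        Any decimal digit, letter number (such as Roman numeral characters), or other number (such as the fraction ½ or superscript digits). Hexadecimal digits A-F are letters, not numbers.
IsPunctuation   Any Unicode punctuation character, including connectors, dashes, opening and closing brackets, quotation marks, and other punctuation.
IsSeparator     A space separator, the line separator (U+2028), or the paragraph separator (U+2029).
IsSurrogate     A high or low surrogate code unit in the range U+D800-U+DFFF.
IsSymbol        Any math symbol, currency symbol, modifier symbol (such as ^ or `), or other symbol.
IsWhiteSpace    Any space separator character, plus U+0009-U+000D, U+0085, U+2028, and U+2029.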
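The distinction between IsDigit and IsNumber is a common source of surprises. The sketch below classifies a character as a decimal digit, some other kind of number, or not numeric. It also adds a hypothetical helper that accepts only the ASCII digits 0-9, which is often what input validation really needs. .NET 7 and later also provide Char.IsAsciiDigit for this purpose. NumericCheck, NumericKind, and IsAsciiDecimalDigit are illustrative names, not framework APIs.

```csharp
using System;

// Hypothetical helpers contrasting Char.IsDigit and Char.IsNumber.
public static class NumericCheck
{
    public static string NumericKind(char c)
    {
        if (Char.IsDigit(c)) return "decimal digit";        // Unicode decimal digit, any script
        if (Char.IsNumber(c)) return "non-decimal number";  // e.g. fractions, Roman numerals
        return "not numeric";
    }

    // Restricts the test to the ASCII digits '0' through '9' only.
    public static bool IsAsciiDecimalDigit(char c) => c >= '0' && c <= '9';
}
```

For example, NumericKind('\u0663') (ARABIC-INDIC DIGIT THREE) returns "decimal digit" even though IsAsciiDecimalDigit('\u0663') is false. NumericKind('\u00BD') (the fraction ½) returns "non-decimal number".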

The following code example determines whether the fifth character (the charPosition parameter is zero-based) in the string is a digit:

if (GetCharKindInString("abcdefg", 4) == CharKind.Digit)
    Console.WriteLine("The fifth character is a digit.");

See Also

See the “Char Structure” topic in the MSDN documentation.
