We’ve already used the C++ Standard Library string class to represent strings as full-fledged objects. Chapter 21 presents class string in detail. This section introduces C-style, pointer-based strings (as defined by the C programming language), which we’ll simply call C strings. C++’s string class is preferred for use in new programs, because it eliminates many of the security problems and bugs that can be caused by manipulating C strings. We cover C strings here for a deeper understanding of pointers and built-in arrays, and because there are some cases (such as command-line arguments) in which C string processing is required. Also, if you work with legacy C and C++ programs, you’re likely to encounter pointer-based strings. We cover C strings in detail in Appendix F.
Characters are the fundamental building blocks of C++ source programs. Every program is composed of a sequence of characters that, when grouped together meaningfully, is interpreted by the compiler as instructions and data used to accomplish a task. A program may contain character constants. A character constant is an integer value represented as a character in single quotes. The value of a character constant is the integer value of the character in the machine’s character set. For example, 'z' represents the integer value of z (122 in the ASCII character set; see Appendix B), and '\n' represents the integer value of newline (10 in the ASCII character set).
A string is a series of characters treated as a single unit. A string may include letters, digits and various special characters such as +, -, *, / and $. String literals, or string constants, in C++ are written in double quotation marks as follows:
"John Q. Doe" (a name)
"9999 Main Street" (a street address)
"Maynard, Massachusetts" (a city and state)
"(201) 555-1212" (a telephone number)
A pointer-based string is a built-in array of characters ending with a null character ('\0'), which marks where the string terminates in memory. A string is accessed via a pointer to its first character. The result of sizeof for a string literal is the length of the string including the terminating null character.
A string literal may be used as an initializer in the declaration of either a built-in array of chars or a variable of type const char*. The declarations
char color[]{"blue"};
const char* colorPtr{"blue"};
each initialize a variable to the string "blue". The first declaration creates a five-element built-in array color containing the characters 'b', 'l', 'u', 'e' and '\0'. The second declaration creates pointer variable colorPtr that points to the letter b in the string "blue" (which ends in '\0') somewhere in memory. String literals exist for the duration of the program and may be shared if the same string literal is referenced from multiple locations in a program. String literals cannot be modified.
The declaration char color[] = "blue"; could also be written
char color[]{'b', 'l', 'u', 'e', '\0'};
which uses character constants in single quotes (') as initializers for each element of the built-in array. When declaring a built-in array of chars to contain a string, the built-in array must be large enough to store the string and its terminating null character. The compiler determines the size of the built-in array in the preceding declaration, based on the number of initializers in the initializer list.
Not allocating sufficient space in a built-in array of chars to store the null character that terminates a string is a logic error.
Creating or using a C string that does not contain a terminating null character can lead to logic errors.
When storing a string of characters in a built-in array of chars, be sure that the built-in array is large enough to hold the largest string that will be stored. C++ allows strings of any length. If a string is longer than the built-in array of chars in which it’s to be stored, characters beyond the end of the built-in array will overwrite data in memory following the built-in array, leading to logic errors and potential security breaches.
Because a C string is a built-in array of characters, we can access individual characters in a string directly with array subscript notation. For example, in the preceding declaration of color, color[0] is the character 'b', color[2] is 'u' and color[4] is the null character.
Reading a String into a Built-In Array of char with cin
A string can be read into a built-in array of chars using cin. For example, the following statement reads a string into the built-in 20-element array of chars named word:
cin >> word;
The string entered by the user is stored in word. The preceding statement reads characters until a white-space character or end-of-file indicator is encountered. The string should be no longer than 19 characters to leave room for the terminating null character. The setw stream manipulator can be used to ensure that the string read into word does not exceed the size of the built-in array. For example, the statement
cin >> setw(20) >> word;
specifies that cin should read a maximum of 19 characters into word and save the 20th location to store the terminating null character for the string. The setw stream manipulator is not a sticky setting; it applies only to the next value being input. If more than 19 characters are entered, the remaining characters are not saved in word, but they will be in the input stream and can be read by the next input operation. Of course, any input operation can also fail. We show how to detect input failures in Section 13.8.
Reading Lines of Text into Built-In Arrays of char with cin.getline
In some cases, it’s desirable to input an entire line of text into a built-in array of chars. For this purpose, the cin object provides the member function getline, which takes three arguments: a built-in array of chars in which the line of text will be stored, a length and a delimiter character. For example, the statements
char sentence[80];
cin.getline(sentence, 80, '\n');
declare sentence as a built-in array of 80 characters and read a line of text from the keyboard into the built-in array. The function stops reading characters when the delimiter character '\n' is encountered, when the end-of-file indicator is entered or when the number of characters read so far is one less than the length specified in the second argument. The last character in the built-in array is reserved for the terminating null character. If the delimiter character is encountered, it’s read and discarded. The third argument to cin.getline has '\n' as a default value, so the preceding function call could have been written as
cin.getline(sentence, 80);
Chapter 13, Stream Input/Output: A Deeper Look, provides a detailed discussion of cin.getline and other input/output functions.
A built-in array of chars representing a null-terminated string can be output with cout and <<. The statement
cout << sentence;
displays the built-in array sentence. Like cin, cout does not care how large the built-in array of chars is. The characters are output until a terminating null character is encountered; the null character is not displayed. [Note: cin and cout assume that built-in arrays of chars should be processed as strings terminated by null characters; cin and cout do not provide similar input and output processing capabilities for other built-in array types.]