Chapter 6
Strings
6.1 Array of Characters
6.2 String Functions in C
    6.2.1 Copy: strcpy
    6.2.2 Compare: strcmp
    6.2.3 Finding Substrings: strstr
    6.2.4 Finding Characters: strchr
6.3 Understanding argv
6.4 Counting Substrings
Strings can be created by putting characters between double quotation marks. For example,
“Hello”
“The C language”
“write 2 programs”
“symbols $%# can be part of a string”
A string can include letters, digits, spaces, and symbols. The examples above are string
constants, which means that their data cannot be edited. In most cases, however, string
variables are preferable because they can store strings whose values may change. For
example, a program may ask a user to enter a name. The program cannot know the user's
name in advance, and thus cannot be compiled with the name. From the program's point
of view, the name is a string variable that gets initialized when it receives the name from
the keyboard input.
6.1 Array of Characters
Because strings are commonly used, many newer languages, such as C++ and Java, have
built-in string types. C, however, does not have a specific data type for strings. Instead,
C uses arrays of characters for strings. Every string is an array of characters but an array
of characters is not necessarily a string. To be a string, one element in the array must
be the special character '\0'. This character terminates the string, and is called the null
terminator. If an array has characters after the null terminator, those characters are not
part of the string. Below are four arrays of characters but only arr3 and arr4 are strings
because only those two arrays contain '\0'. Note that arr2 ends with the digit '0', which is
an ordinary character and not the null terminator.
    char arr1[] = {'T', 'h', 'i', 's', ' ', 'n', 'v', 't'};
    char arr2[] = {'T', 'h', 'i', 's', ' ', 's', 't', 'r', '0'};
    char arr3[] = {'2', 'n', 'd', ' ', 's', 't', '\0', 'M'};
    char arr4[] = {'C', ' ', 'P', ' ', '@', '-', '\0', '1', '8'};
86 Intermediate C Programming
The string in arr3 is “2nd st”. The character 'M' is an array element but it is not part
of the string. Similarly, for arr4, the string is “C P @-”. The trailing characters '1' and '8'
are elements of the array but they are not part of the string. We do not need to put any
number between [ and ] because the compiler (gcc, for example) calculates the size of each
array from its initializer.
What is the difference between single quotation marks and double quotation marks?
Single quotation marks enclose a single character, such as 'M' and '@', and represent the
character type. Double quotation marks enclose a string, and the null terminator, '\0', is
automatically added to the end of the string. Thus, the string stored in arr3 reads “2nd st”
(no visible '\0') but the array actually contains the element '\0'. Note that “W” is different
from 'W'. The former uses double quotation marks and means a string, ending with a null
terminator even though it is not shown. Hence, “W” actually means two characters. In
contrast, 'W' is a single character without a null terminator.
To explain this in another way, when storing a string of n characters, the array needs
space for n + 1 characters. The additional character is used to store the terminating '\0'.
For example, to store the string “Hello” (5 characters), we need to create an array of 6
elements:
    char arr[6]; /* create an array with 6 characters */
    arr[0] = 'H';
    arr[1] = 'e';
    arr[2] = 'l';
    arr[3] = 'l';
    arr[4] = 'o';
    arr[5] = '\0'; /* remember to add '\0' */
Forgetting the null terminator '\0' is a common mistake. The null terminator
is important because it indicates the end (and thus length) of the string. In the earlier
examples, arr3 and arr4 were two arrays; arr3 had 8 elements and arr4 had 9 elements.
However, if they are treated as strings, the length of each is only 6. The null terminator is
not counted as part of the length. C provides a function strlen for calculating the length
of strings. Before calling strlen, the program needs to include the file string.h because
strlen and many string-related functions are declared in string.h.
 1  // strlen.c
 2  #include <stdio.h>
 3  #include <stdlib.h>
 4  #include <string.h>
 5  int main(int argc, char **argv)
 6  {
 7    char str1[] = {'T', 'h', 'i', 's', ' ', 'n', 'v', 't'};
 8    char str2[] = {'T', 'h', 'i', 's', ' ', 's', 't', 'r', '0'};
 9    char str3[] = {'2', 'n', 'd', ' ', 's', 't', '\0', 'M'};
10    char str4[] = {'C', ' ', 'P', ' ', '@', '-', '\0', '1', '8', 'k'};
11    char str5[6];
12    int len3;
13    int len4;
14    int len5;
15    str5[0] = 'H';
16    str5[1] = 'e';
17    str5[2] = 'l';
18    str5[3] = 'l';
19    str5[4] = 'o';
20    str5[5] = '\0';
21    len3 = strlen(str3);
22    len4 = strlen(str4);
23    len5 = strlen(str5);
24    printf("len3 = %d, len4 = %d, len5 = %d\n", len3, len4, len5);
25    return EXIT_SUCCESS;
26  }
The output for this program is
len3 = 6, len4 = 6, len5 = 5
Why is '\0' so important? The string functions use it to determine the end of strings.
The manual of strlen says the function “calculates the length of the string s, excluding the
terminating null byte ('\0')”. In other words, '\0' is not counted. Although it is a simple
function, the implementation of strlen is instructive. This is one way of implementing
strlen:
 1  int strlen(char *str)
 2  {
 3    int length = 0;
 4    while ((*str) != '\0')
 5      {
 6        length++;
 7        str++;
 8      }
 9    return (length);
10  }
Section 4.6 explains that when calling a function, the argument str stores the address of
the first array element. The sixth line increments an integer. The seventh line uses pointer
arithmetic, as explained in Section 4.8. Consider str5 and len5 only; this is the call stack
before calling strlen:
Frame    Symbol     Address   Value
main     len5       106       garbage
         str5[5]    105       '\0'
         str5[4]    104       'o'
         str5[3]    103       'l'
         str5[2]    102       'l'
         str5[1]    101       'e'
         str5[0]    100       'H'
Calling strlen pushes a new frame onto the call stack with the return location, the
value address, the argument str, and the local variable length:
Frame    Symbol             Address   Value
strlen   length             110       0
         str                109       100
         value address      108       106
         return location    107       line 23
main     len5               106       garbage
         str5[5]            105       '\0'
         str5[4]            104       'o'
         str5[3]            103       'l'
         str5[2]            102       'l'
         str5[1]            101       'e'
         str5[0]            100       'H'
The argument str stores the address of the first array element and that address is 100.
The fourth line of strlen reads the value stored at that address, which is the character 'H'.
Since this is not '\0', both length and str increment.
Frame    Symbol             Address   Value
strlen   length             110       1
         str                109       101
         value address      108       106
         return location    107       line 23
main     len5               106       garbage
         str5[5]            105       '\0'
         str5[4]            104       'o'
         str5[3]            103       'l'
         str5[2]            102       'l'
         str5[1]            101       'e'
         str5[0]            100       'H'
The value of str is now 101, the address of the second element. At the fourth line, *str
reads the value at address 101 and the value is 'e'. Since this is not '\0', both length
and str increment again. They keep incrementing until str becomes 105, where the value
is '\0' and the condition at the fourth line is false. At that point length is 5, so the
function returns 5, without counting '\0'. The frame of strlen is popped and 5 is written
to len5:
Frame    Symbol     Address   Value
main     len5       106       5 (previously garbage)
         str5[5]    105       '\0'
         str5[4]    104       'o'
         str5[3]    103       'l'
         str5[2]    102       'l'
         str5[1]    101       'e'
         str5[0]    100       'H'
The strlen function ignores everything after '\0', and thus len3 and len4 are both 6
even though str3 and str4 have 8 and 10 elements.
6.2 String Functions in C
In addition to strlen, C provides many functions for processing strings. Each of these
functions assumes that a string has '\0' as one of the elements. Below we introduce a few
of these functions.
6.2.1 Copy: strcpy
This function copies a string into a pre-allocated memory region. This function takes
two arguments: The first is the destination and the second is the source. Here is an example:
    char src[] = {'H', 'e', 'l', 'l', 'o', '\0'};
    char dest[6]; // must be 6 or larger
    strcpy(dest, src);
There are five characters in “Hello” but one more element is needed for the null terminator,
'\0'. Thus, the destination's size needs to be six or larger. The strcpy function does not
check whether the destination has enough space. You must ensure that there is enough
space at the destination. The manual for strcpy says: “The strcpy() function copies the
string pointed to by src, including the terminating null byte ('\0'), to the buffer pointed to
by dest. The strings may not overlap, and the destination string dest must be large enough
to receive the copy.”
Moreover, the manual says: “If the destination string of a strcpy() is not large enough,
then anything might happen. Overflowing fixed-length string buffers is a favorite cracker
technique for taking complete control of the machine. Any time a program reads or copies
data into a buffer, the program first needs to check that there's enough space. This may be
unnecessary if you can show that overflow is impossible, but be careful: Programs can get
changed over time, in ways that may make the impossible possible.”
What does this mean? When writing a program that uses strcpy, the programmer must
ensure that the destination has enough space. If sufficient space is not made available, then
the program has a serious and unpredictable flaw. Consider a situation where a program
reads data from the keyboard. For example, it asks a user to enter the name. To handle this
situation correctly, the program must be careful about an extremely long input. If sufficient
memory is not allocated and strcpy is called, then the program has a serious security flaw,
vulnerable to “buffer overflow attacks”.
Why does C not check the memory of the destination? To improve speed. Checking would
slow down programs. When C was designed in the early 1970s, computers were expensive
and slow. To make C programs fast, programmers had to take the responsibility of ensuring
that the destination has enough space.
6.2.2 Compare: strcmp
This function can be used to compare two strings. It takes two arguments:
    strcmp(str1, str2);
The function returns a negative integer, a zero, or a positive integer depending on whether
str1 is less than, equal to, or greater than str2. The order of two strings is defined in the
same way as the order of words in a dictionary—also known as lexicographical order. For
example, “about” is smaller than “forever” because the letter a is before the letter f in a
dictionary. “Education” is after “Change”.
How are uppercase and lowercase letters compared? How does the function define the
order if one or both of the strings contain digits or symbols? The order is determined by
the ASCII (American Standard Code for Information Interchange) values. ASCII assigns
an integer value to each character. For example, the value for 'A' is 65 and the value for 'a'
is 97, so every uppercase letter comes before every lowercase letter. ASCII also assigns a
value to each symbol or digit. The value for '#' is 35 and for the digit '7' it is 55. The last
statement may sound strange. Why does the digit 7 have a value of 55? The simple answer
is that character values are treated differently from integer values. This can be shown
using the following example:
    /*
     * charint.c
     * how C treats integer and character differently
     */
    #include <stdio.h>
    #include <stdlib.h>
    int main(int argc, char **argv)
    {
      int v = 55;