EXPLORATION 62

Old-Fashioned Arrays

Throughout this book, I’ve used std::vector for arrays. Exploration 53 touched on std::array and introduced other containers, such as std::deque. Hidden in the implementation of these types is an old-fashioned, crude, and unsafe style of array. This Exploration takes a look at this relic from C, not because I want you ever to use it, but because you may have to read code that uses this language construct, and it will help you to understand how the standard library can implement vector, array, and similar types. You may be surprised to learn that C-style arrays have much in common with pointers.

C-Style Arrays

C++ inherits from C a primitive form of array. Although an application should never have to use C-style arrays, library authors sometimes have to use them. For example, a typical implementation of std::vector makes use of C-style arrays.

The following shows how you define a C-style array object by specifying the array size in square brackets after the declarator name:

int data[10];

The array size must be a compile-time constant integer expression. The size must be strictly positive; zero-length arrays are not allowed. The compiler sets aside a single chunk of memory that is large enough to store the entire array. After the array definition, your code can use the array name as an address, but the name is not a pointer object that you can change. The elements of the array are lvalues, so, for instance, you can assign to data[0] but not to data itself.

Use square brackets to refer to elements of the array. The array index must be an integer. If the index is out of bounds, the results are undefined. Now you can see why std::vector implements the square bracket operator the way it does—namely, in imitation of C-style arrays.

int data[10];
std::vector<int> safer_data(10); // ten elements; {10} would create one element with value 10
data[0] = 42;           // okay
safer_data[0] = 42;     // okay: just like a C-style array
safer_data.at(0) = 42;  // okay: safer way to access vector elements
data[10] = -1;          // error: undefined behavior
safer_data[10] = -1;    // error: also undefined behavior
safer_data.at(10) = -1; // okay: throws an exception
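To see the difference between the square bracket operator and at() in practice, the following minimal sketch catches the exception that at() throws for a bad index. The function name index_is_valid is made up for this illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true if index is valid for the vector.
// It relies on at(), which throws std::out_of_range for a bad index
// instead of invoking undefined behavior.
bool index_is_valid(std::vector<int> const& v, std::size_t index)
{
  try
  {
    static_cast<void>(v.at(index)); // value is not needed, only the check
    return true;
  }
  catch (std::out_of_range const&)
  {
    return false;
  }
}
```

No comparable check is possible with a C-style array, because the array does not carry its size.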

To initialize an array, use curly braces, just like universal initialization.

int data[10] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
point corners[2] { point{0, 1}, point{10, 20} };

When you provide initial values, you can omit the array size. The compiler uses the number of initializers as the array size.

int data[] { 1, 2, 3, 4, 5 }; // just like data[5]

If you provide an array size, you can omit trailing elements, in which case the compiler zero-initializes the remaining elements, but only if the declarator has an initializer. With no initializer, the array elements are default-initialized; that is, elements of built-in type are left uninitialized.

int values[5]{ 3, 2, 1 }; // like { 3, 2, 1, 0, 0 }
int data[10]{ };          // initialize entire array to zero
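The initialization rules can be checked with a short sketch. The three function names are invented for this illustration; each one defines a local array and reports what the compiler did with it.

```cpp
#include <cstddef>

// Three initializers for five elements: the last two elements are zero.
std::size_t zeros_after_partial_init()
{
  int values[5]{ 3, 2, 1 };
  std::size_t count{0};
  for (int v : values)
    if (v == 0)
      ++count;
  return count;
}

// Empty braces zero-initialize all ten elements.
std::size_t zeros_after_empty_init()
{
  int data[10]{ };
  std::size_t count{0};
  for (int v : data)
    if (v == 0)
      ++count;
  return count;
}

// With no explicit size, the compiler counts the initializers.
std::size_t deduced_size()
{
  int data[]{ 1, 2, 3, 4, 5 };
  return sizeof(data) / sizeof(data[0]);
}
```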

Array Limitations

One of the key limitations to a C-style array is that the array doesn’t know its own size. The compiler knows the size when it compiles the array definition, but the size is not stored with the array itself, so most uses of the array are not aware of the array size.

If you declare an array type as a function parameter, something strange happens: the compiler ignores the size and treats the array type as a pointer type. In other words, when used as a function parameter, int x[10] means exactly the same thing as int x[1], int x[100000000], and int* x. In practical terms, this means the function has no idea what the array size is. The function sees only a pointer to the start of the array. Thus, functions that take a C-style array as an argument typically have an additional argument to pass the array size, as shown in Listing 62-1.

Listing 62-1.  Array Type in Function Parameters

#include <iostream>
 
int sum(int* array, int size);
 
int main()
{
  int data[5]{ 1, 2, 3, 4, 5 };
  std::cout << sum(data, 5) << '\n';
}
 
int sum(int array[], int size)
{
  int result{0};
  while (size-- != 0)
    result += array[size];
  return result;
}
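The conversion of an array parameter to a pointer can be observed directly. The following is a minimal sketch, with function names chosen only for this example: one function reports the real type of its "array" parameter, and another shows the caller computing the size while it still sees the array type.

```cpp
#include <cstddef>
#include <type_traits>

// Despite the declared size, the parameter's type is int*.
bool parameter_is_pointer(int array[10])
{
  return std::is_same<decltype(array), int*>::value;
}

// Hypothetical helper: the size must arrive as a separate argument.
int sum_elements(int const* array, std::size_t size)
{
  int result{0};
  for (std::size_t i{0}; i != size; ++i)
    result += array[i];
  return result;
}

// The caller sees the true array type, so it can compute the size
// with sizeof before the array is converted to a pointer.
int sum_of_five()
{
  int data[5]{ 1, 2, 3, 4, 5 };
  return sum_elements(data, sizeof(data) / sizeof(data[0]));
}
```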

Because an array does not store its size, an extern declaration (Exploration 41) of an array doesn’t keep track of the array size. Thus, the definition of the array must specify the size (explicitly or implicitly with an initializer), but extern declarations can omit the size. Unlike function arguments and parameters, no conversions take place with extern objects. Arrays are arrays, and pointers are pointers. If the extern declaration does not match the object’s definition, the results are undefined.

Another limitation of arrays is that a function cannot return an array.

Dynamically Allocating an Array

A new expression can allocate an array, or more precisely, it allocates one or more contiguous values, default-initializes each element, and returns the address of the first one. (Remember that default-initializing a built-in type, such as int, leaves the value uninitialized.) Initialize elements of the array with values in curly braces. Remember that omitted elements are zero-initialized, so {} initializes every element of the new array. Like passing an array to a function, all you get from new is a pointer. It is up to you to keep track of the size. Pass the size in square brackets after the type.

int* ptr{ new int[10]{ 1, 2, 3 } };
ptr[0] = 9;
ptr[9] = 0;

To free the memory, use the delete[] operator. The square brackets are required. Do not pass the size.

delete[] ptr;

If you allocate a scalar value with new (no square brackets), you must delete the memory with plain delete (no square brackets). If you allocate an array (even if the size is one), delete the memory with delete[]. You cannot mix the array-style new with a non-array delete or vice versa. Because the delete operator and delete[] operator both take a plain pointer as an operand, the compiler cannot, in general, detect errors. A good library can detect errors and report them at runtime, but the standard provides no guarantee.

The unique_ptr type distinguishes between scalars and arrays, so you can use it to manage dynamically allocated arrays, as follows:

std::unique_ptr<int[]> array{ new int[10] };
std::unique_ptr<int> scalar{ new int };

In both cases, the expression type is int*, so there is no way for the compiler to verify that you are using the correct form of unique_ptr with a matching new expression. Enable all the debugging features in your compiler and see if it detects any errors in Listing 62-2.

Listing 62-2.  Deliberate Errors with new and delete

#include <memory>
 
int main()
{
   int* p{ new int[10] };
   delete p;
   p = new int;
   delete[] p;
   std::unique_ptr<int> up{ new int[10] };
   up.reset();
   std::unique_ptr<int[]> upa{ new int };
   upa.reset();
}
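For contrast with the deliberate errors, correctly matched forms might look like the following sketch. The function names are hypothetical; the point is only that new[] pairs with delete[], and that the array form of unique_ptr does the pairing for you.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: allocates size ints, zero-initialized by the
// empty braces, and transfers ownership to unique_ptr<int[]>, which
// calls delete[] when the pointer is destroyed.
std::unique_ptr<int[]> make_zeroed(std::size_t size)
{
  return std::unique_ptr<int[]>{ new int[size]{} };
}

// Hypothetical helper: manual management, with new[] matched by delete[].
int sum_of_squares(std::size_t size)
{
  int* p{ new int[size] };          // elements are uninitialized here
  for (std::size_t i{0}; i != size; ++i)
    p[i] = static_cast<int>(i * i);
  int result{0};
  for (std::size_t i{0}; i != size; ++i)
    result += p[i];
  delete[] p;                       // array form, no size
  return result;
}
```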
    

The array Type

Many of the limitations of C-style arrays are ameliorated by std::array. Unlike C-style arrays, std::array remembers its size. Unlike C-style arrays, a function can return std::array. Unlike C-style arrays, std::array offers bounds-checking with the at() member function. Like std::vector, you must provide an element type as a template argument, but you also supply the array size. Also, std::array can have zero size, but C-style arrays cannot. One oddity is that an array initializer requires a second set of curly braces, which is a side effect of the way std::array works.

std::array<int, 5> data{ { 1, 2, 3, 4, 5 } };
assert(data.size() == 5);
data.at(10); // throws an exception
for (auto x : data)  // supports iterators
    std::cout << x << ' ';
std::array<int, 0> look_ma_no_elements;
assert(look_ma_no_elements.size() == 0);

But C-style arrays offer one (only one!) feature lacking in std::array. What is it?

_____________________________________________________________

The compiler can count the number of initializers of a C-style array, but you must do the counting for std::array. There are some situations in which this limitation is serious, but in most cases, std::array is vastly superior to C-style arrays.
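Two of the advantages of std::array are easy to sketch: returning an array from a function and asking the array for its size. The function names below are invented for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: a function can return std::array by value,
// which is impossible with a C-style array.
std::array<int, 5> first_squares()
{
  std::array<int, 5> result{ { 0, 0, 0, 0, 0 } };
  for (std::size_t i{0}; i != result.size(); ++i)
    result[i] = static_cast<int>(i * i);
  return result;
}

// The size is part of the type, so no separate size argument is needed.
int sum_array(std::array<int, 5> const& data)
{
  int result{0};
  for (int x : data)
    result += x;
  return result;
}
```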

Multidimensional Arrays

All C-style arrays are one-dimensional, but you can create an array of arrays. For example, define a variable as a 3 × 4 matrix as follows:

double matrix[3][4];

Read this declaration the way you would any other. Start with the name and work your way from inside to outside: matrix is an array with three elements. Each element is an array of four elements of type double. Thus, C++ arrays are row-major—that is, the array is accessed by row, and the right-most index varies fastest. So another way to define matrix is as follows:

typedef double row[4];
row matrix[3];
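The array-of-arrays structure can be confirmed at compile time. The following sketch uses static_assert and sizeof; the function names row_count and column_count are made up for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

typedef double row[4];

// An array of three rows is the same type as double[3][4].
static_assert(std::is_same<row[3], double[3][4]>::value,
              "row[3] and double[3][4] are the same type");
// The three rows are stored one after another, with no gaps.
static_assert(sizeof(double[3][4]) == 3 * sizeof(row),
              "a matrix is exactly three rows in size");

// Number of rows: size of the whole matrix divided by the size of one row.
std::size_t row_count()
{
  double matrix[3][4]{ };
  return sizeof(matrix) / sizeof(matrix[0]);
}

// Number of columns: size of one row divided by the size of one element.
std::size_t column_count()
{
  double matrix[3][4]{ };
  return sizeof(matrix[0]) / sizeof(matrix[0][0]);
}
```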

When you pass a matrix to a function, only the left-most array is converted to a pointer. Thus, if you were to pass matrix to a function, you would have to declare the function parameter as a pointer to an array of four doubles, as follows:

double sum(double arg[][4]);

or:

double sum(double (*arg)[4]);

or:

double sum(row* arg);

To refer to elements of the matrix, use a separate subscript operator for each index, as follows:

void initialize(double matrix[][4], int num_rows)
{
   for (int i = 0; i != num_rows; ++i)
      for (int j = 0; j != 4; ++j)
         matrix[i][j] = 1.0;
}

You can also refer to an entire row: matrix[2] refers to the last row of the matrix, which has type double[4]. Like any other array, the row converts to the address of its first element, so in most expressions matrix[2] yields a pointer to the first double of that four-element array.

If you use std::array, you can treat the matrix as an object, with all the advantages of std::array. The declaration syntax might strike you as backward, but it clearly expresses the nature of a multidimensional C array as an array of arrays, as in the following:

std::array<std::array<double, 4>, 3> matrix;
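As a sketch of treating the matrix as an object, the following functions return a matrix by value and pass one by reference, with no separate size arguments. The alias matrix3x4 and the function names are assumptions made for this example.

```cpp
#include <array>

// Alias chosen for this illustration only: three rows of four doubles.
typedef std::array<std::array<double, 4>, 3> matrix3x4;

// Hypothetical helper: returns a matrix with every element set to value.
matrix3x4 filled(double value)
{
  matrix3x4 m;
  for (auto& row : m)
    row.fill(value);
  return m;
}

// Hypothetical helper: sums every element, row by row.
double sum_matrix(matrix3x4 const& m)
{
  double result{0.0};
  for (auto const& row : m)
    for (double x : row)
      result += x;
  return result;
}
```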

C-Style Strings

Another legacy type that C++ inherits from C is the C-style string, which is little more than a C-style array of char. (A wide C-style string is a C-style array of wchar_t. Everything in this Exploration applies equally to wide strings and wchar_t but mentions only char for the sake of simplicity. Ditto for strings of char16_t and char32_t.) A string literal in C++ is a const array of char. The size of the array is the number of characters in the literal, plus one. The compiler automatically appends the character with value zero ('\0') to the end of the array, as a marker for the end of the string. (Remember that the array does not store the size, so the trailing zero-valued character is the only way to identify the end of the string and, therefore, the length of the string.) The zero-valued character is also called a null character. In spite of the unfortunate collision of terminology, null characters have nothing to do with null pointers (Exploration 59).

The std::string class has a constructor to construct a C++ string from a C character pointer. Often, the compiler is able to call this constructor automatically, so you can usually use a string literal anywhere that calls for std::string.

Should you ever have to work with C-style strings directly, remember that a string literal contains const elements. A frequent mistake is to treat a string literal as an array of char, not an array of const char. Although you generally cannot know the amount of memory that a character array occupies, you can discover the number of characters (storage units, not logical characters, if the character set is multi-byte) by calling the std::strlen function (declared in <cstring>, along with several other functions that are useful for working with C-style strings), passing the start of the character array as an argument. (Wide characters have different support functions; for details consult a library reference.)
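The difference between the size of a string literal's array and the length of the string can be shown in a few lines. This is a minimal sketch; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// sizeof sees the array type char const[6]: five letters plus the
// null character that the compiler appends.
std::size_t literal_array_size()
{
  return sizeof("hello");
}

// strlen counts characters up to, but not including, the null character.
std::size_t literal_length()
{
  return std::strlen("hello");
}

// Hypothetical helper: constructs a C++ string from a C character pointer.
std::string to_string_object(char const* c_string)
{
  return std::string{c_string};
}
```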

Command-Line Arguments

The one and only time you should use a C-style array is to access command-line arguments that the host environment passes to a program when the main() function begins. For historical reasons, the command-line arguments are passed as a C-style array of pointers to C-style character arrays. Thus, you can choose to write the main() function as a function of no arguments or a function of two arguments: an int for the number of command-line arguments and a pointer to the first element of an array of pointers to the individual command-line arguments, each as an array of char (not const char). Listing 62-3 shows an example of echo, which echoes command-line arguments to the standard output. Note that the first command-line argument is the program name or path to the program’s executable file (the details are defined by the implementation). Note also that std::ostream knows how to print a C-style character pointer by printing the character contents of the string.

Listing 62-3.  Echoing Command-Line Arguments

#include <iostream>
 
int main(int argc, char* argv[])
{
  char const* separator{""};
  while (--argc != 0)
  {
    std::cout << separator << *++argv;
    separator = " ";
  }
}

The names argc and argv are conventional, not required. As with any other function parameters, you are free to pick any names you want. The second argument is of type pointer-to-pointer-to-char and is often written as char* argv[] to emphasize the point that it is an array of char* values, although some programmers also use char** argv, which means the same thing.

The size of the argv array is argc + 1, because its last element is a null pointer, after all the command-line arguments. Thus, some programs loop through command-line arguments by counting and comparing with argc, and others loop through argv until reaching a null pointer.
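Both looping styles can be sketched as ordinary functions that take the same parameters as main(). The function names are invented for this example, and both skip the program name in argv[0].

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects the arguments by counting up to argc.
std::vector<std::string> args_by_count(int argc, char* argv[])
{
  std::vector<std::string> result;
  for (int i{1}; i < argc; ++i)
    result.push_back(argv[i]);
  return result;
}

// Hypothetical helper: collects the arguments by walking the array
// until reaching the null pointer that follows the last argument.
std::vector<std::string> args_by_null(char* argv[])
{
  std::vector<std::string> result;
  if (*argv == nullptr)       // no program name, so no arguments either
    return result;
  for (char** arg{argv + 1}; *arg != nullptr; ++arg)
    result.push_back(*arg);
  return result;
}
```

Given the same argv, the two functions return the same vector of strings.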

Write a program that takes two command-line arguments: an input file and an output file. The program copies the contents of the input file to the output file. Compare your solution with mine, shown in Listing 62-4.

Listing 62-4.  Copying a File Named on the Command Line

#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
 
int main(int argc, char* argv[])
{
  if (argc != 3)
  {
    std::cerr << "usage: " << argv[0] << " INPUT OUTPUT\n";
    return EXIT_FAILURE;
  }
  std::ifstream input{argv[1]};
  if (not input)
  {
    std::perror(argv[1]);
    return EXIT_FAILURE;
  }
  std::ofstream output{argv[2]};
  if (not output)
  {
    std::perror(argv[2]);
    return EXIT_FAILURE;
  }
 
  input.exceptions(input.badbit);    // throw for serious errors
  output.exceptions(output.failbit); // throw for any error
 
  try
  {
    // Lots of ways to copy: use std::copy, use a loop to read & write
    // The following is a little-known technique that is probably fastest.
    output << input.rdbuf();
    output.close();
    input.close();
  }
  catch (std::ios_base::failure const& ex)
  {
    std::cerr << "Can't copy " << argv[1] << " to " << argv[2] << ": " <<
             ex.what() << '\n';
    return EXIT_FAILURE;
  }
}

Pointer Arithmetic

An unusual feature of C++ pointers (inherited from C) is that you can perform addition and subtraction on pointers. In ordinary usage, these operations work only on pointers that point into arrays. Specifically, you can add or subtract integers and pointers, and you can subtract two pointers to get an integer. You can also compare two pointers, using relational operators (less than, greater than, etc.). This section explores what these operations mean.

Briefly, a pointer can point to any object in an array. Add an integer to a pointer to obtain the address of an element of the array. For example, array + 2 points to the third element of the array: the element at index 2. You are allowed to form a pointer to any position in the array, including the position one past the end. Given a pointer into the array, subtract an integer to obtain the address of an element earlier in the array. You are not allowed to form an address that precedes the first element of the array.

Subtract two pointers to obtain the number of array elements that separate them. They must be pointers that point into the same array. When you compare two pointers using relational operators, pointer a is “less than” pointer b if a and b both point to the same array and a comes earlier in the array than b.
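The operations described so far can be sketched as a few small functions. The names are made up for this illustration, and every function assumes that its pointer arguments point into, or one past the end of, the same array.

```cpp
#include <cstddef>

// array + 2 points to the element at index 2, so this is array[2].
int third_element(int const* array)
{
  return *(array + 2);
}

// Subtracting pointers yields the number of elements between them.
std::ptrdiff_t element_count(int const* first, int const* last)
{
  return last - first;
}

// True if a points to an earlier element than b in the same array.
bool comes_before(int const* a, int const* b)
{
  return a < b;
}

// Sums the elements in the range [first, last) by advancing a pointer.
int sum_range(int const* first, int const* last)
{
  int result{0};
  while (first != last)
    result += *first++;
  return result;
}
```

For an array declared as int data[5], the one-past-the-end pointer is data + 5; it may be formed and compared but never dereferenced.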

Ordinarily, you have no reason to use the relational operators on pointers that are not in the same array. But you can use pointers as keys in sets and maps, and these types have to compare pointers to put the keys in order. The std::set and std::map templates use std::less to compare keys (Exploration 50), and std::less ordinarily uses the < operator. For pointers, the details are specific to the implementation, but the standard requires std::less to order all pointers consistently, even pointers into different arrays, thereby ensuring that sets and maps work properly when you use pointers as keys.
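As a small sketch of pointers as keys, the following function stores addresses of unrelated objects in a std::set, which orders them with std::less. The function name count_distinct is hypothetical.

```cpp
#include <cstddef>
#include <set>

// Hypothetical helper: counts the distinct addresses among three
// pointers. The pointers need not point into the same array, because
// std::set compares its keys with std::less<int const*>.
std::size_t count_distinct(int const* a, int const* b, int const* c)
{
  std::set<int const*> addresses{ a, b, c };
  return addresses.size();
}
```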

The compiler and library are not required to enforce the rule that pointers must point to the same array or stay confined to legal indices. Some compilers might try to give you a few warnings, but in general, the compiler cannot tell whether your program follows all the rules. If your program does not follow the rules, it enters the twilight zone of undefined behavior. That’s what makes pointers so dangerous: it’s easy to fall into undefined behavior territory.

The most common use for pointer arithmetic is to advance through the elements of an array by marching a pointer from the beginning of the array to the end, instead of using an array index. Listing 62-5 illustrates this idiom as well as pointer subtraction, by showing one possible implementation of the standard std::strlen function, which returns the length of a C-style string.

Listing 62-5.  Using Pointer Arithmetic to Determine the Length of a C String

#include <cstddef>
 
std::size_t my_std_strlen(char const* str)
{
   char const* start{str};      // remember the start of the string
   while (*str != 0)            // while not at the end of the string
      ++str;                    // advance to the next character
   return str - start;          // compute string length by subtracting pointers
}

Pointer arithmetic is error-prone and dangerous, and I recommend avoiding it. Instead of C strings, for example, use std::string. Instead of C-style arrays, use std::vector or std::array.

However, pointer arithmetic is a common idiom in C++ programs and, therefore, unavoidable. Pointer arithmetic is especially prevalent in library implementations. For example, I can almost guarantee that it is used in your library’s implementation of the string, vector, and array class templates. Thus, library authors must be especially vigilant against errors that are difficult or impossible for the compiler to detect, but that effort pays off by making a safer interface available to all other developers.

In the interest of making pointers safer, C++ lets you define a class that looks, acts, and smells like a pointer type but with bonus features, such as additional checks and safety. These so-called smart pointers are the subjects of the next Exploration.
