21
WRITING APPLICATIONS

For a bunch of hairless apes, we’ve actually managed to invent some pretty incredible things.
—Ernest Cline, Ready Player One


This chapter contains a potpourri of important topics that will add to your practical understanding of C++ by teaching you the basics of building real-world applications. It begins with a discussion of program support built into C++ that allows you to interact with the application life cycle. Next, you’ll learn about Boost ProgramOptions, an excellent library for developing console applications. It contains facilities to accept input from users without your having to reinvent the wheel. Additionally, you’ll learn some special topics about the preprocessor and compiler that you’ll likely come across when building an application whose source exceeds a single file.

Program Support

Sometimes your programs need to interact with your operating environment’s application life cycle. This section covers three major categories of such interactions:

  • Handling program termination and cleanup
  • Communicating with the environment
  • Managing operating system signals

To help illustrate the various facilities in this section, you’ll use Listing 21-1 as a framework. It uses a spruced up analog to the Tracer class from Listing 4-5 in Chapter 4 to help track which objects get cleaned up in various program termination scenarios.

#include <iostream>
#include <string>

struct Tracer {
  Tracer(std::string name_in)
    : name{ std::move(name_in) } {
    std::cout << name << " constructed.\n";
  }
  ~Tracer() {
    std::cout << name << " destructed.\n";
  }
private:
  const std::string name;
};

Tracer static_tracer{ "static Tracer" };

void run() {
  std::cout << "Entering run()\n";
  // ...
  std::cout << "Exiting run()\n";
}

int main() {
  std::cout << "Entering main()\n";
  Tracer local_tracer{ "local Tracer" };
  thread_local Tracer thread_local_tracer{ "thread_local Tracer" };
  const auto* dynamic_tracer = new Tracer{ "dynamic Tracer" };
  run();
  delete dynamic_tracer;
  std::cout << "Exiting main()\n";
}
-----------------------------------------------------------------------
static Tracer constructed. 
Entering main() 
local Tracer constructed. 
thread_local Tracer constructed. 
dynamic Tracer constructed. 
Entering run() 
Exiting run() 
dynamic Tracer destructed. 
Exiting main() 
local Tracer destructed. 
thread_local Tracer destructed. 
static Tracer destructed. 

Listing 21-1: A framework for investigating program termination and cleanup facilities

First, you declare a Tracer class that accepts an arbitrary std::string tag and reports to stdout when the Tracer object is constructed and destructed. Next, you declare a Tracer with static storage duration. The run function reports when the program has entered and exited it. In the middle is a single comment that you’ll replace with other code in the sections that follow. Within main, you make an announcement; initialize Tracer objects with local, thread-local, and dynamic storage duration; and invoke run. Then you delete the dynamic Tracer object and announce that you’re about to return from main.

WARNING

If any of the Listing 21-1 output is surprising, please review “An Object’s Storage Duration” on page 89 before proceeding!

Handling Program Termination and Cleanup

The <cstdlib> header contains several functions for managing program termination and resource cleanup. There are two broad categories of program termination functions:

  • Those that cause program termination
  • Those that register a callback when termination is about to happen

Termination Callback with std::atexit

To register a function to be called when normal program termination occurs, you use the std::atexit function. You can register multiple functions, and they’ll be called in reverse order from their registration. The callback functions take no arguments and return void. If std::atexit registers a function successfully, it returns zero; otherwise, it returns a non-zero value.
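
As a quick illustration of the reverse-order behavior, here is a minimal sketch (separate from the Tracer framework) that registers two callbacks; the one registered last runs first:

#include <cstdio>
#include <cstdlib>

int main() {
  std::atexit([] { std::puts("registered first, so called last"); });
  std::atexit([] { std::puts("registered last, so called first"); });
  std::puts("Exiting main()");
  // Expected output:
  //   Exiting main()
  //   registered last, so called first
  //   registered first, so called last
}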

Listing 21-2 illustrates that you can register an atexit callback and it will be called at the expected moment.

#include <cstdlib>
#include <iostream>
#include <string>

struct Tracer {
--snip--
};

Tracer static_tracer{ "static Tracer" };


void run() {
  std::cout << "Registering a callback\n";
  std::atexit([] { std::cout << "***std::atexit callback executing***\n"; });
  std::cout << "Callback registered\n";
}

int main() {
--snip--
}
-----------------------------------------------------------------------
static Tracer constructed.
Entering main()
local Tracer constructed.
thread_local Tracer constructed.
dynamic Tracer constructed.
Registering a callback
Callback registered 
dynamic Tracer destructed.
Exiting main()
local Tracer destructed.
thread_local Tracer destructed.
***std::atexit callback executing*** 
static Tracer destructed.

Listing 21-2: Registering an atexit callback

Within run, you announce that you’re about to register a callback, you do it, and then you announce that you’re about to return from run. In the output, you can plainly see that the callback occurs after you’ve returned from main and all the non-static objects have destructed.

There are two important admonitions when programming a callback function:

  • You must not throw an uncaught exception from the callback function. Doing so will cause std::terminate to get invoked.
  • You need to be very careful interacting with non-static objects in your program. The atexit callback functions execute after main returns, so all local, thread-local, and dynamic objects will be destroyed at that point unless you take special care to keep them alive (see the sketch after this list).
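
To sketch the second point, one simple way to keep state available to an atexit callback is to give it static storage duration: a static object constructed before the callback is registered is destroyed only after that callback runs. (The log_buffer name below is hypothetical, not part of Listing 21-1.)

#include <cstdlib>
#include <iostream>
#include <string>

std::string log_buffer{ "events: " };  // static storage duration; outlives main

int main() {
  std::atexit([] {
    // Safe: log_buffer was constructed before this callback was registered,
    // so it is still alive when the callback executes after main returns.
    std::cout << log_buffer << "\n";
  });
  log_buffer += "did some work; ";
  log_buffer += "exiting normally";
}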

WARNING

You can register at least 32 functions with std::atexit, although the exact limit is implementation defined.

Exiting with std::exit

Throughout the book, you’ve been terminating programs by returning from main. In some circumstances, such as in multithreaded programs, you might want to exit the program gracefully in some other way, although you should avoid introducing the associated complications. You can use the std::exit function, which accepts a single int corresponding to the program’s exit code. It will perform the following cleanup steps:

  1. Thread-local objects associated with the current thread and static objects get destroyed. Any atexit callback functions get called.
  2. All of stdin, stdout, and stderr get flushed.
  3. Any temporary files get removed.
  4. The program reports the given status code to the operating environment, which resumes control.

Listing 21-3 illustrates the behavior of std::exit by registering an atexit callback and invoking exit from within run.

#include <cstdlib>
#include <iostream>
#include <string>

struct Tracer {
--snip--
};

Tracer static_tracer{ "static Tracer" };

void run() {
  std::cout << "Registering a callback\n";
  std::atexit([] { std::cout << "***std::atexit callback executing***\n"; });
  std::cout << "Callback registered\n";
  std::exit(0);
}

int main() {
--snip--
}
-----------------------------------------------------------------------
static Tracer constructed.
Entering main()
local Tracer constructed.
thread_local Tracer constructed.
dynamic Tracer constructed.
Registering a callback 
Callback registered 
thread_local Tracer destructed.
***std::atexit callback executing*** 
static Tracer destructed.

Listing 21-3: Invoking std::exit

Within run, you announce that you’re registering a callback, you register one with atexit, you announce that you’ve completed registering, and you invoke exit with argument zero. Compare the program output from Listing 21-3 to the output from Listing 21-2. Notice that the following lines don’t appear:

dynamic Tracer destructed.
Exiting main()
local Tracer destructed.

According to the rules for std::exit, local variables on the call stack don’t get cleaned up. And of course, because the program never returns to main from run, delete never gets called. Ouch.

This example highlights an important consideration: you shouldn’t use std::exit to handle normal program execution. It’s mentioned here for completeness, because you might see it in earlier C++ code.

NOTE

The <cstdlib> header also includes std::quick_exit, which invokes callbacks that you register with std::at_quick_exit, a function with an interface similar to std::atexit. The main difference is that at_quick_exit callbacks won’t execute unless you explicitly invoke quick_exit, whereas atexit callbacks will always execute when the program exits normally.
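
The following minimal sketch shows the difference; only the at_quick_exit callback runs, because quick_exit bypasses the normal atexit machinery:

#include <cstdio>
#include <cstdlib>

int main() {
  std::at_quick_exit([] {
    std::puts("***std::at_quick_exit callback executing***");
    std::fflush(stdout);  // flush explicitly; std::_Exit may not flush C streams
  });
  std::atexit([] { std::puts("***std::atexit callback executing***"); });  // never runs here
  std::puts("Calling std::quick_exit");
  std::fflush(stdout);
  std::quick_exit(0);  // runs at_quick_exit callbacks in reverse order, then calls std::_Exit
}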

std::abort

To end a program, you also have a nuclear option: std::abort. This function takes no arguments; it causes abnormal program termination immediately (by raising SIGABRT) and reports unsuccessful termination to the operating environment. No object destructors get called and no std::atexit callbacks get invoked. Listing 21-4 illustrates how to use std::abort.

#include <cstdlib>
#include <iostream>
#include <string>

struct Tracer {
--snip--
};

Tracer static_tracer{ "static Tracer" };

void run() {
  std::cout << "Registering a callback\n";
  std::atexit([] { std::cout << "***std::atexit callback executing***\n"; });
  std::cout << "Callback registered\n";
  std::abort();
}

int main() {
  --snip--
}
-----------------------------------------------------------------------
static Tracer constructed.
Entering main()
local Tracer constructed.
thread_local Tracer constructed.
dynamic Tracer constructed.
Registering a callback
Callback registered

Listing 21-4: Calling std::abort

Within run, you again announce that you’re registering a callback, you register one with atexit, and you announce that you’ve completed registering. This time, you invoke abort instead. Notice that no output prints after you announce that you’ve completed callback registration. The program doesn’t clean up any objects, and your atexit callback doesn’t get called.

As you might imagine, there aren’t too many canonical uses for std::abort. The main one you’re likely to encounter is the default behavior of std::terminate, which gets called when two exceptions are in flight at once.
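
To make that relationship visible, the following sketch installs a terminate handler that mimics the default by calling std::abort; throwing from a destructor while another exception is already unwinding the stack puts two exceptions in flight and triggers std::terminate. (CursedDestructor is a hypothetical name used only for this demonstration.)

#include <cstdlib>
#include <exception>
#include <iostream>
#include <stdexcept>

struct CursedDestructor {
  ~CursedDestructor() noexcept(false) {
    throw std::runtime_error{ "from destructor" };
  }
};

int main() {
  std::set_terminate([] {
    std::cout << "std::terminate invoked; calling std::abort." << std::endl;
    std::abort();  // a terminate handler must not return
  });
  try {
    CursedDestructor cursed;
    throw std::runtime_error{ "original exception" };  // unwinding destroys cursed, whose
                                                       // destructor throws a second exception,
                                                       // so std::terminate gets called
  } catch (const std::exception&) {
    std::cout << "This handler never executes.\n";
  }
}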

Communicating with the Environment

Sometimes, you might want to spawn another process. For example, Google’s Chrome browser launches many processes to service a single browser session. This builds in some security and robustness by piggybacking on the operating system’s process model. Web apps and plug-ins, for example, run in separate processes, so if they crash, the entire browser doesn’t crash. Also, by running the browser’s rendering engine in a separate process, any security vulnerabilities become more difficult to exploit, because Google locks down that process’s permissions in what is known as a sandboxed environment.

std::system

You can launch a separate process with the std::system function in the <cstdlib> header, which accepts a C-style string corresponding to the command you want to execute and returns an int corresponding to the return code from the command. The actual behavior depends on the operating environment. For example, the function will call cmd.exe on a Windows machine and /bin/sh on a Linux machine. This function blocks while the command is still executing.

Listing 21-5 illustrates how to use std::system to ping a remote host. (You’ll need to update the contents of command to a relevant command for your operating system if you’re not using a Unix-like operating system.)

#include <cstdlib>
#include <iostream>
#include <string>

int main() {
  std::string command{ "ping -c 4 google.com" };
  const auto result = std::system(command.c_str());
  std::cout << "The command '" << command
            << "' returned " << result << "\n";
}
-----------------------------------------------------------------------
PING google.com (172.217.15.78): 56 data bytes
64 bytes from 172.217.15.78: icmp_seq=0 ttl=56 time=4.447 ms
64 bytes from 172.217.15.78: icmp_seq=1 ttl=56 time=12.162 ms
64 bytes from 172.217.15.78: icmp_seq=2 ttl=56 time=8.376 ms
64 bytes from 172.217.15.78: icmp_seq=3 ttl=56 time=10.813 ms

--- google.com ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 4.447/8.950/12.162/2.932 ms
The command 'ping -c 4 google.com' returned 0 

Listing 21-5: Using std::system to invoke the ping utility (Output is from macOS Mojave version 10.14.)

First, you initialize a string called command containing ping -c 4 google.com. You then invoke std::system by passing the contents of command. This causes the operating system to invoke the ping command with the argument -c 4, which specifies four pings, and the address google.com. Then you print a status message reporting the return value from std::system.

std::getenv

Operating environments usually have environment variables, which users and developers can set to help programs find important information that the programs need to run. The <cstdlib> header contains the std::getenv function, which accepts a C-style string corresponding to the name of the environment variable you want to look up, and it returns a C-style string with the contents of the corresponding variable. If no such variable is found, the function returns nullptr instead.

Listing 21-6 illustrates how to use std::getenv to obtain the PATH variable, which contains a list of directories containing important executable files.

#include <cstdlib>
#include <iostream>
#include <string>

int main() {
  std::string variable_name{ "PATH" };
  std::string result{ std::getenv(variable_name.c_str()) };
  std::cout << "The variable " << variable_name
            << " equals " << result << "\n";
}
-----------------------------------------------------------------------
The variable PATH equals /usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

Listing 21-6: Using std::getenv to retrieve the path variable (Output is from macOS Mojave version 10.14.)

First, you initialize a string called variable_name containing PATH. Next, you store the result of invoking std::getenv with PATH into a string called result. Then you print the results to stdout.
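
Because std::getenv returns nullptr when the variable doesn’t exist, and constructing a std::string from nullptr is undefined behavior, a more defensive version checks the result first. Here is a minimal sketch using a deliberately unlikely, hypothetical variable name:

#include <cstdlib>
#include <iostream>
#include <string>

int main() {
  const char* raw = std::getenv("PROBABLY_NOT_SET");
  const std::string result{ raw ? raw : "(not set)" };  // avoid constructing from nullptr
  std::cout << "The variable PROBABLY_NOT_SET equals " << result << "\n";
}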

Managing Operating System Signals

Operating system signals are asynchronous notifications sent to processes that notify the program that an event occurred. The <csignal> header contains six macro constants that represent different signals from the operating system to the program (these signals are operating system agnostic):

  • SIGTERM represents a termination request.
  • SIGSEGV represents invalid memory access.
  • SIGINT represents an external interrupt, such as a keyboard interrupt.
  • SIGILL represents an invalid program image.
  • SIGABRT represents an abnormal termination condition, such as std::abort.
  • SIGFPE represents a floating-point error, such as division by zero.

To register a handler for one of these signals, you use the std::signal function in the <csignal> header. It accepts a single int value corresponding to one of the listed signal macros as its first argument. Its second argument is a function pointer (not a function object!) to a function that accepts an int corresponding to the signal macro and returns void. This function must have C linkage (although most implementations also permit C++ linkage). You’ll learn about C linkage later in the chapter. For now, simply prepend extern "C" to your function definition. Notice that, due to the asynchronous nature of the interrupts, any access to global, mutable state must be synchronized.

Listing 21-7 contains a program that waits for a keyboard interrupt.

#include <csignal>
#include <iostream>
#include <chrono>
#include <thread>
#include <atomic>

std::atomic_bool interrupted{}; 

extern "C" void handler(int signal) {
  std::cout << "Handler invoked with signal " << signal << ".
"; 
  interrupted = true; 
}

int main() {
  using namespace std::chrono_literals;
  std::signal(SIGINT, handler); 
  while(!interrupted) { 
    std::cout << "Waiting..." << std::endl; 
    std::this_thread::sleep_for(1s);
  }
  std::cout << "Interrupted!
"; 
}
-----------------------------------------------------------------------
Waiting...
Waiting...
Waiting...
Handler invoked with signal 2.
Interrupted! 

Listing 21-7: Registering for keyboard interrupts with std::signal

You first declare an atomic_bool called interrupted that stores whether the program has received a keyboard interrupt (it has static storage duration because you cannot use function objects with std::signal and therefore must use a non-member function to handle the callback). Next, you declare a callback handler that accepts an int called signal, prints its value to stdout, and sets interrupted to true.

Within main, you set the signal handler for the SIGINT interrupt code to handler. Within a loop, you wait for the program to get interrupted by printing a message and sleeping for a second. Once the program has been interrupted, you print a message and return from main.

NOTE

Typically, you can cause a keyboard interrupt on modern operating systems by pressing CTRL-C.
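
If you want to exercise a handler without the keyboard, <csignal> also provides std::raise, which delivers a signal to the running program. A minimal sketch, independent of Listing 21-7:

#include <csignal>
#include <cstdio>

volatile std::sig_atomic_t flag = 0;  // one of the few types you can safely write from a handler

extern "C" void handler(int) { flag = 1; }

int main() {
  std::signal(SIGINT, handler);
  std::raise(SIGINT);  // synchronously deliver SIGINT to this program
  if (flag) std::puts("Handler ran.");
}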

Boost ProgramOptions

Most console applications accept command line parameters. As you learned in “The Three main Overloads” on page 272, you can define main to accept the parameters argc and argv, which the operating environment will populate with the number of arguments and argument contents, respectively. You can always parse these manually and modify your program’s behavior accordingly, but there’s a better way: the Boost ProgramOptions library is an essential ingredient for writing console applications.

NOTE

All the Boost ProgramOptions classes presented in this section are available in the <boost/program_options.hpp> header.

You might be tempted to write your own argument-parsing code, but ProgramOptions is a smarter choice for four reasons:

  1. It’s far more convenient. Once you learn the succinct, declarative syntax of ProgramOptions, you can easily describe fairly complicated console interfaces in a few lines of code.
  2. It handles errors effortlessly. When the user misuses your program, ProgramOptions tells the user how they misused the program without any additional effort on your part.
  3. It automatically generates a help prompt. Based on your declarative markup, ProgramOptions creates nicely formatted, easy-to-employ documentation on your behalf.
  4. It grows beyond the command line. If you want to draw configuration from config files or environment variables, it’s easy to transition from command line arguments.

ProgramOptions comprises three parts:

  1. The options description allows you to specify the allowed options.
  2. The parsers component extracts option names and values from the command line, config files, and environment variables.
  3. The storage component provides you with the interface to access typed options.

In the subsections that follow, you’ll learn about each of these parts.

The Options Description

Three main classes comprise the options description component:

  • boost::program_options::option_description describes a single option.
  • boost::program_options::value_semantic knows the desired type of a single option.
  • boost::program_options::options_description is a container for multiple objects of type option_description.

You construct an options_description to, unsurprisingly, specify a description for the program’s options. Optionally, you can include a single string argument in the constructor that describes your program. This will print in the description if you include it, but it will have no functional impact. Next, you use its add_options method, which returns a special kind of object of type boost::program_options::options_description_easy_init. This class has a special operator() that accepts at least two arguments.

The first argument is the name of the option you want to add. ProgramOptions is very smart, so you can provide a long name and a short name separated by a comma. For example, if you had an option called threads, ProgramOptions would bind the parameter --threads from the command line to this option. If instead you named the option threads,t, ProgramOptions would bind either --threads or -t to your option.

The second argument is the description of the option. You can employ a value_semantic, a C-style string description, or both. Because options_description_easy_init returns a reference to itself from operator(), you can chain these calls together to form a succinct representation of your program’s options. Typically, you don’t create value_semantic objects directly. Instead, you use the convenience template function boost::program_options::value to generate them. It accepts a single template parameter corresponding to the desired type of the option. The resulting pointer points to an object that has code to parse text input (from the command line, for example) into the desired type. To specify an option of int type, for example, you would invoke value<int>().

The resulting pointed-to object will have several methods that allow you to specify additional information about the option. For example, you can employ the default_value method to set the option’s default value. To specify that an option of int type should default to 42, you would use the following construction:

value<int>()->default_value(42)

Another common pattern is an option that can take multiple tokens. Such options are allowed to have spaces between elements, and they’ll be parsed into a single string. To allow this, simply use the multitoken method. For example, to specify that an option can take multiple std::string values, you would use the following construction:

value<std::string>()->multitoken()

If instead you want to allow multiple instances of the same option, you can specify a std::vector as a value, like this:

value<std::vector<std::string>>()

If you have a Boolean option, you’ll use the convenience function boost::program_options::bool_switch, which accepts a pointer to a bool. If a user includes the corresponding option, the function will set the pointed-to bool to true. For example, the following construction will set a bool called flag to true if the corresponding option is included:

bool_switch(&flag)

The options_description class supports operator<<, so you can create a nicely formatted help dialog without any additional effort. Listing 21-8 illustrates how to use ProgramOptions to create an options_description for a sample program called mgrep.

#include <boost/program_options.hpp>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
  using namespace boost::program_options;
  bool is_recursive{}, is_help{};

  options_description description{ "mgrep [options] pattern path1 path2 ..." };
  description.add_options()
          ("help,h", bool_switch(&is_help), "display a help dialog")
          ("threads,t", value<int>()->default_value(4),
                        "number of threads to use")
          ("recursive,r", bool_switch(&is_recursive),
                          "search subdirectories recursively")
          ("pattern", value<std::string>(), "pattern to search for")
          ("path", value<std::vector<std::string>>(), "path to search");
  std::cout << description; 
}
-----------------------------------------------------------------------
mgrep [options] pattern path1 path2 ...:
  -h [ --help ]             display a help dialog
  -t [ --threads ] arg (=4) number of threads to use
  -r [ --recursive ]        search subdirectories recursively
  --pattern arg             pattern to search for
  --path arg                path to search

Listing 21-8: Using Boost ProgramOptions to generate a nicely formatted help dialog

First, you initialize an options_description object using a custom usage string. Next, you invoke add_options and begin adding options: a Boolean flag indicating whether to display a help dialog, an int indicating how many threads to use, another Boolean flag indicating whether to search subdirectories in a recursive manner, a std::string indicating which pattern to search for within files, and a list of std::string values corresponding to the paths to search. You then write the description to stdout.

Suppose that your yet-to-be-implemented mgrep program will always require a pattern and a path argument. You could convert these into positional arguments, which, as their name implies, will assign arguments based on their position. To do this, you employ the boost::program_options::positional_options_description class, which doesn’t take any constructor arguments. You use the add method, which takes two arguments: a C-style string corresponding to the option you want to convert to positional and an int corresponding to the number of arguments you want to bind to it. You can invoke add multiple times to add multiple positional arguments, but the order matters. Positional arguments bind from left to right, so your first add invocation applies to the leftmost positional arguments. For the last positional option, you can use the number -1 to tell ProgramOptions to bind all remaining elements to the corresponding option.

Listing 21-9 provides a snippet that you could append into main in Listing 21-8 to add the positional arguments.

  positional_options_description positional; 
  positional.add("pattern", 1); 
  positional.add("path", -1); 

Listing 21-9: Adding positional arguments to Listing 21-8

You initialize a positional_options_description without any constructor arguments. Next, you invoke add and pass the arguments pattern and 1, which will bind the first positional option to the pattern option. You invoke add again, this time passing the arguments path and -1, which will bind the remaining positional options to the path option.

Parsing Options

Now that you’ve declared how your program accepts options, you can parse user input. It’s possible to take configuration from environment variables, configuration files, and the command line. For brevity, this section only discusses the last.

NOTE

For information on how to obtain configuration from environment variables and configuration files, refer to the Boost ProgramOptions documentation, especially the tutorial.

To parse command line input, you use the boost::program_options::command_line_parser class, which accepts two constructor arguments: an int corresponding to argc, the number of arguments on the command line, and a char** corresponding to argv, the contents of the arguments on the command line. This class offers several important methods that you’ll use to declare how the parser should interpret user input.

First, you’ll invoke its options method, which takes a single argument corresponding to your options_description. Next, you’ll use the positional method, which takes a single argument corresponding to your positional_options_description. Finally, you’ll invoke run without any arguments. This causes the parser to parse the command line input and return a parsed_options object.

Listing 21-10 provides a snippet that you could append into main after Listing 21-8 to incorporate a command_line_parser.

command_line_parser parser{ argc, argv }; 
parser.options(description); 
parser.positional(positional); 
auto parsed_result = parser.run(); 

Listing 21-10: Adding the command_line_parser to Listing 21-8

You initialize a command_line_parser called parser by passing in the arguments from main. Next, you pass the options_description object to the options method and the positional_options_description to the positional method. Then you invoke the run method to produce your parsed_options object.

WARNING

If the user passes input that doesn’t parse, for example, because they provide an option that isn’t part of your description, the parser will throw an exception that inherits from std::exception.

Storing and Accessing Options

You store program options into a boost::program_options::variables_map, whose constructor takes no arguments. To place your parsed options into a variables_map, you use the boost::program_options::store function, which takes a parsed_options object as its first argument and a variables_map object as its second argument. Then you call the boost::program_options::notify function, which takes a single variables_map argument. At this point, your variables_map contains all the options your user has specified.

Listing 21-11 provides a snippet that you could append into main after Listing 21-10 to parse results into a variables_map.

variables_map vm; 
store(parsed_result, vm); 
notify(vm); 

Listing 21-11: Storing results into a variables_map

You first declare a variables_map. Next, you pass your parsed_result from Listing 21-10 and your newly declared variables_map to store. Then you call notify on your variables_map.

The variables_map class is an associative container that is essentially similar to a std::map<std::string, boost::any>. To extract an element, you use operator[] by passing the option name as the key. The result is a boost::any, so you’ll need to convert it to the correct type using its as method. (You learned about boost::any in “any” on page 378.) It’s crucial to check for any options that might be empty by using the empty method. If you fail to do so and you cast the any anyway, you’ll get a runtime error.

Listing 21-12 provides a snippet that you could append into main after Listing 21-11 to retrieve values from the variables_map.

if (is_help) std::cout << "Is help.\n";
if (is_recursive) std::cout << "Is recursive.\n";
std::cout << "Threads: " << vm["threads"].as<int>() << "\n";
if (!vm["pattern"].empty()) {
  std::cout << "Pattern: " << vm["pattern"].as<std::string>() << "\n";
} else {
  std::cout << "Empty pattern.\n";
}
if (!vm["path"].empty()) {
  std::cout << "Paths:\n";
  for(const auto& path : vm["path"].as<std::vector<std::string>>())
    std::cout << "\t" << path << "\n";
} else {
  std::cout << "Empty path.\n";
}

Listing 21-12: Retrieving values from a variables_map

Because you use the bool_switch value for the help and recursive options, you simply use those Boolean values directly to determine whether the user has requested either. Because threads has a default value, you don’t need to make sure that it’s empty, so you can extract its value using as<int> directly. For those options without defaults, such as pattern, you first check for empty. If those options aren’t empty, you can extract their values using as<std::string>. You do the same for path, which allows you to extract the user-provided collection with as<std::vector<std::string>>.

Putting It All Together

Now you have all the requisite knowledge to assemble a ProgramOptions-based application. Listing 21-13 illustrates one way to stitch the previous listings together.

#include <boost/program_options.hpp>
#include <iostream>
#include <string>

int main(int argc, char** argv) {
  using namespace boost::program_options;
  bool is_recursive{}, is_help{};

  options_description description{ "mgrep [options] pattern path1 path2 ..." };
  description.add_options()
          ("help,h", bool_switch(&is_help), "display a help dialog")
          ("threads,t", value<int>()->default_value(4),
                        "number of threads to use")
          ("recursive,r", bool_switch(&is_recursive),
                         "search subdirectories recursively")
          ("pattern", value<std::string>(), "pattern to search for")
          ("path", value<std::vector<std::string>>(), "path to search");

  positional_options_description positional;
  positional.add("pattern", 1);
  positional.add("path", -1);

  command_line_parser parser{ argc, argv };
  parser.options(description);
  parser.positional(positional);

  variables_map vm;
  try {
    auto parsed_result = parser.run(); 
    store(parsed_result, vm);
    notify(vm);
  } catch (const std::exception& e) {
    std::cerr << e.what() << "\n";
    return -1;
  }

  if (is_help) { 
    std::cout << description;
    return 0;
  }
  if (vm["pattern"].empty()) { 
    std::cerr << "You must provide a pattern.
";
    return -1;
  }
  if (vm["path"].empty()) { 
    std::cerr << "You must provide at least one path.
";
    return -1;
  }
  const auto threads = vm["threads"].as<int>();
  const auto& pattern = vm["pattern"].as<std::string>();
  const auto& paths = vm["path"].as<std::vector<std::string>>();
  // Continue program here ... 
  std::cout << "Ok." << std::endl;
}

Listing 21-13: A complete command line parameter-parsing application using the previous listings

The first departure from the previous listings is that you wrap the call to run on your parser in a try-catch block to mitigate erroneous input provided by the user. If they do provide erroneous input, you simply catch the exception, print the error to stderr, and return.

Once you declare your program options and store them, as in Listings 21-8 to 21-12, you first check whether the user has requested a help prompt. If so, you simply print the usage and exit, because there’s no need to perform any further checking. Next, you perform some error checking to make sure the user has provided a pattern and at least one path. If not, you print an error and exit; otherwise, you can continue writing your program.

Listing 21-14 shows various outputs from your program, which is compiled into the binary mgrep.

$ ./mgrep 
You must provide a pattern.
$ ./mgrep needle 
You must provide at least one path.
$ ./mgrep --supercharge needle haystack1.txt haystack2.txt 
unrecognised option '--supercharge'
$ ./mgrep --help 
mgrep [options] pattern path1 path2 ...:
  -h [ --help ]             display a help dialog
  -t [ --threads ] arg (=4) number of threads to use
  -r [ --recursive ]        search subdirectories recursively
  --pattern arg             pattern to search for
  --path arg                path to search
$ ./mgrep needle haystack1.txt haystack2.txt haystack3.txt 
Ok.
$ ./mgrep --recursive needle haystack1.txt 
Ok.
$ ./mgrep -rt 10 needle haystack1.txt haystack2.txt 
Ok.

Listing 21-14: Various invocations and outputs from the program in Listing 21-13

The first three invocations return errors for different reasons: you haven’t provided a pattern, you haven’t provided a path, or you provided an unrecognized option.

In the next invocation, you get the friendly help dialog because you provided the --help option. The final three invocations parse correctly because all contain a pattern and at least one path. The first contains no options, the second uses the longhand option syntax, and the third uses the shorthand option syntax.

Special Topics in Compilation

This section explains several important preprocessor features that will help you understand and solve the double-inclusion problem, described later in this section. You’ll learn about different options for optimizing your code by using compiler flags. Additionally, you’ll learn how to allow your linker to interoperate with C using a special language keyword.

Revisiting the Preprocessor

The preprocessor is a program that applies simple transformations to source code before compilation. You give instructions to the preprocessor using preprocessor directives. All preprocessor directives begin with a hash mark (#). Recall from “The Compiler Tool Chain,” on page 5 that #include is a preprocessor directive that tells the preprocessor to copy and paste the contents of the corresponding header directly into the source code.

The preprocessor also supports other directives. The most common is the macro, which is a fragment of code that’s been given a name. Whenever you use that name within C++ code, the preprocessor replaces that name with the contents of the macro.

The two different kinds of macros are object-like and function-like. You declare an object-like macro using the following syntax:

#define <NAME> <CODE>

where NAME is the name of the macro and CODE is the code to replace that name. For example, Listing 21-15 illustrates how to define a string literal to a macro.

#include <cstdio>
#define MESSAGE "LOL" 

int main(){
  printf(MESSAGE); 
}
-----------------------------------------------------------------------
LOL

Listing 21-15: A C++ program with an object-like macro

You define the macro MESSAGE to correspond with the code "LOL". Next, you use the MESSAGE macro as the format string to printf. After the preprocessor has completed work on Listing 21-15, it appears as Listing 21-16 to the compiler.

#include <cstdio>

int main(){
  printf("LOL");
}

Listing 21-16: The result of preprocessing Listing 21-15

The preprocessor is nothing more than a copy-and-paste tool here. The macro disappears, and you’re left with a simple program that prints LOL to the console.

NOTE

If you want to inspect the work that the preprocessor does, compilers usually have a flag that will limit compilation to just the preprocessing step. This will cause the compiler to emit the preprocessed source file corresponding to each translation unit. On GCC, Clang, and MSVC, for example, you can use the -E flag.

A function-like macro is just like an object-like macro except it can take a list of parameters after its identifier:

#define <NAME>(<PARAMETERS>) <CODE>

You can use these PARAMETERS within the CODE, allowing the user to customize the macro’s behavior. Listing 21-17 contains the function-like macro SAY_LOL_WITH.

#include <cstdio>
#define SAY_LOL_WITH(fn) fn("LOL") 

int main() {
  SAY_LOL_WITH(printf); 
}

Listing 21-17: A C++ program with a function-like macro

The SAY_LOL_WITH macro accepts a single parameter named fn. The preprocessor expands the macro to the expression fn("LOL"), substituting whatever you pass for fn. When it evaluates SAY_LOL_WITH(printf), the preprocessor pastes printf into the expression, yielding a translation unit just like Listing 21-16.

Conditional Compilation

The preprocessor also offers conditional compilation, a facility that provides basic if-else logic. Several flavors of conditional compilation are available, but the one you’re likely to encounter is illustrated in Listing 21-18.

#ifndef MY_MACRO 
// Segment 1 
#else
// Segment 2 
#endif

Listing 21-18: A C++ program with a conditional compilation

If MY_MACRO isn’t defined at the point where the preprocessor evaluates #ifndef, Listing 21-18 reduces to the code represented by // Segment 1. If MY_MACRO is #defined, Listing 21-18 evaluates to the code represented by // Segment 2. The #else is optional.
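
As a concrete sketch of this pattern, the following program changes its behavior based on whether the conventional NDEBUG macro is defined (for example, by passing -DNDEBUG to the compiler for a release build); the LOG macro itself is a hypothetical name used only for illustration:

#include <cstdio>

#ifndef NDEBUG
  #define LOG(msg) printf("DEBUG: %s\n", msg)
#else
  #define LOG(msg)  // expands to nothing in release builds
#endif

int main() {
  LOG("verbose logging enabled");
  printf("Done.\n");
}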

Double Inclusion

Aside from using #include, you should use the preprocessor as little as possible. The preprocessor is extremely primitive and will cause difficult-to-debug errors if you lean on it too heavily. This is evident with #include, which is a simple copy-and-paste command.

Because you can define a symbol only once (a rule appropriately called the one-definition rule), you must ensure that your headers don’t attempt to redefine symbols. The easiest way to make this mistake is by including the same header twice, which is called the double-inclusion problem.
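
The following sketch shows how easily this happens, using three hypothetical files. Because #include is a paste, main.cpp ends up defining Shape twice in the same translation unit, and compilation fails with a redefinition error:

// shape.h (no include guard, for illustration only)
struct Shape { int n_sides; };

// square.h
#include "shape.h"
Shape make_square();

// main.cpp
#include "shape.h"
#include "square.h"  // pastes shape.h a second time: redefinition of 'Shape'

int main() {}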

The usual way to avoid the double-inclusion problem is to use conditional compilation to make an include guard. The include guard detects whether a header has been included before. If it has, it uses conditional compilation to empty the header. Listing 21-19 illustrates how to put include guards around a header.

// step_function.h
#ifndef STEP_FUNCTION_H 
int step_function(int x);
#define STEP_FUNCTION_H 
#endif

Listing 21-19: A step_function.h updated with include guards

The first time that the preprocessor includes step_function.h in a source file, the macro STEP_FUNCTION_H won’t be defined, so #ifndef yields the code up to #endif. Within this code, you #define the STEP_FUNCTION_H macro. This ensures that if the preprocessor includes step_function.h again, #ifndef STEP_FUNCTION_H will evaluate to false and no code will get generated.

Include guards are so ubiquitous that most modern tool chains support the #pragma once special syntax. If one of the supporting preprocessors sees this line, it will behave as if the header has include guards. This eliminates quite a bit of ceremony. Using this construct, you could refactor Listing 21-19 into Listing 21-20.

#pragma once 
int step_function(int x);

Listing 21-20: A step_function.h updated with #pragma once

All you’ve done here is start the header with #pragma once, which is the preferred method. As a general rule, start every header with #pragma once.

Compiler Optimization

Modern compilers can perform sophisticated transformations on code to increase runtime performance and reduce binary size. These transformations are called optimizations, and they entail some cost to programmers. Optimization necessarily increases compilation time. Additionally, optimized code is often harder to debug than non-optimized code, because the optimizer usually eliminates and reorders instructions. In short, you usually want to turn off optimizations while you’re programming, but turn them on during testing and in production. Accordingly, compilers typically provide several optimization options. Table 21-1 describes one such example—the optimization options available in GCC 8.3, although these flags are fairly ubiquitous across the major compilers.

Table 21-1: GCC 8.3 Optimization Options

  • -O0 (default): Reduces compilation time by turning off optimizations. Yields a good debugging experience but suboptimal runtime performance.
  • -O or -O1: Performs the majority of available optimizations, but omits those that can take a lot of (compile) time.
  • -O2: Performs all optimizations at -O1, plus nearly all optimizations that don’t substantially increase binary size. Compilation might take much longer than with -O1.
  • -O3: Performs all optimizations at -O2, plus many optimizations that can substantially increase binary size. Again, this increases compilation time over -O1 and -O2.
  • -Os: Optimizes similarly to -O2 but with a priority for decreasing binary size. You can think of this (loosely) as a foil to -O3, which is willing to increase binary size in exchange for performance. Any -O2 optimizations that don’t increase binary size are performed.
  • -Ofast: Enables all -O3 optimizations, plus some dangerous optimizations that might violate standards compliance. Caveat emptor.
  • -Og: Enables optimizations that don’t degrade the debugging experience. Provides a good balance of reasonable optimizations, fast compilation, and ease of debugging.

As a general rule, use -O2 for your production binary unless you have a good reason to change it. For debugging, use -Og.
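
For example, with GCC the flags go directly on the compile command. A sketch, assuming a hypothetical source file main.cpp:

$ g++ -Og -g -o program_debug main.cpp      # debug build: easy to step through
$ g++ -O2 -o program_release main.cpp       # production build: optimized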

Linking with C

You can allow C code to incorporate functions and variables from your programs using language linkage. Language linkage instructs the compiler to generate symbols with a specific format friendly to another target language. For example, to allow a C program to use your functions, you simply add the extern "C" language linkage to your code.

Consider the sum.h header in Listing 21-21, which generates a C-compatible symbol for sum.

 // sum.h
#pragma once
extern "C" int sum(const int* x, int len);

Listing 21-21: A header that makes the sum function available to C linkers

Now the compiler will generate objects that the C linker can use. To use this function within C code, you simply declare the sum function per usual:

int sum(const int* x, int len);

Then instruct your C linker to include the C++ object file.
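
As a sketch of the C++ side, a hypothetical sum.cpp could implement the function declared in Listing 21-21; because the definition matches a declaration with extern "C", the resulting object file exposes a sum symbol with C linkage, so a C program declaring sum as above can link against it:

// sum.cpp -- hypothetical implementation of the header in Listing 21-21
#include "sum.h"

int sum(const int* x, int len) {
  int result = 0;
  for (int i = 0; i < len; i++) result += x[i];  // accumulate the len elements of x
  return result;
}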

NOTE

According to the C++ Standard, pragma is a method to provide additional information to the compiler beyond what is embedded in the source code. This information is implementation defined, so the compiler isn’t required to use the information specified by the pragma in any way. Pragma is the Greek root for “a fact.”

You can also interoperate the opposite way: use C compiler output within your C++ programs by giving the linker the C compiler-generated object file.

Suppose a C compiler generated a function equivalent to sum. You could compile using the sum.h header, and the linker would have no problem consuming the object file, thanks to language linkage.

If you have many externed functions, you can use braces {}, as Listing 21-22 illustrates.

// sum.h
#pragma once

extern "C" {
  int sum_int(const int* x, int len);
  double sum_double(const double* x, int len);
--snip--
}

Listing 21-22: A refactoring of Listing 21-21 containing multiple functions with the extern modifier.

The sum_int and sum_double functions will have C language linkage.

NOTE

You can also interoperate between C++ and Python with Boost Python. See the Boost documentation for details.

Summary

In this chapter, you first learned about program support features that allow you to interact with the application life cycle. Next, you explored Boost ProgramOptions, which allows you to accept input from users easily using a declarative syntax. Then you examined some selected topics in compilation that will be helpful as you expand your C++ application development horizons.

EXERCISES

21-1. Add graceful keyboard interrupt handling to the asynchronous upper-casing echo server in Listing 20-12. Add a kill switch with static storage duration that the session objects and acceptors check before queueing more asynchronous I/O.

21-2. Add program options to the asynchronous HTTP client in Listing 20-10. It should accept options for the host (like www.nostarch.com) and one or more resources (like /index.htm). It should create a separate request for each resource.

21-3. Add another option to your program in exercise 21-2 that accepts a directory where you’ll write all the HTTP responses. Derive a filename from each host/resource combination.

21-4. Implement the mgrep program. It should incorporate many of the libraries you’ve learned about in Part II. Investigate the Boyer-Moore search algorithm in Boost Algorithm (in the <boost/algorithm/searching/boyer_moore.hpp> header). Use std::async to launch tasks and determine a way to coordinate work between them.

FURTHER READING

  • The Boost C++ Libraries, 2nd Edition, by Boris Schäling (XML Press, 2014)
  • API Design for C++ by Martin Reddy (Morgan Kaufmann, 2011)