Hardin once said, “To succeed, planning alone is insufficient. One must improvise as well.” I’ll improvise.
—Isaac Asimov, Foundation
As discussed in Chapter 1, a type declares how an object will be interpreted and used by the compiler. Every object in a C++ program has a type. This chapter begins with a thorough discussion of fundamental types and then introduces user-defined types. Along the way, you’ll learn about several control flow structures.
Fundamental types are the most basic types of object and include integer, floating-point, character, Boolean, byte, size_t, and void. Some refer to fundamental types as primitive or built-in types because they’re part of the core language and almost always available to you. These types will work on any platform, but their features, such as size and memory layout, depend on implementation.
Fundamental types strike a balance. On one hand, they try to map a direct relationship from C++ construct to computer hardware; on the other hand, they simplify writing cross-platform code by allowing a programmer to write code once that works on many platforms. The sections that follow provide additional detail about these fundamental types.
Integer types store whole numbers: those that you can write without a fractional component. The four sizes of integer types are short int, int, long int, and long long int. Each can be either signed or unsigned. A signed variable can be positive, negative, or zero, and an unsigned variable must be non-negative.
Integer types are signed and int by default, which means you can use the following shorthand notations in your programs: short, long, and long long rather than short int, long int, and long long int. Table 2-1 lists all available C++ integer types, whether each is signed or unsigned, the size of each (in bytes) across platforms, as well as the format specifier for each.
Table 2-1: Integer Types, Sizes, and Format Specifiers
Type | Signed | 32-bit Windows (bytes) | 32-bit Linux/Mac (bytes) | 64-bit Windows (bytes) | 64-bit Linux/Mac (bytes) | printf format specifier |
short | Yes | 2 | 2 | 2 | 2 | %hd |
unsigned short | No | 2 | 2 | 2 | 2 | %hu |
int | Yes | 4 | 4 | 4 | 4 | %d |
unsigned int | No | 4 | 4 | 4 | 4 | %u |
long | Yes | 4 | 4 | 4 | 8 | %ld |
unsigned long | No | 4 | 4 | 4 | 8 | %lu |
long long | Yes | 8 | 8 | 8 | 8 | %lld |
unsigned long long | No | 8 | 8 | 8 | 8 | %llu |
Notice that the integer type sizes vary across platforms: 64-bit Windows and Linux/Mac have different sizes for a long integer (4 and 8, respectively).
Usually, a compiler will warn you of a mismatch between format specifier and integer type. But you must ensure that the format specifiers are correct when you’re using them in printf statements. Format specifiers appear here so you can print integers to console in examples to follow.
NOTE
If you want to enforce guaranteed integer sizes, you can use integer types in the <cstdint> library. For example, if you need a signed integer with exactly 8, 16, 32, or 64 bits, you could use int8_t, int16_t, int32_t, or int64_t. You’ll find options for the fastest, smallest, maximum, signed, and unsigned integer types to meet your requirements. But because the exact-width types are optional and an implementation provides them only if it has integer types of exactly those widths, you should use them only when there is no other alternative.
A literal is a hardcoded value in a program. You can use one of four hardcoded, integer literal representations:
binary Uses the prefix 0b
octal Uses the prefix 0
decimal This is the default
hexadecimal Uses the prefix 0x
These are four different ways of writing the same set of whole numbers. For example, Listing 2-1 shows how you might assign several integer variables with integer literals using each of the non-decimal representations.
#include <cstdio>

int main() {
  unsigned short a = 0b10101010; // ➊
  printf("%hu\n", a);
  int b = 0123; // ➋
  printf("%d\n", b);
  unsigned long long d = 0xFFFFFFFFFFFFFFFF; // ➌
  printf("%llu\n", d);
}
--------------------------------------------------------------------------
170 ➊
83 ➋
18446744073709551615 ➌
Listing 2-1: A program that assigns several integer variables and prints them with the appropriate format specifier
This program uses each of the non-decimal integer representations (binary ➊, octal ➋, and hexadecimal ➌) and prints each with printf using the appropriate format specifier listed in Table 2-1. The output from each printf appears below the dashed line, labeled with the matching marker.
NOTE
Integer literals can contain any number of single quotes (') for readability. These are completely ignored by the compiler. For example, 1000000 and 1'000'000 are both integer literals equal to one million.
Sometimes, it’s useful to print an unsigned integer in its hexadecimal representation or (rarely) its octal representation. You can use the printf specifiers %x and %o for these purposes, respectively, as shown in Listing 2-2.
#include <cstdio>

int main() {
  unsigned int a = 3669732608;
  printf("Yabba %x!\n", a); // %x ➊
  unsigned int b = 69;
  printf("There are %u,%o leaves here.\n", // %u ➋, %o ➌
         b,   // ➍
         b);  // ➎
}
--------------------------------------------------------------------------
Yabba dabbad00➊!
There are 69➋,105➌ leaves here.
Listing 2-2: A program that uses octal and hexadecimal representations of unsigned integers
The hexadecimal representation of the decimal 3669732608 is dabbad00, which appears in the first line of output as a result of the hexadecimal format specifier %x ➊. The decimal 69 is 105 in octal. The format specifiers for unsigned integer %u ➋ and octal integer %o ➌ correspond with the arguments at ➍ and ➎, respectively. The printf statement substitutes these quantities ➋➌ into the format string, yielding the message There are 69,105 leaves here.
Warning
The octal prefix is a holdover from the B language, back in the days of the PDP-8 computer and ubiquitous octal literals. C, and by extension C++, continues the dubious tradition. You must be careful, for example, when you’re hardcoding ZIP codes:
int mit_zip_code = 02139; // Won't compile
Eliminate leading zeros on decimal literals; otherwise, they’ll cease to be decimal. This line doesn’t compile because 9 is not an octal digit.
By default, an integer literal’s type is one of the following: int, long, or long long. An integer literal’s type is the smallest of these three types that fits. (This is defined by the language and will be enforced by the compiler.)
If you want more control, you can supply suffixes to an integer literal to specify its type (suffixes are case insensitive, so you can choose the style you like best):
unsigned Uses the suffix u or U
long Uses the suffix l or L
long long Uses the suffix ll or LL
You can combine the unsigned suffix with either the long or the long long suffix to specify signed-ness and size. Table 2-2 shows the possible types that a suffix combination can take. Allowed types are shown with a check mark (✓). For binary, octal, and hexadecimal literals, you can omit the u or U suffix. These are depicted with an asterisk (*).
Table 2-2: Integer Suffixes
Type | (none) | l/L | ll/LL | u/U | ul/UL | ull/ULL |
int | ✓ | | | | | |
long | ✓ | ✓ | | | | |
long long | ✓ | ✓ | ✓ | | | |
unsigned int | * | | | ✓ | | |
unsigned long | * | * | | ✓ | ✓ | |
unsigned long long | * | * | * | ✓ | ✓ | ✓ |
The smallest allowed type that still fits the integer literal is the resulting type. This means that among all types allowed for a particular integer, the smallest type will apply. For example, the integer literal 112114 could be an int, a long, or a long long. Since an int can store 112114, the resulting integer literal is an int. If you really want, say, a long, you can instead specify 112114L (or 112114l).
Floating-point types store approximations of real numbers (which in our case can be defined as any number that has a decimal point and a fractional part, such as 0.33333 or 98.6). Although it’s not possible to represent an arbitrary real number exactly in computer memory, it’s possible to store an approximation. If this seems hard to believe, just think of a number like π, which has infinitely many digits. With finite computer memory, how could you possibly represent infinitely many digits?
As with all types, floating-point types take up a finite amount of memory, which is called the type’s precision. The more precision a floating-point type has, the more accurate it will be at approximating a real number. C++ offers three levels of precision for approximations:
float single precision
double double precision
long double extended precision
As with integer types, each floating-point representation depends on implementation. This section won’t go into detail about floating-point types, but note that there is substantial nuance involved in these implementations.
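On major desktop operating systems, the float level usually has 4 bytes of precision, and the double level usually has 8 bytes (double precision). The long double level varies more: it is commonly 8 bytes on Windows with the Microsoft compiler and often occupies 16 bytes on 64-bit Linux, where it typically holds an 80-bit extended-precision value.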
Most users not involved in scientific computing applications can safely ignore the details of floating-point representation. In such cases, a good general rule is to use a double.
NOTE
For those who cannot safely ignore the details, look at the floating-point specification relevant to your hardware platform. The predominant implementation of floating-point storage and arithmetic is outlined in The IEEE Standard for Floating-Point Arithmetic, IEEE 754.
Floating-point literals are double precision by default. If you need single precision, use an f or F suffix; for extended precision, use l or L, as shown here:
float a = 0.1F;
double b = 0.2;
long double c = 0.3L;
You can also use scientific notation in literals:
double plancks_constant = 6.62607004➊e-34➋;
No spaces are permitted between the significand (the base ➊) and the suffix (the exponential portion ➋).
The format specifier %f displays a float with decimal digits, whereas %e displays the same number in scientific notation. You can let printf decide which of these two to use with the %g format specifier, which selects the more compact of %e or %f.
For double, you simply prepend an l (lowercase L) to the desired specifier; for long double, prepend an L. For example, to print a double, you would specify %lf, %le, or %lg; to print a long double, you would specify %Lf, %Le, or %Lg.
Consider Listing 2-3, which explores the different options for printing floating points.
#include <cstdio>

int main() {
  double an = 6.0221409e23; // ➊
  printf("Avogadro's Number: %le %lf %lg\n", an, an, an); // %le ➋, %lf ➌, %lg ➍
  float hp = 9.75; // ➎
  printf("Hogwarts' Platform: %e %f %g\n", hp, hp, hp);
}
--------------------------------------------------------------------------
Avogadro's Number: 6.022141e+23➋ 602214090000000006225920.000000➌ 6.02214e+23➍
Hogwarts' Platform: 9.750000e+00 9.750000 9.75
Listing 2-3: A program printing several floating points
This program declares a double called an ➊. The format specifier %le ➋ gives you scientific notation 6.022141e+23, and %lf ➌ gives the decimal representation 602214090000000006225920.000000. The %lg ➍ specifier chose the scientific notation 6.02214e+23. The float called hp ➎ produces similar printf output using the %e and %f specifiers. But the format specifier %g decided to provide the decimal representation 9.75 rather than scientific notation.
As a general rule, use %g to print floating-point types.
NOTE
In practice, you can omit the l prefix on the format specifiers for double, because printf promotes float arguments to double precision.
Character types store human language data. The six character types are:
char The default type, always 1 byte. May or may not be signed. (Example: ASCII.)
char16_t Used for 2-byte character sets. (Example: UTF-16.)
char32_t Used for 4-byte character sets. (Example: UTF-32.)
signed char Same as char but guaranteed to be signed.
unsigned char Same as char but guaranteed to be unsigned.
wchar_t Large enough to contain the largest character of the implementation’s locale. (Example: Unicode.)
The character types char, signed char, and unsigned char are called narrow characters, whereas char16_t, char32_t, and wchar_t are called wide characters due to their relative storage requirements.
A character literal is a single, constant character. Single quotation marks (' ') surround all characters. If the character is any type but char, you must also provide a prefix: L for wchar_t, u for char16_t, and U for char32_t. For example, 'J' declares a char literal and L'J' declares a wchar_t.
Some characters don’t display on the screen. Instead, they force the display to do things like move the cursor to the left side of the screen (carriage return) or move the cursor down one line (newline). Other characters can display onscreen, but they’re part of the C++ language syntax, such as single or double quotes, so you must use them very carefully. To put these characters into a char, you use the escape sequences, as listed in Table 2-3.
Table 2-3: Reserved Characters and Their Escape Sequences
Value | Escape sequence |
Newline | \n |
Tab (horizontal) | \t |
Tab (vertical) | \v |
Backspace | \b |
Carriage return | \r |
Form feed | \f |
Alert | \a |
Backslash | \\ |
Question mark | ? or \? |
Single quote | \' |
Double quote | \" |
The null character |