Appendix

C definition and keywords

The C specification has become quite large. You can read the full specification for each version at http://www.iso-9899.info/wiki/The_Standard.

C keywords

The following table provides a list of reserved keywords in C by category. These keywords cannot be redefined in your programs. Some of these have not been explained in this book:

Keys

1: Added to the C99 standard.

2: Added to the C11 standard. Many of these keywords facilitate quite advanced functions in computer programming.

Table of operators and their precedence

The following table lists the precedence and associativity of C operators. Operators are listed from top to bottom, in descending precedence. The grouping operator, (), has the highest precedence. The sequence operator, (,), has the lowest precedence. There are four classes of operators: postfix, prefix, unary, and binary:

Summary of useful GCC and Clang compiler options

The following is a list of the compiler switches already encountered, with the addition of other useful switches and why you might want to use them:

There is a dizzying array of option switches for the GCC compiler. They are documented on the GNU website at https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html.

ASCII character set

We have a table of 256 ASCII characters. The table is reproduced here for convenience; it was generated from the program we created in Chapter 15, Working with Strings:

The Better String Library (Bstrlib)

Here is the introduction to Bstrlib taken from its document file:

“The bstring library is an attempt to provide improved string processing functionality to the C and C++ language. At the heart of the bstring library (Bstrlib for short) is the management of “bstring”s which are a significant improvement over '\0' terminated char buffers.”

The full documentation can be found at https://raw.githubusercontent.com/websnarf/bstrlib/master/bstrlib.txt. The documentation is thorough in providing motivation and seems to be complete in that it describes every function and its possible side effects, if any. If you decide to incorporate this library into your programs, I strongly suggest you read and study this document. In this brief introduction to Bstrlib, we will focus entirely on the C functions of the library, not the C++ functions.

The Bstrlib home page can be found at http://bstring.sourceforge.net/. The source can be found at https://github.com/websnarf/bstrlib.

A quick introduction to Bstrlib

Bstrlib is a set of programs that is meant to completely replace the C standard library string handling functions. It provides the following groups of functions:

  • Core C files (one source file and header)
  • Base Unicode support, if needed (two source files and headers)
  • Extra utility functions (one source file and header)
  • A unit/regression test for Bstrlib (one source file)
  • A set of dummy functions to abort the use of unsafe C string functions (one source file and header)

To get the core functionality of Bstrlib, a program only needs to include one header file, bstrlib.h, and one source file, bstrlib.c, for compilation, along with the other program source files.

Unlike C strings, which are '\0'-terminated arrays of characters, a bstring is a pointer to a structure, defined as follows:

struct tagbstring {
int mlen; // lower bound of memory allocated for data.
int slen; // actual length of string
unsigned char* data; // string
};

This structure is exposed so that its members can be accessed directly. Even so, it is far better to manipulate it through the library functions, because they perform all of the memory management for us; we only need to create and destroy each bstring.

There are functions to create a bstring from a C string, allocate and free bstrings, copy and concatenate bstrings, compare and test the equality of bstrings and C strings, search for and extract substrings within a bstring, find and replace substrings, and convert bstrings to other forms. All of these are described in the documentation, which is impressively thorough. Bstrlib also provides functions that return lists of strings, as well as its own stream type, the bstream. This library offers quite a lot if you are willing to take the time to learn it.

Okay, without repeating the documentation, let’s take a look at some very simple examples.

A few simple examples

These examples are very, very simple and are meant to give you a feel for using Bstrlib. The examples provided on the SourceForge website are quite advanced string handling examples. They are extremely useful and well worth studying.

Our first bstrlib example will be the Hello, world! program, as follows:

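#include <stdio.h>
#include "bstrlib.h"
int main( void ) {
  bstring b = bfromcstr( "Hello, World!" );
  if( !b ) return 1;          // bfromcstr() returns NULL on failure.
  puts( (char*)b->data );
  bdestroy( b );              // Release the bstring.
  return 0;
}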

This program, bstr_hello.c, creates a bstring from a C string and then prints it using puts(). To compile this program, be sure that the bstrlib.h and bstrlib.c files are in the same directory as this program. Then, enter the following command:

cc bstrlib.c bstr_hello.c -o bstr_hello -Wall -Werror -std=c18

In our next example, we will split a string into multiple strings based on a delimiter and then print them. We can do this with the C standard library, but it is rather complicated to do (which is why we didn’t even try it earlier). With Bstrlib, it’s simple, as you can see in the following program:

#include <stdio.h>
#include "bstrlib.h"
int main( void ) {
  bstring b = bfromcstr( "Hello, World and my Grandma, too!" );
  puts( (char*)b->data );
  struct bstrList *blist = bsplit( b , ' ' );
  printf( "num %d\n" , blist->qty );
  for( int i=0 ; i<blist->qty ; i++ ) {
    char* pCStr = bstr2cstr( blist->entry[i] , '_' );  // Allocates a C string.
    printf( "%d: %s\n" , i , pCStr );
    bcstrfree( pCStr );                                // Free that C string.
  }
  bstrListDestroy( blist );                            // Free the list.
  bdestroy( b );                                       // Free the bstring.
}

This program, bstr_split.c, first creates a bstring from a C string and prints it out. Then, in a single line, it creates a bstrList by calling bsplit() with <space> as the delimiter. The for()… loop then prints each element of the list. To compile this program, make sure the bstrlib.h and bstrlib.c files are in the same directory as this program. Then, enter the following command:

cc bstrlib.c bstr_split.c -o bstr_split -Wall -Werror -std=c18

Recall how in Chapter 23, Using File Input and File Output, we needed to write a trimStr() function to clean up input from fgets(). That function was approximately 30 lines of code. In our last example, we will compare this function to its Bstrlib equivalent. We’ll create a test program that uses seven different test strings and trims each of them twice: once with our function, renamed CTrimStr(), and once with a Bstrlib version, named BTrimStr():

  1. First, we’ll set up main(), which repeatedly calls testTrim(), as follows:

    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>
    #include "bstrlib.h"

    int CTrimStr( char* pCStr );
    int BTrimStr( bstring b );
    void testTrim( int testNum , char* pString );

    int main( void ) {
      testTrim( 1 , "Hello, World! " );
      testTrim( 2 , "Box of frogs " );
      testTrim( 3 , " Bag of hammers" );
      testTrim( 4 , " Sack of ferrets " );
      testTrim( 5 , " v " );
      testTrim( 6 , "" );
      testTrim( 7 , "Goodbye, World!" );
    }

This declares the prototypes for the two trim functions and for testTrim(), the test function that calls them. main() then runs seven test cases, each consisting of a string that requires a different kind of trimming.

  2. Next, we add our testTrim() function, which calls both CTrimStr() and BTrimStr(), as follows:

    void testTrim( int testNum , char* pInputString ) {
      size_t len;
      char testString[ strlen( pInputString ) + 1];
      strcpy( testString , pInputString );
      fprintf( stderr , "%1d. original: \"%s\" [len:%d]\n" ,
               testNum, testString , (int)strlen( pInputString ) );
      strcpy( testString , pInputString );
      len = CTrimStr( testString );
      fprintf( stderr , " CTrimStr: \"%s\" [len:%d]\n" ,
               testString , (int)len ) ;
      bstring b = bfromcstr( pInputString );
      len = BTrimStr( b );
      fprintf( stderr , " BTrimStr: \"%s\" [len:%d]\n" ,
               (char*)b->data , (int)len );
      bdestroy( b );   // Release the bstring.
    }

This function consists of three parts. The first part copies the input string to a working string that the trim functions will manipulate, and then prints the original test string. The second part resets testString, calls CTrimStr(), and then prints the result. The third part creates a bstring from the input string, calls BTrimStr(), and prints the result.

  3. CTrimStr() is reproduced here for reference, as follows:

    int CTrimStr( char* pCStr ) {
      size_t first , last , lenIn , lenOut ;
      first = last = lenIn = lenOut = 0;
      lenIn = strlen( pCStr );         //
      char tmpStr[ lenIn+1 ];          // Create working copy.
      strcpy( tmpStr , pCStr );        //
      char* pTmp = tmpStr;             // pTmp may change in Left Trim segment.
      // Left Trim
      // Find 1st non-whitespace char; pTmp will point to that.
      while( isspace( (unsigned char)pTmp[ first ] ) )
        first++;
      pTmp += first;
      lenOut = strlen( pTmp );         // Get new length after Left Trim.
      if( lenOut ) {                   // Check for empty string.
                                       // e.g. " " trimmed to nothing.
        // Right Trim
        // Find last non-whitespace char & set NUL character after it.
        last = lenOut-1;               // off-by-1 adjustment.
        while( isspace( (unsigned char)pTmp[ last ] ) )
          last--;
        pTmp[ last+1 ] = '\0';         // Terminate trimmed string.
      }
      lenOut = strlen( pTmp );         // Length of trimmed string.
      if( lenIn != lenOut )            // Did we change anything?
        strcpy( pCStr , pTmp );        // Yes, copy trimmed string back.
      return lenOut;
    }

This function was explained in Chapter 23, Using File Input and File Output, so the explanation will not be repeated here.

  4. The bstring test trim function is as follows:

    int BTrimStr( bstring b ) {
      btrimws( b );
      return b->slen;
    }

It takes the given bstring, trims it with a call to btrimws(), and then returns the length of the new string. We really didn’t need to write this function at all; we only did so to compare it to our own CTrimStr() function.

  5. To compile this program, which we will call bstr_trim.c, make sure the bstrlib.h and bstrlib.c files are in the same directory as this program. Then, enter the following command:

    cc bstrlib.c bstr_trim.c -o bstr_trim -Wall -Werror -std=c18

You can find these example source files in the source code repository. C strings are very simple, but the C string library functions are rather complex and have a number of issues that all programmers must pay very close attention to. Bstrings are a little more complicated to initialize, but the library itself provides a very rich set of string handling, string list handling, and bstream functionality.

Unicode and UTF-8

This is a very deep and broad topic. The purpose of this section is to provide a cursory introduction to the topic, as well as to provide some resources to learn much more about this topic.

A brief history

In the early days of computers, there was 7-bit ASCII, but that wasn’t good enough for everyone, so Unicode was created, initially as a 16-bit character set. This was a good start, but it had its own problems. Then, Ken Thompson and Rob Pike, two of the people behind Unix, invented UTF-8, which is backward-compatible with ASCII and coexists with UTF-16 and UTF-32, so anyone around the world can write Hello, World! in their own language using their own characters on just about any computer. An added benefit of UTF-8 is that it is easily converted to and from Unicode code points when needed. Unicode didn’t stop there; it evolved as well. Strictly speaking, Unicode is a character set that assigns a number (a code point) to each character, while UTF-8 is one way of encoding those numbers as bytes, so the two are closely related.

Where we are today

Unicode now replaces older character encodings, such as ASCII, ISO 8859, and EUC, at all levels. Unicode enables users to handle practically any script or language used on this planet. It also supports a comprehensive set of mathematical and technical symbols to simplify scientific information exchange.

UTF-8 encoding is defined in ISO 10646-1:2000 Annex D and in RFC 3629 (http://www.ietf.org/rfc/rfc3629.txt), as well as Section 3.9 of the Unicode 4.0 standard. It does not have the compatibility problems of Unicode and earlier wide-character encodings. With UTF-8 encoding, Unicode can be used in a convenient and backward-compatible way in environments that were designed entirely around ASCII, such as Unix. UTF-8 is the way in which Unicode is used under Unix, Linux, macOS, and similar systems. It is clearly the way to go for using Unicode under Unix-style operating systems.

Moving from ASCII to UTF-8

There are two approaches to adding UTF-8 support to any ASCII program. One is called soft conversion and the other is called hard conversion. In soft conversion, data is kept in its UTF-8 form everywhere and very few software changes are necessary. In hard conversion, any UTF-8 data that the program reads will be converted into wide-character arrays and handled as such everywhere within the application. Strings will only be converted back into UTF-8 form at output time. Internally, a character remains a fixed-size memory object.

Most applications can do very well with just soft conversion. This is what makes the introduction of UTF-8 on Unix feasible at all. The C standard library headers to address wide characters and Unicode are wchar.h, wctype.h, and uchar.h.

A UTF-8 to Unicode example

To give you an idea of what it is like to convert between Unicode and UTF-8, consider the following program:

#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
int main( void ) {
  wchar_t ucs2[5] = {0};
  if( !setlocale( LC_ALL , "en_AU.UTF-8" ) ) {
    printf( "Unable to set locale to Australian English in UTF-8\n" );
    exit( 1 );
  }
  // The UTF-8 representation of string "水调歌头"
  // (four Chinese characters pronounced shui3 diao4 ge1 tou2)
  char utf8[] = "\xE6\xB0\xB4\xE8\xB0\x83\xE6\xAD\x8C\xE5\xA4\xB4" ;
  mbstowcs( ucs2 , utf8 , sizeof(ucs2) / sizeof(*ucs2) );
  printf( " UTF-8: " );
  for( char *p = utf8 ; *p ; p++ )
    printf( "%02X ", (unsigned)(unsigned char)*p );
  printf( "\n" );
  printf( "Unicode: " );
  for( wchar_t *p = ucs2 ; *p ; p++ )
    printf( "U+%04lX ", (unsigned long) *p );
  printf( "\n" );
}

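The main work of this program is the call to mbstowcs(), which converts from UTF-8 to Unicode code points, stored here in an array of wchar_t. The width of wchar_t is implementation-defined: it is typically 32 bits on Linux and macOS and 16 bits on Windows. The conversion also depends on the locale, so the program fails if the requested UTF-8 locale is not installed.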

For further reading, go to https://home.unicode.org to find more resources for Unicode and UTF-8.

The C standard library

The C standard library offers quite a bit of functionality. The first thing to be aware of when using any part of this library is what’s in it. The following tables provide the header filenames and descriptions of the functions prototyped in each header file.

The following table shows the library files before C99:

The following table shows which files were added in C99:

The following table shows which files were added in C11:

If you have been compiling programs throughout this book, these files will already exist on your system. You need to find out where they are so that you can open them with an editor and examine exactly what is in them.

Method 1

In a terminal/console with a Unix shell (such as csh, tcsh, bash, and so on), do the following:

  1. Create a simple program – for example, hello.c.
  2. Add the header file you want to find and save it.
  3. In a bash command shell, execute the following:

    cc -H hello.c

Ouch! Way too much information. What you are seeing is the full #include stack of every single header file that is included in each header file. As you can see, some are included a lot of times.

You can also see that a lot of header files include other header files.

Method 2

In a terminal/console with a Unix shell (such as csh, tcsh, bash, and so on), do the following:

  1. Create a simple program – for example, hello.c.
  2. Add the header file you want to find, and save it.
  3. In a bash command shell, execute the following:

    cc -H hello.c 2>&1 | grep '^\. '

This command, which looks like a lot of gobbledegook, is doing the following:

  1. It invokes the compiler with the -H option. The list of header files is sent to stderr.
  2. 2>&1 redirects stderr to stdout.
  3. stdout is then redirected via a pipe (|) to grep, a program that searches its input for lines matching a regular expression.
  4. grep is told to search the beginning of each line for <period><space>:
    • '…' is the search string; the single quotes keep the shell from interpreting it.
    • ^ indicates the beginning of a line.
    • \. is a literal period (the backslash is important, as a dot (.) alone matches any character in grep).
    • The final character is a space (it must be inside the quotes, as a space alone separates arguments in the shell).
  5. You will now see only the header files that your program includes directly, typically one or two, without all of the nested #include stacks.

Method 3

This one is the simplest of all if you have the locate program on your system.

In your terminal/console, enter the following command:

locate <filename.h>

You might also get a lot of output from this, since your system might have many versions of these header files.

Method 2 is best because it tells you exactly which header file the compiler is using. Once you have found the function you want to know more about in one of these files, you can then use the Unix man command to read about it on your system. To do so, enter the following into a terminal/console:

man 3 <function>

This tells man to look in section 3 for the given function. Section 3 is where C functions are described.

Alternatively, you could try the following:

man 7 <topic>

Section 7 is where general topics are described. There is a lot of information there.

Note

If you are new to man, try entering man man and it will tell you about itself.
