Chapter 25

Developing Cross-Platform and Cross-Language Applications

WHAT’S IN THIS CHAPTER?

  • How to write code that runs on multiple platforms
  • How to mix different programming languages together

C++ programs can be compiled to run on a variety of computing platforms and the language has been rigorously defined to ensure that programming in C++ for one platform is very similar to programming in C++ for another. Yet, despite the standardization of the language, platform differences eventually come into play when writing professional-quality programs in C++. Even when development is limited to a particular platform, small differences in compilers can elicit major programming headaches. This chapter examines the necessary complication of programming in a world with multiple platforms and multiple programming languages.

The first part of this chapter surveys the platform-related issues that C++ programmers encounter. A platform is the collection of all of the details that make up your development and/or run-time system. For example, your platform may be the Microsoft Visual C++ 2010 compiler running on Windows 7 on an Intel Core i7 processor. Alternatively, your platform might be the GCC 4.6 compiler running on Linux on a PowerPC processor. Both of these platforms are able to compile and run C++ programs, but there are significant differences between them.

The second part of this chapter looks at how C++ can interact with other programming languages. While C++ is a general-purpose language, it may not always be the right tool for the job. Through a variety of mechanisms, you can integrate C++ with other languages that may better serve your needs.

CROSS-PLATFORM DEVELOPMENT

There are several reasons why the C++ language encounters platform issues. Even though C++ is a high-level language, its definition includes low-level implementation details. For example, C++ arrays are defined to live in contiguous blocks of memory. Such a specific implementation detail exposes the language to the possibility that not all systems arrange and manage their memory in the same way. C++ also faces the challenge of providing a standard language and a standard library without a standard implementation. Varying interpretations of the specification among C++ compiler and library vendors can lead to trouble when moving from one system to another. Finally, C++ is selective in what the language provides as standard. Despite the presence of a standard library, sophisticated programs often need functionality that is not provided by the language or the standard library. This functionality generally comes from third-party libraries or the platform, and can vary greatly.

Architecture Issues

The term architecture generally refers to the processor, or family of processors, on which a program runs. A standard PC running Windows or Linux generally runs on the x86 architecture, and older versions of Mac OS were usually found on the PowerPC architecture. As a high-level language, C++ shields you from the differences between these architectures. For example, a Pentium processor may have a single instruction that performs the same functionality as six PowerPC instructions. As a C++ programmer, you don’t need to know what this difference is or even that it exists. One advantage to using a high-level language is that the compiler takes care of converting your code into the processor’s native assembly code format.

Processor differences do, however, rise up to the level of C++ code at times. You won’t face most of these issues unless you are doing particularly low-level work, but you should be aware that they exist.

Binary Compatibility

As you probably already know, you cannot take a program written and compiled for a Pentium computer and run it on a PowerPC-based Mac. These two platforms are not binary compatible because their processors do not support the same set of instructions. When you compile a C++ program, your source code is turned into binary instructions that the computer executes. That binary format is defined by the platform, not by the C++ language.

One solution to support platforms that are not binary compatible is to build each version separately with a compiler on each target platform.

Another solution is cross-compiling. When you are using platform X for your development, but you want your program to run on platforms Y and Z, you can use a cross-compiler on your platform X that generates binary code for platform Y and Z.

You can also make your program open source. By making your source available to the end user, she can compile it natively on her system and build a version of the program that is in the correct binary format for her machine. As discussed in Chapter 2, open-source software has become increasingly popular. One of the major reasons is that it allows programmers to collaboratively develop software and increase the number of platforms on which it can run.

Address Sizes

When someone describes an architecture as 32-bit, they most likely mean that the address size is 32 bits, or 4 bytes. In general, a system with a larger address size can handle more memory and might operate more quickly on complex programs.

Since pointers are memory addresses, they are inherently tied to address sizes. Many programmers are taught that pointers are always 4 bytes, but this is wrong. For example, consider the following program, which outputs the size of a pointer:

int *ptr;
cout << "ptr size is " << sizeof(ptr) << " bytes" << endl;

Code snippet from PtrSizePtrSize.cpp

If this program is compiled and run on a 32-bit x86 system, such as the Pentium architecture, the output will be:

ptr size is 4 bytes

If you compile it with a 64-bit compiler and run it on a 64-bit x86 system, like the Intel Core i7, the output will be:

ptr size is 8 bytes

From a programmer’s point of view, the upshot of varying pointer sizes is that you cannot equate a pointer with 4 bytes. More generally, you need to be aware that most sizes are not prescribed by the C++ standard. The standard only says that a short integer has as much, or less, space as an integer, which has as much, or less, space as a long integer.

The size of a pointer is also not necessarily the same as the size of an integer. For example, on a 64-bit platform, pointers will be 64 bits, but integers could be 32 bits. Casting a 64-bit pointer to a 32-bit integer will result in losing 32 critical bits!


Never assume that a pointer is 32 bits or 4 bytes, and never cast a pointer to an integer.
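
The danger in the warning above comes from integer types that may be narrower than a pointer. Where an integer form of an address is truly unavoidable (for logging, say), one option is std::uintptr_t from <cstdint>, an optional C++11 type that, when present, is wide enough to hold a converted pointer. The following is only a sketch under the assumption that the platform provides that type; the function names are made up for illustration.

```cpp
#include <cstdint>
#include <iostream>

// Converts a pointer to an integer type wide enough to hold it.
// Assumption: the platform defines the optional type std::uintptr_t.
std::uintptr_t pointerToInteger(const void* inPtr)
{
    return reinterpret_cast<std::uintptr_t>(inPtr);
}

// Converts the integer back to the original pointer value.
const void* integerToPointer(std::uintptr_t inValue)
{
    return reinterpret_cast<const void*>(inValue);
}

// Prints a few sizes that commonly differ between platforms.
void printSizes()
{
    std::cout << "int:       " << sizeof(int) << " bytes" << std::endl;
    std::cout << "long:      " << sizeof(long) << " bytes" << std::endl;
    std::cout << "void*:     " << sizeof(void*) << " bytes" << std::endl;
    std::cout << "uintptr_t: " << sizeof(std::uintptr_t) << " bytes" << std::endl;
}
```

Calling printSizes() on each target platform is a quick way to see which of these sizes differ there.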

Byte Order

All modern computers store numbers in a binary representation, but the representation of the same number on two platforms may not be identical. This sounds contradictory, but as you’ll see, there are two approaches to reading numbers that both make sense.

A single slot in your computer’s memory is usually a byte because most computers are byte addressable. Number types in C++ are usually multiple bytes. For example, a short may be 2 bytes. Imagine that your program contains the following line:

short myShort = 513;

In binary, the number 513 is 0000 0010 0000 0001. This number contains 16 ones and zeros, or 16 bits. Because there are 8 bits in a byte, the computer would need 2 bytes to store the number. Because each individual memory address contains 1 byte, the computer needs to split the number up into multiple bytes. Assuming that a short is 2 bytes, the number will get split into two even parts. The higher part of the number is put into the high-order byte and the lower part of the number is put into the low-order byte. In this case, the high-order byte is 0000 0010 and the low-order byte is 0000 0001.

Now that the number has been split up into memory-sized parts, the only question that remains is how to store them in memory. Two bytes are needed, but the order of the bytes is unclear and in fact depends on the architecture of the system in question.

One way to represent the number is to put the high-order byte first in memory and the low-order byte next. This strategy is called big-endian ordering because the bigger part of the number comes first. PowerPC and SPARC processors traditionally use a big-endian approach. Some other processors, such as x86, order the bytes in the opposite order, putting the low-order byte first in memory. This approach is called little-endian ordering because the smaller part of the number comes first. An architecture may choose one approach or the other, usually based on backward compatibility. For the curious, the terms “big-endian” and “little-endian” predate modern computers by more than two centuries. Jonathan Swift coined the terms in his eighteenth-century novel Gulliver’s Travels to describe the opposing camps of a debate about the proper end on which to break an egg.
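
To see which ordering a particular machine uses, you can copy the bytes of a number out of memory and look at them. The following is a minimal sketch, assuming the platform provides the 16-bit type std::uint16_t; the function names are illustrative only.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Returns the bytes of inValue in the order they occupy memory,
// lowest address first.
std::vector<unsigned char> bytesInMemory(std::uint16_t inValue)
{
    std::vector<unsigned char> bytes(sizeof(inValue));
    std::memcpy(bytes.data(), &inValue, sizeof(inValue));
    return bytes;
}

// Returns true if the low-order byte is stored at the lowest address.
bool isLittleEndian()
{
    return bytesInMemory(513)[0] == 0x01;
}
```

On a little-endian machine such as x86, bytesInMemory(513) holds 0x01 followed by 0x02. A big-endian machine stores the same two bytes the other way around.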

Regardless of the ordering a particular architecture uses, your programs can continue to use numerical values without paying any attention to whether the machine uses big-endian ordering or little-endian ordering. The ordering only comes into play when data moves between architectures. For example, if you are sending binary data across a network, you may need to consider the ordering of the other system. A solution is to use the standard Network Byte Ordering, which is always big-endian. So, before sending data across a network, convert it to big-endian, and whenever you receive data from a network, convert it from big-endian to the byte ordering of your system.

Similarly, if you are writing binary data to a file, you may need to consider what will happen when that file is opened on a system with opposite byte ordering.
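
One way to sidestep the ordering question for network or file data is to build the byte sequence explicitly with shifts, which works the same on any host. The sketch below writes and reads a 32-bit value in big-endian order. The function names are hypothetical, and it assumes std::uint32_t is available. Socket libraries usually offer comparable helpers such as htonl() and ntohl(), but those come from the platform rather than from C++.

```cpp
#include <cstdint>

// Writes inValue into outBuffer (at least 4 bytes) in big-endian order,
// regardless of the byte ordering of the host.
void writeBigEndian32(std::uint32_t inValue, unsigned char* outBuffer)
{
    outBuffer[0] = static_cast<unsigned char>((inValue >> 24) & 0xFF);
    outBuffer[1] = static_cast<unsigned char>((inValue >> 16) & 0xFF);
    outBuffer[2] = static_cast<unsigned char>((inValue >> 8) & 0xFF);
    outBuffer[3] = static_cast<unsigned char>(inValue & 0xFF);
}

// Reads a 32-bit big-endian value from inBuffer (at least 4 bytes).
std::uint32_t readBigEndian32(const unsigned char* inBuffer)
{
    return (static_cast<std::uint32_t>(inBuffer[0]) << 24) |
           (static_cast<std::uint32_t>(inBuffer[1]) << 16) |
           (static_cast<std::uint32_t>(inBuffer[2]) << 8) |
            static_cast<std::uint32_t>(inBuffer[3]);
}
```

Because the shifts operate on values rather than on memory layout, the same code produces the same bytes on both big-endian and little-endian hosts.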

Implementation Issues

When a C++ compiler is written, it is designed by a human being who attempts to adhere to the C++ standard. Unfortunately, the C++ standard is more than a thousand pages long and written in a combination of prose, language grammars, and examples. Two human beings implementing a compiler according to such a standard are unlikely to interpret every piece of prescribed information in the exact same way or to catch every single edge case. As a result, compilers will have bugs.

Compiler Quirks and Extensions

There is no simple rule for finding or avoiding compiler bugs. The best you can do is stay up-to-date on compiler updates and perhaps subscribe to a mailing list or newsgroup for your compiler. If you suspect that you have encountered a compiler bug, a simple web search for the error message or condition you have witnessed could uncover a workaround or patch.

One area that compilers are notorious for having trouble with is language additions that were not in the initial standard. For example, some of the template and run-time type features in C++ weren’t originally part of the language, and as a result, some compilers still don’t properly support these features. You will also encounter the same issues with the new features in C++11. Not all compilers support every single new feature yet.

Another issue to be aware of is that compilers often include their own language extensions without making it obvious to the programmer. For example, variable-sized stack-based arrays are not part of the C++ language, yet the following compiles and runs as expected with the g++ compiler:

int i = 4;
char myStackArray[i];  // Not a standard language feature!

Code snippet from VariableArrayVariableArray.cpp

Some compiler extensions may be useful, but if there is a chance that you will switch compilers at some point, you should see if your compiler has a strict mode where it will avoid such extensions. For example, compiling the previous code with the -pedantic flag passed to g++ will yield the following warning:

warning: ISO C++ forbids variable length array 'myStackArray' [-Wvla]
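
If the array size is only known at run time, a standard-conforming alternative to the extension is std::vector. A minimal sketch follows; the function name is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Returns a zero-initialized buffer whose size is chosen at run time.
// Unlike a variable-length array, this is standard C++.
std::vector<char> makeBuffer(std::size_t inSize)
{
    return std::vector<char>(inSize);
}
```

The vector stores its elements on the heap rather than the stack, but it compiles without warnings in strict mode on any conforming compiler.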

The C++ specification allows for a certain type of compiler-defined language extension through the #pragma mechanism. #pragma is a preprocessor directive whose behavior is defined by the implementation. If the implementation does not understand the directive, it ignores it. For example, some compilers allow the programmer to turn compiler warnings off temporarily with #pragma.
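
As an illustration, the following sketch temporarily silences an unused-variable warning around one function. It assumes either Microsoft Visual C++ (warning C4189) or a GCC-compatible compiler (-Wunused-variable); other compilers take neither branch and compile the function with their normal warnings. The function itself is made up for the example.

```cpp
// Save the current warning state and disable one specific warning.
#if defined(_MSC_VER)
    #pragma warning(push)
    #pragma warning(disable : 4189)  // initialized but not referenced
#elif defined(__GNUC__)
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wunused-variable"
#endif

// Contains a deliberately unused local variable.
int computeAnswer()
{
    int scratch = 0;  // Never read; would normally trigger a warning.
    return 42;
}

// Restore the warning state that was in effect before.
#if defined(_MSC_VER)
    #pragma warning(pop)
#elif defined(__GNUC__)
    #pragma GCC diagnostic pop
#endif
```

Guarding each #pragma with a compiler check keeps the intent explicit and avoids depending on how an unfamiliar compiler treats a directive it does not recognize.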

Library Implementations

Most likely, your compiler includes an implementation of the C++ Standard Library, including the Standard Template Library. Since the STL is written in C++, however, you aren’t required to use the one that came bundled with your compiler. You could use a third-party STL that, for example, has been optimized for speed, or you could even write your own.

Of course, STL implementers face the same problem that compiler writers face — the standard is subject to interpretation. In addition, certain implementations may make tradeoffs that are incompatible with your needs. For example, one implementation may optimize for speed, while another implementation may focus on using as little memory as possible for containers.

When working with an STL implementation, or indeed any third-party library, it is important to consider the tradeoffs that the designer made during development. Chapter 2 contains a more detailed discussion of the issues involved in using libraries.

Platform-Specific Features

C++ is a great general-purpose language. With the addition of the Standard Library, the language is packed full of so many features that a casual programmer could happily code in C++ for years without going beyond what is built in. However, professional programs require facilities that C++ does not provide. This section lists several important features that are provided by the platform, not by the C++ language.

  • Graphical user interfaces: Most commercial programs today run on an operating system that has a graphical user interface, containing such elements as clickable buttons, movable windows, and hierarchical menus. C++, like the C language, has no notion of these elements. To write a graphical application in C++, you need to use platform-specific libraries that allow you to draw windows, accept input through the mouse, and perform other graphical tasks.
  • Networking: The Internet has changed the way we write applications. These days, most applications check for updates through the web, and games provide a networked multiplayer mode. C++ does not provide a mechanism for networking, though several widely used libraries exist. The most common means of writing networking software is through an abstraction called sockets. A socket library implementation can be found on most platforms and it provides a simple procedure-oriented way to transfer data over a network. Some platforms support a streams-based networking system that operates like I/O streams in C++. Since IPv4 is running out of IP addresses, its successor, IPv6, is gradually taking over. Therefore, a networking library that is independent of the IP version is a better choice than one that only supports IPv4.
  • OS Events and application interaction: In pure C++ code, there is little interaction with the surrounding operating system and other applications. The command-line arguments are about all you get in a standard C++ program without platform extensions. For example, operations such as copy and paste are not directly supported in C++ and require platform-provided libraries.
  • Low-level files: Chapter 15 explains standard I/O in C++, including reading and writing files. Many operating systems provide their own file APIs, which are sometimes incompatible with the standard file classes in C++. These libraries often provide OS-specific file tools, such as a mechanism to get the home directory of the current user.
  • Threads: Concurrent threads of execution within a single program are not directly supported in C++03 or earlier. C++11 does include a threading library, explained in Chapter 22. If your compiler does not yet support the C++11 threading library, you need to use a third-party library. The most commonly used third-party thread library is called pthreads. Many operating systems and object-oriented frameworks also provide their own threading models.
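
A common way to contain platform-specific features like these is to hide each one behind a single function and select the implementation with the preprocessor. The sketch below does this for the home directory mentioned above. It assumes the conventional USERPROFILE (Windows) and HOME (Unix-like) environment variables are set, which is a simplification: a real program would more likely call the platform's own API. The function name is hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Returns the current user's home directory, or an empty string if it
// cannot be determined.
// Assumption: USERPROFILE (Windows) or HOME (Unix-like) is set in the
// environment.
std::string getHomeDirectory()
{
#if defined(_WIN32)
    const char* variableName = "USERPROFILE";
#else
    const char* variableName = "HOME";
#endif
    const char* value = std::getenv(variableName);
    return value ? std::string(value) : std::string();
}
```

The rest of the program calls getHomeDirectory() and never sees the platform check, so porting to a new platform means changing one function.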

CROSS-LANGUAGE DEVELOPMENT

For certain types of programs, C++ may not be the best tool for the job. For example, if your Unix program needs to interact closely with the shell environment, you may be better off writing a shell script than a C++ program. If your program performs heavy text processing, you may decide that the Perl language is the way to go. Sometimes what you want is a language that blends the general features of C++ with the specialized features of another language. Fortunately, there are some techniques you can use to get the best of both worlds — the flexibility of C++ combined with the unique specialty of another language.

Mixing C and C++

As you already know, the C++ language is largely a superset of the C language. Most C programs will compile and run in C++, with a few minor exceptions. These exceptions usually have to do with reserved words. In C, for example, the term class has no particular meaning. Thus, it could be used as a variable name, as in the following C code:

int class = 1; // Compiles in C, not C++
printf("class is %d\n", class);

Code snippet from MixingCMixingC.cpp

This program will compile and run in C, but will yield an error when compiled as C++ code. When you translate, or port, a program from C to C++, these are the types of errors you will face. Fortunately, the fixes are usually quite simple. In this case, rename the class variable to classID and the code will compile.

The ease of incorporating C code in a C++ program comes in handy when you encounter a useful library or legacy code that was written in C. Functions and classes, as you’ve seen many times in this book, work just fine together. A class method can call a function, and a function can make use of objects.

Shifting Paradigms

One of the dangers of mixing C and C++ is that your program may start to lose its object-oriented properties. For example, if your object-oriented web browser is implemented with a procedural networking library, the program will be mixing these two paradigms. Given the importance and quantity of networking tasks in such an application, you might consider writing an object-oriented wrapper around the procedural library.

For example, imagine that you are writing a web browser in C++, but you are using a C networking library that contains the functions declared in the following code. Note that the HostRecord and Connection data structures have been omitted for brevity.

// netwrklib.h
#include "hostrecord.h"
#include "connection.h"
// Gets the host record for a particular Internet host given
// its hostname (e.g. www.host.com)
HostRecord* lookupHostByName(char* inHostName);
// Connects to the given host
Connection* connectToHost(HostRecord* inHost);
// Retrieves a web page from an already-opened connection
char* retrieveWebPage(Connection* inConnection, char* page);

The netwrklib.h interface is fairly simple and straightforward. However, it is not object-oriented, and a C++ programmer who uses such a library is bound to feel icky, to use a technical term. This library isn’t organized into a cohesive class and it isn’t even const-correct. Of course, a talented C programmer could have written a better interface, but as the user of a library, you have to accept what you are given. Writing a wrapper is your opportunity to customize the interface.

Before we build an object-oriented wrapper for this library, take a look at how it might be used as is to gain an understanding of actual usage. In the following program, the netwrklib library is used to retrieve the web page at www.wrox.com/index.html:

#include <iostream>
#include "netwrklib.h"
using namespace std;
int main()
{
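    // The library takes non-const char*, so use writable arrays rather
    // than passing string literals directly.
    char hostName[] = "www.wrox.com";
    char pagePath[] = "/index.html";
    HostRecord* myHostRecord = lookupHostByName(hostName);
    Connection* myConnection = connectToHost(myHostRecord);
    char* result = retrieveWebPage(myConnection, pagePath);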
    cout << "The result is " << result << endl;
    return 0;
}

A possible way to make the library more object-oriented is to provide a single abstraction that recognizes the links between looking up a host, connecting to the host, and retrieving a web page. A good object-oriented wrapper could hide the unnecessary complexity of the HostRecord and Connection types.

This example follows the design principles described in Chapters 3 and 4: The new class should capture the common use case for the library. The previous example shows the most frequently used pattern — first a host is looked up, then a connection is established, then a page is retrieved. It is also likely that subsequent pages will be retrieved from the same host so a good design will accommodate that mode of use as well.

Following is the public portion of the definition for the WebHost class. This class makes the common case easy for the client programmer:

// WebHost.h
#include <string>
class WebHost
{
    public:
        // Constructs a WebHost object for the given host
        WebHost(const std::string& inHost);
        // Obtains the given page from this host
        std::string getPage(const std::string& inPage);
};

Consider the way a client programmer would use this class:

#include <iostream>
#include <string>
#include "WebHost.h"
using namespace std;
int main()
{
    WebHost myHost("www.wrox.com");
    string result = myHost.getPage("/index.html");
    cout << "The result is " << result << endl;
    return 0;
}

The WebHost class effectively encapsulates the behavior of a host and provides useful functionality without unnecessary calls and data structures. The class even provides a useful new piece of functionality — once a WebHost is created, it can be used to obtain multiple web pages, saving code and possibly making the program run faster.

The implementation of the WebHost class makes extensive use of the netwrklib library without exposing any of its workings to the user. To enable this abstraction, the class needs a data member:

// WebHost.h
#include "netwrklib.h"
class WebHost
{
    // Omitted for brevity
    protected:
        Connection* mConnection;
};

The corresponding source file puts a new face on the functionality contained in the netwrklib library. First, the constructor builds a HostRecord for the specified host. Because the WebHost class deals with C++ strings instead of C-style strings, it uses the c_str() method on inHost to obtain a const char*, then performs a const cast to make up for netwrklib’s const-incorrectness. The resulting HostRecord is used to create a Connection, which is stored in the mConnection data member for later use:

WebHost::WebHost(const string& inHost)
{
    const char* host = inHost.c_str();
    HostRecord* theHost = lookupHostByName(const_cast<char*>(host));
    mConnection = connectToHost(theHost);
}

Subsequent calls to getPage() pass the stored connection to the netwrklib’s retrieveWebPage() function and return the value as a C++ string:

string WebHost::getPage(const string& inPage)
{
    const char* page = inPage.c_str();
    string result = retrieveWebPage(mConnection, const_cast<char*>(page));
    return result;
} 

Networking-savvy readers may note that keeping a connection open to a host indefinitely is considered bad practice and doesn’t adhere to the HTTP specification. We’ve chosen elegance over etiquette in this example.

As you can see, the WebHost class provides an object-oriented wrapper around the C library. By providing an abstraction, you can change the underlying implementation without affecting client code, and you can provide additional features. These features can include connection reference counting, parsing of pages, or automatically closing connections after a specific time to adhere to the HTTP specification and automatically reopening the connection on the next getPage() call.
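
To experiment with the wrapper without a real networking library, the pieces of WebHost can be assembled in one file on top of stand-in functions. In the sketch below, the HostRecord and Connection definitions and the three library functions are hypothetical stubs invented for this illustration; they are not netwrklib and perform no networking. Only the WebHost class mirrors the design described in the text.

```cpp
#include <string>

// --- Hypothetical stand-ins for the C library (NOT the real netwrklib) ---
struct HostRecord { std::string name; };
struct Connection { std::string hostName; };

HostRecord* lookupHostByName(char* inHostName)
{
    static HostRecord record;          // Stub: a single static record.
    record.name = inHostName;
    return &record;
}

Connection* connectToHost(HostRecord* inHost)
{
    static Connection connection;      // Stub: no real connection is made.
    connection.hostName = inHost->name;
    return &connection;
}

char* retrieveWebPage(Connection* inConnection, char* page)
{
    static std::string buffer;         // Stub: fabricates a fake page.
    buffer = "<html>" + inConnection->hostName + page + "</html>";
    return &buffer[0];
}

// --- The object-oriented wrapper ---
class WebHost
{
    public:
        // Looks up the host and stores the resulting connection.
        explicit WebHost(const std::string& inHost)
        {
            HostRecord* theHost =
                lookupHostByName(const_cast<char*>(inHost.c_str()));
            mConnection = connectToHost(theHost);
        }

        // Retrieves a page over the stored connection.
        std::string getPage(const std::string& inPage)
        {
            return retrieveWebPage(mConnection,
                const_cast<char*>(inPage.c_str()));
        }

    private:
        Connection* mConnection;
};
```

Because client code only ever sees WebHost, swapping the stubs for the real library, or for a different library altogether, would not change any calling code.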

Linking with C Code

In the previous example, we assumed that you had the raw C code to work with. The example took advantage of the fact that most C code will successfully compile with a C++ compiler. If you have only compiled C code, perhaps in the form of a library, you can still use it in your C++ program, but you need to take a few extra steps.

In order to implement function overloading, the complex C++ namespace is “flattened.” For example, if you have a C++ program, it is legitimate to write:

void MyFunc(double);
void MyFunc(int);
void MyFunc(int, int);

However, this would mean that the linker would see several different functions, all named MyFunc, and would not know which one you want to call. Therefore, all C++ compilers perform an operation that is referred to as name mangling and is the logical equivalent of generating names as follows:

MyFunc_double
MyFunc_int
MyFunc_int_int

To avoid conflicts with other names you might have defined, the generated names usually have some characters which are legal to the linker but not legal in C++ source code. For example, Microsoft VC++ generates names as follows:

?MyFunc@@YAXN@Z
?MyFunc@@YAXH@Z
?MyFunc@@YAXHH@Z

This encoding is complex and often vendor-specific. The C++ standard does not specify how function overloading should be implemented on a given platform, so there is no standard for name mangling algorithms.

In C, function overloading is not supported (the compiler will complain about duplicate definitions). So, names generated by the C compiler are quite simple; for example _MyFunc.

Now, if you compile a simple program with the C++ compiler, even if it has only one instance of the MyFunc name, it will still generate a request to link to a mangled name. But, when you link with the C library, it cannot find the desired mangled name, and the linker will complain. Therefore, it is necessary to tell the C++ compiler to not mangle that name. This is done by using the extern "language" qualification in both the header file (to instruct the client code to create a name compatible with the specified language) and, if your library source is in C++, at the definition site (to instruct the library code to generate a name compatible with the specified language).

The syntax of extern "language" is as follows:

extern "language" declaration1();
extern "language" declaration2();

Or:

extern "language" {
    declaration1();
    declaration2();
}

The C++ standard says that any language specification can be used, so in principle the following could be supported by a compiler:

extern "C" void MyFunc(int i);
extern "FORTRAN" void MatrixInvert(Matrix* M);
extern "Pascal" void SomeLegacySubroutine(int n);
extern "Ada" void AimMissileDefense(double angle);

In practice, many compilers only support "C". Each compiler vendor will inform you which language designators they support.

For example, in the following code, the function prototype for doCFunction() is specified as an external C function:

extern "C" {
    void doCFunction(int i);
}
int main()
{
    doCFunction(8); // Call the C function.
    return 0;
}

The actual definition for doCFunction() is provided in a compiled binary file attached in the link phase. The extern "C" linkage specification tells the C++ compiler to refer to the function by its unmangled C name, because the linked-in code was compiled as C.

A more common pattern for using extern is at the header level. For example, if you are using a graphics library written in C, it probably came with a .h file for you to use. You can write another header file that wraps the original one in an extern block to specify that the entire header defines functions written in C. The wrapper .h file is often named with .hpp to distinguish it from the C version of the header:

// graphicslib.hpp
extern "C" {
    #include "graphicslib.h"
}

Another common model is to write a single header file, which is conditioned on whether it is being compiled for C or C++. A C++ compiler predefines the symbol __cplusplus if you are compiling for C++. The symbol is not defined for C compilations. So you will often see header files in the following form:

#ifdef __cplusplus
    extern "C" {
#endif
        declaration1();
        declaration2();
#ifdef __cplusplus
    } // matches extern "C"
#endif

This means that declaration1() and declaration2() are functions that are in a library compiled by the C compiler. Using this technique, the same header file can be used in both C and C++ clients.
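
The pattern can be tried out in a single file. In the sketch below, the function addNumbers() is a made-up example. Its definition, which would normally live in a separately compiled C library, is provided inline with C linkage so that the example links on its own.

```cpp
// --- What would normally be the shared header, e.g. a C library's .h file ---
#ifdef __cplusplus
extern "C" {
#endif

int addNumbers(int a, int b);  // Hypothetical library function.

#ifdef __cplusplus
} // matches extern "C"
#endif

// --- Stand-in for the compiled C library ---
// Defined with C linkage so its symbol name is not mangled.
extern "C" int addNumbers(int a, int b)
{
    return a + b;
}

// --- C++ client code ---
int addThree(int a, int b, int c)
{
    return addNumbers(addNumbers(a, b), c);
}
```

A C compiler processing the header portion never sees the extern "C" lines, because __cplusplus is not defined there.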

Whether you are including C code in your C++ program or linking against a compiled C library, remember that even though C++ is essentially a superset of C, they are different languages with different design goals. Adapting C code to work in C++ is quite common, but providing an object-oriented C++ wrapper around procedural C code is often much better.

Mixing C# with C++

Even though this is a C++ book, we won’t pretend that there aren’t newer and snazzier languages out there. One example is C#. By using the Interop services from C#, it’s pretty easy to call C++ code from within your C# applications. An example scenario could be that you develop parts of your application, like the graphical user interface, in C#, but use C++ to implement certain performance-critical components. To make Interop work, you need to write a library in C++, which will be called from C#. On Windows, the library will be in a .DLL file. The following C++ example defines a FunctionInDLL() function that will be compiled into a library. The function accepts a Unicode string and returns an integer. The implementation writes the received string to the console and returns the value 42 to the caller:

#include <iostream>
using namespace std;
extern "C"
{
    __declspec(dllexport) int FunctionInDLL(const wchar_t* p)
    {
        wcout << L"The following string was received by C++:\n    '";
        wcout << p << L"'" << endl;
        return 42;    // Return some value...
    }
}

Code snippet from CSharpHelloCpp.cpp

Keep in mind that you are implementing a function in a library, not writing a program; so, you will not need a main() function. Compiling this code depends on your environment. If you are using Microsoft Visual C++, you need to go to the properties of your project and select “Dynamic Library (.dll)” as the configuration type. Note that the example uses __declspec(dllexport) to tell the linker that this function should be made available to clients of the library. How to do this depends on your compiler. __declspec(dllexport) is the way you do this with Microsoft Visual C++.
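
Because the export syntax is compiler-specific, libraries that target several compilers often hide it behind a macro. The following is a sketch of that idea, not part of the original example. The macro name HELLO_API and the function GetAnswer() are hypothetical. The visibility attribute is the GCC-compatible counterpart, and the sketch assumes any other compiler exports the function by default.

```cpp
// Select the export syntax for the compiler in use.
#if defined(_WIN32)
    #define HELLO_API __declspec(dllexport)
#elif defined(__GNUC__)
    #define HELLO_API __attribute__((visibility("default")))
#else
    #define HELLO_API  // Assumption: symbols are exported by default.
#endif

extern "C"
{
    // Exported with an unmangled name so that other languages can find it.
    HELLO_API int GetAnswer()
    {
        return 42;
    }
}
```

With the macro in place, the function definitions themselves stay identical across compilers.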

Once you have the library, you can call it from C# by using Interop services. First, you need to include the Interop namespace:

using System.Runtime.InteropServices;

Next, you define the function prototype, and tell C# where it can find the implementation of the function. This is done with the following line, assuming you have compiled the library as HelloCpp.dll:

[DllImport("HelloCpp.dll", CharSet = CharSet.Unicode)]
public static extern int FunctionInDLL(String s);

The first part of this line is saying that C# should import this function from a library called HelloCpp.dll, and that it should use Unicode strings. The second part specifies the actual prototype of the function, which is a function accepting a string as parameter and returning an integer. The following code shows a complete example on how to use the C++ library from C#:

using System;
using System.Runtime.InteropServices;
namespace HelloCSharp
{
    class Program
    {
        [DllImport("HelloCpp.dll", CharSet = CharSet.Unicode)]
        public static extern int FunctionInDLL(String s);
        static void Main(string[] args)
        {
            Console.WriteLine("Written by C#.");
            int res = FunctionInDLL("Some string from C#.");
            Console.WriteLine("C++ returned the value " + res);
        }
    }
}

Code snippet from CSharpHelloCSharp.cs

The output will be as follows:

Written by C#.
The following string was received by C++:
    'Some string from C#.'
C++ returned the value 42

The details of the C# code are outside the scope of this C++ book, but the general idea should be clear with this example.

Mixing Java and C++ with JNI

The Java Native Interface, or JNI, is a part of the Java platform that allows the programmer to access functionality that was not written in Java. Because Java is a cross-platform language, the original intent was to make it possible for Java programs to interact with the operating system. JNI also allows programmers to make use of libraries written in other languages, such as C++. Access to C++ libraries may be useful to a Java programmer who has a performance-critical piece of his application, or who needs to use legacy code.

JNI can also be used to execute Java code within a C++ program, but such a use is far less common. Because this is a C++ book, we do not include an introduction to the Java language. This section is targeted at readers who already know Java and wish to incorporate C++ code into their Java code.

To begin your Java cross-language adventure, start with the Java program. For this example, the simplest of Java programs will suffice:

public class HelloCpp {
    public static void main(String[] args)
    {
        System.out.println("Hello from Java!");
    }
}

Code snippet from JNIHelloCpp.java

Next, you need to declare a Java method that will be written in another language. To do this, you use the native keyword and leave out the implementation:

public class HelloCpp {
    // This will be implemented in C++.
    public native void callCpp();
    // Remainder omitted for brevity 
}

Code snippet from JNIHelloCpp.java

C++ code will eventually be compiled into a shared library that gets dynamically loaded into the Java program. You need to load this library inside a Java static block so that it is loaded as soon as the class is loaded, before any native method is called. You pass System.loadLibrary() the base name of the library, which can be whatever you want; Java maps that name to a platform-specific file name. For the name hellocpp, that is libhellocpp.so on Linux and most Unix systems, or hellocpp.dll on Windows systems.

public class HelloCpp {
    static {
        System.loadLibrary("hellocpp");
    }
    // Remainder omitted for brevity
}

Code snippet from JNIHelloCpp.java

Finally, you need to actually call the C++ code from within the Java program. The callCpp() Java method serves as a placeholder for the not-yet-written C++ code. Because callCpp() is a method of the HelloCpp class, you need to create a new HelloCpp object and call the callCpp() method on it:

public class HelloCpp {
    static {
        System.loadLibrary("hellocpp");
    }
    // This will be implemented in C++.
    public native void callCpp();
    public static void main(String[] args)
    {
        System.out.println("Hello from Java!");
        HelloCpp cppInterface = new HelloCpp();
        cppInterface.callCpp();
    }
}

Code snippet from JNIHelloCpp.java

That’s all for the Java side. Now, just compile the Java program as you normally would:

javac HelloCpp.java

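Then use the javah program (the authors like to pronounce it jav-AHH!) to create a header file for the native method. On JDK 10 and later, which no longer include javah, running javac -h . HelloCpp.java generates the same header instead. With javah, the command is: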

javah HelloCpp

After running javah, you will find a file named HelloCpp.h, which is a fully working (if somewhat ugly) C/C++ header file. Inside that header file is a C function declaration for a function called Java_HelloCpp_callCpp(). Your C++ program will need to implement this function. The full prototype is:

JNIEXPORT void JNICALL Java_HelloCpp_callCpp(JNIEnv* env, jobject javaobj);

Your C++ implementation of this function can make full use of the C++ language. This example outputs some text from C++. First, you need to include the jni.h header file and the HelloCpp.h file that was created by javah. You will also need to include any C or C++ headers that you intend to use:

#include <jni.h>
#include "HelloCpp.h"
#include <iostream>

Code snippet from JNIHelloCpp.cpp

The C++ function is written as normal. The parameters to the function allow interaction with the Java environment and the object that called the native code. They are beyond the scope of this example.

JNIEXPORT void JNICALL Java_HelloCpp_callCpp(JNIEnv* env, jobject javaobj)
{
    std::cout << "Hello from C++!" << std::endl;
}

Code snippet from JNIHelloCpp.cpp

Compiling this code into a library depends on your environment, but you will most likely need to tweak your compiler’s settings to include the JNI headers. Using the GCC compiler on Linux, your compile command might look like this:

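g++ -shared -fPIC -I/usr/java/jdk/include/ -I/usr/java/jdk/include/linux HelloCpp.cpp \
    -o libhellocpp.so

The output from the compiler is the library used by the Java program. Note the lib prefix in the output name, which System.loadLibrary("hellocpp") expects on Linux. As long as the shared library is in a directory on the Java library path (for example, a directory listed in LD_LIBRARY_PATH on Linux, or one passed with -Djava.library.path), you can execute the Java program normally: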

java HelloCpp

You should see the following result:

Hello from Java!
Hello from C++!

Of course, this example just scratches the surface of what is possible through JNI. You can use JNI to interface with OS-specific features or hardware drivers. For complete coverage of JNI, you should consult a Java text.

Mixing C++ with Perl and Shell Scripts

C++ contains a built-in general-purpose mechanism to interface with other languages and environments. You’ve already used it many times, probably without paying much attention to it — it’s the arguments to and return value from the main() function.

C and C++ were designed with command-line interfaces in mind. The main() function receives the arguments from the command line, and returns a status code that can be interpreted by the caller. In a scripting environment, arguments to and status codes from your program can be a powerful mechanism that allows you to interface with the environment.

Scripting versus Programming

Before delving into the details of mixing C++ and scripts, consider whether your project is an application or a script. The difference is subtle and subject to debate. The following descriptions are just guidelines. Many so-called scripts are just as sophisticated as full-blown applications. The question isn’t whether or not something can be done as a script, but whether or not a scripting language is the best tool.

An application is a program that performs a particular task. Modern applications typically involve some sort of user interaction. In other words, applications tend to be driven by the user, who directs the application to take certain actions. Applications often have multiple capabilities. For example, a user can use a photo editing application to scale an image, paint over an image, or print an image. Most of the software you would buy in a box is an application. Applications tend to be relatively large and often complex programs.

A script generally performs a single task, or a set of related tasks. You might have a script that automatically sorts your email, or backs up your important files. Scripts often run without user interaction, perhaps at a particular time each day or triggered by an event, such as the arrival of new mail. Scripts can be found at the OS level (such as a script that compresses files every night) or at the application level (such as a script that automates the process of shrinking and printing images). Automation is an important part of the definition of a script — scripts are usually written to codify a sequence of steps that a user would otherwise perform manually.

Now, consider the difference between a scripting language and a programming language. Not all scripts are necessarily written in scripting languages. You could write a script that sorts your email by using the C programming language, or you could write an equivalent script by using the Perl scripting language. Similarly, not all applications are written in programming languages. A suitably motivated coder could write a web browser in Perl if she really wanted to. The line is blurry.

In the end, what matters most is which language provides the functionality you need. If you are going to be interacting extensively with the operating system, you might consider a scripting language because scripting languages tend to have better support for OS interaction. If your project is going to be larger in scope and involve heavy user interaction, a programming language will probably be easier in the long run.

Using Scripts

The original Unix OS included a rather limited C library, which did not support certain common operations. Unix programmers therefore developed the habit of launching shell scripts from applications to accomplish tasks that should have had API or library support.

Today, many of these Unix programmers still insist on using scripts as a form of subroutine call. Usually, they execute the system() C library call with a string which is the script to execute. There are significant risks to this approach. For example, if there is an error in the script, the caller may or may not get a detailed error indication. The system() call is also exceptionally heavy-duty, since it has to create an entire new process to execute the script. This may ultimately be a serious performance bottleneck in your application.
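To make the limited error reporting concrete, here is a minimal sketch of a wrapper around std::system() from <cstdlib>. The helper name scriptSucceeded is hypothetical, and the sketch assumes a shell in which a returned status of 0 means success; the exact meaning of the value returned by system() is implementation-defined.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: runs a command line through the host's command
// processor and reports only whether it appeared to succeed.
// Assumption: a returned status of 0 means success (true for POSIX shells).
bool scriptSucceeded(const std::string& commandLine)
{
    // std::system(nullptr) returns nonzero only if a command processor exists.
    if (std::system(nullptr) == 0) {
        return false;
    }
    // An entire new process is created here; all that comes back is one integer.
    int status = std::system(commandLine.c_str());
    return status == 0;
}
```

A caller learns only that something went wrong, not what; any detailed diagnostics from the script go to its own output streams.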

In general, you should explore the features of the C++ library to see if there are better ways to do something. Platform-independent wrappers exist for many platform-specific libraries; the Boost Filesystem library is one example. Launching a Perl script with system() just to process some textual data may not be the best choice. The regular expression library of C++11 might serve your string processing needs better.

A Practical Example — Encrypting Passwords

Assume that you have a system that writes everything a user sees and types to a file for auditing purposes. The file can be read only by the system administrator so that she can figure out who to blame if something goes wrong. An excerpt of such a file might look like this:

Login: bucky-bo
Password: feldspar
bucky-bo> mail
bucky-bo has no mail
bucky-bo> exit

While the system administrator may want to keep a log of all user activity, she may wish to obscure everybody’s passwords in case the file is somehow obtained by a hacker. A script seems like the natural choice for this project because it should happen automatically, perhaps at the end of every day. There is, however, one piece of the project that might not be best suited for a scripting language. Encryption libraries tend to exist mainly for high-level languages such as C and C++. Therefore, one possible implementation is to write a script that calls out to a C++ program to perform the encryption.

The following script uses the Perl language, though almost any scripting language could accomplish this task. If you don’t know Perl, you will still be able to follow along. The most important element of Perl syntax for this example is the ` (backtick) character. A command enclosed in backticks instructs the Perl script to shell out to an external command and capture its output. In this case, the script will shell out to a C++ program called encryptString.


Launching an external process incurs significant overhead because an entirely new process has to be created. You shouldn’t use this technique when you need to call the external process often. In this password encryption example, it is acceptable, because you can assume that a log file will contain only a few password lines.

The strategy for the script is to loop over every line of a file looking for lines that contain a password prompt. The script will write a new file, userlog.out, which contains the same text as the source file, except that all passwords are encrypted. The first step is to open the input file for reading and the output file for writing. Then, the script needs to loop over all the lines in the file. Each line in turn is placed in a variable called $line.

open (INPUT, "userlog.txt") or die "Couldn't open input file!";
open (OUTPUT, ">userlog.out") or die "Couldn't open output file!";
while ($line = <INPUT>) {

Code snippet from PerlprocessLog.pl

Next, the current line is checked against a regular expression to see if this particular line contains the Password: prompt. If it does, Perl will store the password in the variable $1.

    if ($line =~ m/^Password: (.*)/) {

Since a match has been found, the script calls the encryptString program with the detected password to obtain an encrypted version of it. The output of the program is stored in the $result variable, and the result status code from the program is stored in the variable $?. The script checks $? and quits immediately if there is a problem. If everything is okay, the password line is written to the output file with the encrypted password instead of the original one.

        $result = `./encryptString $1`;
        if ($? != 0) { exit(-1) }
        print OUTPUT "Password: $result\n";

If the current line is not a password prompt, the script writes the line as is to the output file. At the end of the loop, it closes both files and exits.

    } else {
        print OUTPUT "$line";
    }
}
close (INPUT);
close (OUTPUT);

That’s it. The only other required piece is the actual C++ program. Implementation of a cryptographic algorithm is beyond the scope of this book. The important piece is the main() function because it accepts the string that should be encrypted as an argument.

Arguments are contained in the argv array of C-style strings. You should always consult the argc parameter before accessing an element of argv. If argc is 1, there is one element in the argument list and it is accessible as argv[0]. The 0th element of the argv array is generally the name of the program, so actual parameters begin at argv[1].
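The rule of consulting argc before touching argv can be sketched as a small helper. The function name collectParameters is made up for this illustration; it simply copies everything after the program name into a vector, using argc as the bound for every access.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies the real command-line parameters
// (everything after the program name in argv[0]) into a vector.
std::vector<std::string> collectParameters(int argc, const char* const argv[])
{
    std::vector<std::string> parameters;
    // Start at 1 to skip the program name; argc bounds every access to argv.
    for (int i = 1; i < argc; ++i) {
        parameters.push_back(argv[i]);
    }
    return parameters;
}
```

From main(int argc, char* argv[]), you would call collectParameters(argc, argv) and then check whether the resulting vector is empty before using it.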

Following is the main() function for a C++ program that encrypts the input string. Notice that the program returns 0 for success and non-0 for failure, as is standard in Unix:

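#include <iostream>
#include <string>
using namespace std;

// Assumed signature; the encryption itself is beyond the scope of this example.
string encrypt(const string& input);

int main(int argc, char* argv[])
{
    // argv[0] is the program name, so the string to encrypt is argv[1].
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " string-to-be-encrypted" << endl;
        return -1;
    }
    cout << encrypt(argv[1]);
    return 0;
}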

Code snippet from PerlencryptString.cpp


There is actually a blatant security hole in this code. When the to-be-encrypted string is passed to the C++ program as a command-line argument, it may be visible to other users through the process table. A more secure way to get the information into the C++ program would be to send it through standard input, which is the forte of the expect scripting language.

Now that you’ve seen how easily C++ programs can be incorporated into scripting languages, you can combine the strengths of the two languages for your own projects. You can use a scripting language to interact with the OS and control the flow of the script, and a traditional programming language for the heavy lifting.


This example is just to demonstrate how to use Perl and C++ together. C++11 includes a regular expression library, which makes it very easy to convert this Perl/C++ solution into a pure C++ solution. This pure C++ solution will run much faster because it avoids calling an external program. See Chapter 14 for details on this regular expression library.
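A pure C++ version of the log-processing loop might look something like the following sketch. The function name obscurePasswords is invented for this illustration, and the encrypt parameter stands in for a real cryptographic routine, which remains outside the scope of this chapter. The regular expression is the same one used in the Perl script.

```cpp
#include <functional>
#include <istream>
#include <ostream>
#include <regex>
#include <string>

// Copies a log from input to output, replacing the text after each
// "Password: " prompt with the result of the supplied encrypt function.
// The encrypt parameter is a placeholder for a real cryptographic routine.
void obscurePasswords(std::istream& input, std::ostream& output,
    const std::function<std::string(const std::string&)>& encrypt)
{
    // Same pattern as the Perl script: capture everything after the prompt.
    const std::regex passwordLine("^Password: (.*)");
    std::string line;
    std::smatch match;
    while (std::getline(input, line)) {
        if (std::regex_match(line, match, passwordLine)) {
            output << "Password: " << encrypt(match[1].str()) << '\n';
        } else {
            output << line << '\n';
        }
    }
}
```

Passing the streams and the encryption routine as parameters keeps the sketch independent of file names; in a real program you would hand it a std::ifstream for userlog.txt and a std::ofstream for userlog.out.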

Mixing C++ with Assembly Code

C++ is considered a fast language, especially relative to other object-oriented languages. Yet, in some rare cases, you might want to use raw assembly code when speed is absolutely critical. The compiler generates assembly code from your source files, and this generated assembly code is fast enough for virtually all purposes. Both the compiler and the linker (when it supports link-time code generation, as VC++ 2010 does) use optimization algorithms to make the generated assembly code as fast as possible. These optimizers are becoming increasingly powerful, in part by using special processor instruction sets such as MMX and SSE. These days, it’s very hard to write your own assembly code that will outperform the code generated by the compiler, unless you know all the little details of these enhanced instruction sets.

However, in case you do need it, the keyword asm can be used by a C++ compiler to allow the programmer to insert raw assembly code. The keyword is part of the C++ standard, but its implementation is compiler-defined. In some compilers, you can use asm to drop from C++ down to the level of assembly right in the middle of your program. Sometimes, the support for inline assembly depends on your target architecture. For example, Microsoft VC++ 2010 supports inline assembly (through its own __asm keyword) when compiling in 32-bit mode, but not when compiling in 64-bit mode.
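To illustrate how compiler-specific inline assembly is, the following sketch uses the extended asm syntax accepted by GCC and Clang on x86 targets and falls back to ordinary C++ everywhere else. The function name addIntegers is made up for this example, and the syntax shown is an extension of those compilers, not something every compiler accepts.

```cpp
// Adds two integers. On GCC or Clang targeting x86, the addition is done
// with an inline assembly instruction; everywhere else the function falls
// back to plain C++.
int addIntegers(int a, int b)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    int result = a;
    // AT&T syntax: addl source, destination (result += b).
    asm("addl %1, %0" : "+r"(result) : "r"(b) : "cc");
    return result;
#else
    return a + b;
#endif
}
```

Even this trivial case needs preprocessor guards to stay portable, which hints at the maintenance cost of inline assembly.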

Inline assembly can be useful in some applications, but we don’t recommend it for most programs. There are several reasons to avoid inline assembly code:

  • Your code is no longer portable to another processor once you start including raw assembly code for your platform.
  • Most programmers don’t know assembly languages and won’t be able to modify or maintain your code.
  • Assembly code is not known for its readability. It can hurt the readability and style of your program.
  • Most of the time, it is not necessary. If your program is slow, look for algorithmic problems or consult some of the other performance suggestions in Chapter 24.

When you encounter performance issues in your application, first look into algorithmic speed-ups, and use raw assembly code only as a last resort.

Practically, if you have a computationally expensive block of code, you should move it to its own C++ function. If you determine, using performance profiling (see Chapter 24), that this function is a performance bottleneck, and there is no way to write the code smaller and faster, you might use raw assembly to try to increase performance.

In such a case, one of the first things you will want to do is declare the function extern "C" so that C++ name mangling is suppressed. Then, write a separate module in assembly code that performs the function more efficiently. The advantage of a separate module is that you keep a platform-independent “reference implementation” in C++ alongside a platform-specific, high-performance implementation in raw assembly code. The use of extern "C" means that the assembly code can use a simple naming convention (otherwise, you have to reverse-engineer your compiler’s name mangling algorithm). Then, you can link with either the C++ version or the assembly code version.

You would write this module in assembly code and run it through an assembler, rather than using inline asm directives in C++. This is particularly true for many of the popular x86-compatible 64-bit compilers, where inline asm is not supported.
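The C++ half of this arrangement could be sketched as follows. The function name sumValues and its signature are hypothetical; the point is only that the extern "C" declaration fixes a simple symbol name that a separately assembled module could also export.

```cpp
#include <cstddef>

// Declared extern "C" so the symbol is exported under the plain name
// sumValues (possibly with a platform-specific underscore prefix), which
// an assembly module can define without knowing the name mangling scheme.
extern "C" long sumValues(const int* values, std::size_t count);

// Platform-independent reference implementation in C++. A hand-written
// assembly module exporting the same symbol could be linked in its place.
extern "C" long sumValues(const int* values, std::size_t count)
{
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Callers see only the declaration, so switching between the two implementations is a link-time decision and requires no change to the calling code.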

However, raw assembly code should be used only if it yields a significant performance improvement. A factor of 2 might justify the effort. A factor of 10 is compelling. An improvement of 10% is not worth the effort.

SUMMARY

If you take away one point from this chapter, it should be that C++ is a flexible language. It exists in the sweet spot between languages that are too tied to a particular platform and languages that are too high-level and generic. Rest assured that when you develop code in C++, you aren’t locking yourself into the language forever. C++ can be mixed with other technologies and has a solid history and code base that help guarantee its relevance in the future.
