Manipulating strings in JNI

Strings are somewhat complicated in JNI, mainly because Java strings and C strings are internally different. This recipe will cover the most commonly used JNI string features.

Getting ready

Understanding the basics of character encoding is essential to comprehending the differences between Java strings and C strings. We'll give a brief introduction to Unicode.

According to the Unicode Consortium, the Unicode Standard is defined as follows:

The Unicode Standard is a character coding system designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world. In addition, it supports classical and historical texts of many written languages.

Unicode assigns a unique number, called a code point, to each character it defines. There are two main categories of encoding methods that support the entire Unicode character set, or a subset of it.

The first one is the Unicode Transformation Format (UTF), which encodes a Unicode code point into a variable number of code values. UTF-8, UTF-16, UTF-32, and a few others belong to this category. The numbers 8, 16, and 32 refer to the number of bits in one code value. The second category is the Universal Character Set (UCS) encodings, which encode a Unicode code point into a single code value. UCS2 and UCS4 belong to this category. The numbers 2 and 4 refer to the number of bytes in one code value.

Note

Unicode defines more characters than what two bytes can possibly represent. Therefore, UCS2 can only represent a subset of Unicode characters. Because Unicode defines fewer characters than what four bytes can represent, multiple code values of UTF-32 are never needed. Therefore, UTF-32 and UCS4 are functionally identical.
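
The relationship between code points and code values described above can be sketched in plain C. This is only an illustration, not part of the recipe's project; the function names utf8_code_values, utf16_code_values, and utf32_code_values are hypothetical, and the input is assumed to be a valid code point in the range 0 to 0x10FFFF.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of JNI): number of code values that
 * each UTF encoding needs for one valid Unicode code point. */

/* UTF-8 uses 8-bit code values: 1 to 4 per code point. */
int utf8_code_values(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* UTF-16 uses 16-bit code values: 1, or 2 for a surrogate pair. */
int utf16_code_values(uint32_t cp)
{
    return cp < 0x10000 ? 1 : 2;
}

/* UTF-32 uses 32-bit code values: always exactly 1. */
int utf32_code_values(uint32_t cp)
{
    (void)cp; /* every code point fits in one 32-bit code value */
    return 1;
}
```

For example, the character 安 (U+5B89) needs three code values in UTF-8 but only one in UTF-16, while a code point above U+FFFF needs four in UTF-8 and two in UTF-16.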

The Java programming language uses UTF-16 to represent strings. If a character cannot fit in a single 16-bit code value, a pair of code values, called a surrogate pair, is used. C strings are simply arrays of bytes terminated by a null character; the actual encoding and decoding are left to the developer and the underlying system.

JNI uses a modified version of UTF-8 to represent strings in native code, including class, field, and method names. Modified UTF-8 differs from standard UTF-8 in two ways. Firstly, the null character (U+0000) is encoded using two bytes (0xC0 0x80), so an encoded string never contains an embedded zero byte. Secondly, only the one-byte, two-byte, and three-byte formats of standard UTF-8 are used; a character that would need the four-byte format is instead encoded as its UTF-16 surrogate pair, with each surrogate taking three bytes.
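
The two differences between modified UTF-8 and standard UTF-8 can be made concrete with a small encoder. The following is a minimal sketch in plain C, assuming a valid code point as input; encode_modified_utf8 is a hypothetical helper written for illustration and is not a JNI function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of JNI): encode one valid Unicode code
 * point (0..0x10FFFF) as modified UTF-8. Writes at most 6 bytes to out
 * and returns the number of bytes written. */
size_t encode_modified_utf8(uint32_t cp, unsigned char *out)
{
    if (cp == 0) {
        /* Difference 1: U+0000 takes the two-byte form 0xC0 0x80. */
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* Difference 2: no four-byte form. Split the code point into a
     * UTF-16 surrogate pair and encode each half in three bytes. */
    uint32_t v = cp - 0x10000;
    uint32_t high = 0xD800 | (v >> 10);
    uint32_t low = 0xDC00 | (v & 0x3FF);
    size_t n = encode_modified_utf8(high, out);
    return n + encode_modified_utf8(low, out + n);
}
```

With this sketch, U+5B89 (安) produces the same three bytes as standard UTF-8 (0xE5 0xAE 0x89), whereas U+1F600 produces six bytes instead of the four that standard UTF-8 would use.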

How to do it

The following steps show you how to create a sample Android project that illustrates string manipulation in JNI:

  1. Create a project named StringManipulation. Set the package name as cookbook.chapter2. Create an activity named StringManipulationActivity. Under the project, create a folder named jni. Refer to the Loading native libraries and registering native methods recipe in this chapter if you want more detailed instructions.
  2. Create a file named stringtest.c under the jni folder, then implement the passStringReturnString method as follows:
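    #include <jni.h>
    #include <string.h>
    #include <android/log.h>

    JNIEXPORT jstring JNICALL Java_cookbook_chapter2_StringManipulationActivity_passStringReturnString(JNIEnv *pEnv, jobject pObj, jstring pStringP){
      // Deliberately wrong: a jstring is an opaque reference, not a char array, so %s prints garbage
      __android_log_print(ANDROID_LOG_INFO, "native", "print jstring: %s", pStringP);

      // Convert to a modified UTF-8 C string; isCopy reports whether the VM made a copy
      jboolean isCopy;
      const char *str = (*pEnv)->GetStringUTFChars(pEnv, pStringP, &isCopy);
      if (str == NULL) {
        return NULL; // conversion failed; an OutOfMemoryError is pending
      }
      __android_log_print(ANDROID_LOG_INFO, "native", "print UTF-8 string: %s, %d", str, isCopy);

      jsize length = (*pEnv)->GetStringUTFLength(pEnv, pStringP);
      __android_log_print(ANDROID_LOG_INFO, "native", "UTF-8 string length (number of bytes): %d == %d", length, (int)strlen(str));
      // str[length] is the terminating null byte; anything beyond it is out of bounds
      __android_log_print(ANDROID_LOG_INFO, "native", "UTF-8 string ends with: %d", str[length]);
      (*pEnv)->ReleaseStringUTFChars(pEnv, pStringP, str);

      // GetStringUTFRegion takes its length in UTF-16 code units, not in bytes
      char nativeStr[100];
      jsize utf16Length = (*pEnv)->GetStringLength(pEnv, pStringP);
      if (length < (jsize)sizeof(nativeStr)) {
        (*pEnv)->GetStringUTFRegion(pEnv, pStringP, 0, utf16Length, nativeStr);
        nativeStr[length] = '\0'; // terminate explicitly before printing
        __android_log_print(ANDROID_LOG_INFO, "native", "jstring converted to UTF-8 string and copied to native buffer: %s", nativeStr);
      }

      // Build a new Java string from a native UTF-8 string
      const char* newStr = "hello 安卓";
      jstring ret = (*pEnv)->NewStringUTF(pEnv, newStr);
      jsize newStrLen = (*pEnv)->GetStringUTFLength(pEnv, ret);
      __android_log_print(ANDROID_LOG_INFO, "native", "UTF-8 string with Chinese characters: %s, string length (number of bytes) %d=%d", newStr, newStrLen, (int)strlen(newStr));
      return ret;
    }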
  3. In the StringManipulationActivity.java Java code, add the code to load the native library, declare the native method, and invoke it. Refer to the downloaded code for the source code details.
  4. Modify the res/layout/activity_passing_primitive.xml file according to step 8 of the Loading native libraries and registering native methods recipe in this chapter or the downloaded project code.
  5. Create a file called Android.mk under the jni folder. Refer to step 9 of the Loading native libraries and registering native methods recipe in this chapter or the downloaded code for details.
  6. Start a terminal, go to the jni folder, and type ndk-build to build the native library.
  7. Run the project on an Android device or emulator. We should see something similar to the following screenshot:
    [Screenshot: the running application]

    The following should be seen at the logcat output:

    [Screenshot: the logcat output]

How it works…

This recipe discusses string manipulation in JNI.

  • Character encoding: Android uses UTF-8 as its default charset, as our program shows by calling the Charset.defaultCharset().name() method. This means that the default encoding in the native code is UTF-8. As mentioned before, Java uses the UTF-16 charset. This implies that an encoding conversion is needed when we pass a string from Java to the native code and vice versa. Failing to do so will cause unwanted results. In our example, we tried printing the jstring directly in the native code; because a jstring is a reference to a Java object rather than a C character array, the result was some unrecognizable characters.

    Fortunately, JNI comes with a few pre-defined functions that do the conversion.

  • Java string to native string: When a native method is called with an input parameter of string type, the string received needs to be converted to the native string first. Two JNI functions can be used for different cases.

    The first function is GetStringUTFChars, which has the following prototype:

    const char * GetStringUTFChars(JNIEnv *env, jstring string, jboolean *isCopy);

    This function converts the Java string into a null-terminated array of modified UTF-8 bytes. If isCopy is not NULL, it is set to JNI_TRUE when the function returns a new copy of the Java string content; otherwise, it is set to JNI_FALSE and the returned pointer points to the same characters as the original Java string.

    Tip

    We cannot predict whether the VM will return a new copy of the Java string. Therefore, we must be careful when converting a large string, as the possible memory allocation and copy may affect performance and even cause "out of memory" issues. Also note that we must never modify the returned UTF-8 native string, which is declared const; if isCopy is set to false, doing so would modify the Java string content and break the immutability of the Java string.

    Once we've finished all the operations with the converted native string, we should call ReleaseStringUTFChars to inform the VM that we don't need to access the UTF-8 native string anymore. The function has the following prototype, with the second parameter being the Java string and the third parameter being the UTF-8 native string:

    void ReleaseStringUTFChars(JNIEnv *env, jstring string, const char *utf);

    The second function for conversion is GetStringUTFRegion, with the following prototype:

    void GetStringUTFRegion(JNIEnv *env, jstring str, jsize start, jsize len, char *buf);

    The start and len parameters give the start offset within the Java UTF-16 string and the number of UTF-16 code values to convert; note that len is not a byte count. The buf argument points to the caller-allocated buffer that receives the converted UTF-8 native string.

    Let's compare the two methods. The first method may or may not require allocation of new memory for the converted UTF-8 string, depending on whether the VM decides to make a new copy or not, whereas the second method makes use of a pre-allocated buffer to store the converted content. In addition, the second method allows us to specify the position and length of the conversion source. Therefore, the following rules can be followed:

    • To modify the converted UTF-8 native string, the JNI method GetStringUTFRegion should be used
    • If we only need a substring of the original Java string, and the substring is not large, GetStringUTFRegion should be used
    • If we're dealing with a large string, and we're not going to modify the converted UTF-8 native string, GetStringUTFChars should be used

      Tip

      In our example, we used a fixed-length buffer when calling the GetStringUTFRegion function. We should make sure it is large enough to hold the converted string; otherwise, we should use a dynamically allocated array.

  • String length: The JNI function GetStringUTFLength returns the length of the UTF-8 representation of a jstring. Note that it returns the number of bytes and not the number of characters, as shown in our example. To get the number of UTF-16 code values instead, use GetStringLength.
  • Native string to Java string: We also need to return string data from the native code to Java code at times. The returned string should be UTF-16 encoded. The JNI function NewStringUTF constructs a jstring from a UTF-8 native string. It has the following prototype:
    jstring NewStringUTF(JNIEnv *env, const char *bytes);
  • Conversion failure: GetStringUTFChars and NewStringUTF require allocation of memory space to store the converted string. If the VM runs out of memory, these functions throw an OutOfMemoryError exception and return NULL, so the return value should be checked before it is used. We'll cover more about exception handling in the Checking errors and handling exceptions in JNI recipe.
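
Two points from the list above, byte length versus character count and sizing a buffer for GetStringUTFRegion, can be sketched in plain C. Both functions are hypothetical helpers, not JNI calls. The sketch assumes well-formed input and assumes that each UTF-16 code value converts to at most three bytes, which holds for the modified UTF-8 format described earlier.

```c
#include <stddef.h>

/* Hypothetical helper (not part of JNI): count the UTF-16 code values
 * represented by a null-terminated UTF-8 native string. strlen() on
 * the same string gives the number of bytes instead. */
size_t count_utf16_units(const char *s)
{
    size_t units = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if ((c & 0xC0) == 0x80) {
            continue; /* continuation byte: belongs to the previous lead byte */
        }
        /* Defensive assumption: a standard four-byte lead byte stands
         * for a surrogate pair, that is, two UTF-16 code values. */
        units += ((c & 0xF8) == 0xF0) ? 2 : 1;
    }
    return units;
}

/* Hypothetical helper (not part of JNI): worst-case buffer size in
 * bytes for converting utf16_len code values, assuming at most three
 * bytes per code value plus one byte for a terminating null. */
size_t region_buffer_size(size_t utf16_len)
{
    return utf16_len * 3 + 1;
}
```

For the string "hello 安卓" from our example, strlen reports 12 bytes while count_utf16_units reports 8. In a native method, the value returned by GetStringLength could be passed to region_buffer_size to decide how much memory to allocate before calling GetStringUTFRegion.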

There's more…

More about character encoding in JNI: JNI character encoding is much more complicated than what we covered here. Besides UTF-8, it also supports UTF-16 conversion functions. It is also possible to call Java string methods in the native code to encode/decode characters in other formats. Since Android uses UTF-8 as its platform charset, we only cover how to deal with conversions between Java UTF-16 and UTF-8 native string here.
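
To give a feel for the UTF-16 side, the following minimal sketch works on an array of 16-bit code values, which is the form in which the JNI functions GetStringChars and GetStringRegion expose a Java string (jchar is an unsigned 16-bit type). The helpers combine_surrogates and count_code_points_utf16 are hypothetical names used for illustration and are not part of JNI; uint16_t stands in for jchar so that the code compiles without jni.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of JNI): combine a high surrogate
 * (0xD800..0xDBFF) and a low surrogate (0xDC00..0xDFFF) into the code
 * point they represent. */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    return 0x10000u + (((uint32_t)high - 0xD800u) << 10)
                    + ((uint32_t)low - 0xDC00u);
}

/* Hypothetical helper (not part of JNI): count the Unicode code points
 * in an array of UTF-16 code values. A valid surrogate pair counts as
 * one code point; an unpaired surrogate is counted on its own. */
size_t count_code_points_utf16(const uint16_t *units, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        int is_high = units[i] >= 0xD800 && units[i] <= 0xDBFF;
        if (is_high && i + 1 < len &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i++; /* skip the low surrogate of this pair */
        }
        count++;
    }
    return count;
}
```

In a native method, the array and its length would come from GetStringChars and GetStringLength, and the array would be handed back with ReleaseStringChars once the native code is done with it.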
