Strings and Unicode

Dart Strings are immutable sequences of UTF-16 code units. In UTF-16, code points above U+FFFF (outside the Basic Multilingual Plane) are stored as surrogate pairs of two code units; decoding these pairs gives you Unicode code points. Unicode terminology is terse, but Dart does a good job of exposing the different parts.
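Every character in the recipe below fits in a single code unit, so the surrogate-pair case never appears there. The following minimal sketch, assuming Dart 2.x or later, uses the emoji U+1F600 to show the difference between code units and code points. The helper codeUnitsHex is hypothetical and exists only to make the output readable:

```dart
/// Hypothetical helper: the UTF-16 code units of [s] as lowercase hex.
List<String> codeUnitsHex(String s) =>
    s.codeUnits.map((u) => u.toRadixString(16)).toList();

void main() {
  // U+1F600 lies outside the Basic Multilingual Plane,
  // so UTF-16 stores it as a surrogate pair.
  const smiley = '\u{1F600}';

  // length counts UTF-16 code units, not characters.
  print(smiley.length); // 2

  // The two code units are the high and low surrogates.
  print(codeUnitsHex(smiley)); // [d83d, de00]

  // runes decodes the surrogate pair back into a single code point.
  print(smiley.runes.length); // 1
  print(smiley.runes.first.toRadixString(16)); // 1f600
}
```

In other words, length and codeUnits only agree with a reader's idea of "characters" as long as no surrogate pairs are involved.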

How to do it...

The unicode.dart file shows the different ways to work with strings that contain special (non-ASCII) characters:

String country = "Egypt";
String city = "Zürich";
String japanese = "日本語"; // nihongo, meaning 'Japanese'

void main() {
  print('Unicode escapes: \uFE18'); // U+FE18, a vertical presentation-form bracket

  // ASCII: every character is one code unit and one code point.
  print(country[0]);                 // E
  print(country.codeUnitAt(0));      // 69
  print(country.codeUnits);          // [69, 103, 121, 112, 116]
  print(country.runes.toList());     // [69, 103, 121, 112, 116]
  print(String.fromCharCode(69));    // E
  print(String.fromCharCodes([69, 103, 121, 112, 116])); // Egypt

  // Latin-1: ü (U+00FC) still fits in a single code unit.
  print(city[1]);                    // ü
  print(city.codeUnitAt(1));         // 252
  print(city.codeUnits);             // [90, 252, 114, 105, 99, 104]
  print(city.runes.toList());        // [90, 252, 114, 105, 99, 104]
  print(String.fromCharCode(252));   // ü
  print(String.fromCharCodes([90, 252, 114, 105, 99, 104])); // Zürich

  // CJK characters from the Basic Multilingual Plane: still one code unit each.
  print(japanese[0]);                // 日
  print(japanese.codeUnitAt(0));     // 26085
  print(japanese.codeUnits);         // [26085, 26412, 35486]
  print(japanese.runes.toList());    // [26085, 26412, 35486]
  print(String.fromCharCode(35486)); // 語
  print(String.fromCharCodes([26085, 26412, 35486])); // 日本語
}

How it works...

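Within a string literal, a Unicode character can be escaped with \u followed by exactly four hexadecimal digits (as in \uFE18), or with \u{...} containing one to six hexadecimal digits, which works for any code point. The index operator [] on a string returns a one-character string holding the UTF-16 code unit at that index. The same code units are available as integers through the codeUnitAt() method and the codeUnits getter. The runes getter, by contrast, returns the Unicode code points (also called runes). Because every character in the strings above fits in a single code unit, codeUnits and runes print the same lists here. The String.fromCharCode() and String.fromCharCodes() constructors accept either UTF-16 code units or runes and work as follows: if a char-code value fits in 16 bits (a single UTF-16 code unit), it is copied literally; otherwise, it is stored as two code units that form a surrogate pair.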
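None of the strings in unicode.dart exercise the surrogate-pair branch, so here is a short sketch that does. The function combineSurrogates is a hypothetical helper written for illustration; it applies the standard UTF-16 decoding formula, which is what the runes getter does for you:

```dart
/// Hypothetical helper: combines a UTF-16 surrogate pair into a code point.
int combineSurrogates(int high, int low) =>
    0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);

void main() {
  // \uXXXX takes exactly four hex digits; \u{...} takes one to six.
  print('\u00FC' == '\u{FC}'); // true (both are ü)

  // A 16-bit value becomes a single code unit.
  print(String.fromCharCode(0xFC).length); // 1

  // A larger value is split into a surrogate pair.
  final s = String.fromCharCode(0x1F600);
  print(s.length); // 2
  print(s == '\u{1F600}'); // true

  // fromCharCodes accepts either form: a surrogate pair or the rune itself.
  print(String.fromCharCodes([0xD83D, 0xDE00]) ==
      String.fromCharCodes([0x1F600])); // true

  // Recombining the pair by hand gives back the original code point.
  final cp = combineSurrogates(s.codeUnitAt(0), s.codeUnitAt(1));
  print(cp == s.runes.first); // true
}
```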

There's more...

The dart:convert library contains a UTF-8 encoder/decoder that converts between strings and bytes. The utf8encoding.dart file shows how to use it:

import 'dart:convert' show utf8;

String str = "Acción"; // Spanish for 'Action'

void main() {
  List<int> encoded = utf8.encode(str);
  print(encoded); // [65, 99, 99, 105, 195, 179, 110]
  // Each UTF-8 byte is reinterpreted as a Latin-1 code point
  // (Latin-1 is the first 256 Unicode code points), so the two
  // bytes that encode ó become the two characters Ã and ³.
  String latin1String = String.fromCharCodes(encoded);
  print(latin1String); // AcciÃ³n
  print(latin1String.codeUnits); // [65, 99, 99, 105, 195, 179, 110]
  // Decoding the bytes as UTF-8 recovers the original string.
  String string = utf8.decode(encoded);
  print(string); // Acción
}
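Reading UTF-8 bytes as Latin-1 is a common source of garbled text ("mojibake"). A minimal sketch of reversing that misreading follows, assuming the lowercase utf8 and latin1 constants of current dart:convert. repairMojibake is a hypothetical helper, and it only works when every character of its input is in the Latin-1 range, because latin1.encode throws otherwise:

```dart
import 'dart:convert' show latin1, utf8;

/// Hypothetical helper: turns each character back into its byte value
/// (Latin-1 encoding), then decodes those bytes as UTF-8.
String repairMojibake(String misread) => utf8.decode(latin1.encode(misread));

void main() {
  const original = 'Acción';
  final bytes = utf8.encode(original);

  // UTF-8 needs two bytes for ó, so there is one more byte than code unit.
  print(original.length); // 6
  print(bytes.length); // 7

  final misread = String.fromCharCodes(bytes);
  print(misread); // AcciÃ³n
  print(repairMojibake(misread)); // Acción
}
```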