Dart strings are immutable sequences of UTF-16 code units. UTF-16 represents each code point outside the Basic Multilingual Plane as a surrogate pair of two code units, and if you decode these pairs, you get Unicode code points. Unicode terminology is terse, but Dart does a good job of exposing the different parts.
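To make the decoding step concrete, here is a minimal sketch of how a surrogate pair maps back to a single code point. The helper combineSurrogates is hypothetical (it is not part of dart:core) and assumes it is given a valid high/low pair; in practice the runes property does this work for you.

```dart
// MUSICAL SYMBOL G CLEF (U+1D11E) lies outside the Basic Multilingual Plane.
String clef = '\u{1D11E}';

// Hypothetical helper: combines a valid high/low surrogate pair
// into the code point it encodes. No validation is performed.
int combineSurrogates(int high, int low) {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

void main() {
  List<int> units = clef.codeUnits;
  print(units); // [55348, 56606]
  print(combineSurrogates(units[0], units[1])); // 119070
  print(clef.runes.first); // 119070
}
```

The manual result matches what runes reports, which suggests that runes can be relied on whenever code points rather than code units are needed.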
The unicode.dart file shows the different methods for working with strings that contain special characters:
String country = "Egypt";
String city = "Zürich";
String japanese = "日本語"; // nihongo meaning 'Japanese'

void main() {
  print('Unicode escapes: \uFE18'); // prints the character U+FE18

  print(country[0]); // E
  print(country.codeUnitAt(0)); // 69
  print(country.codeUnits); // [69, 103, 121, 112, 116]
  print(country.runes.toList()); // [69, 103, 121, 112, 116]
  print(String.fromCharCode(69)); // E
  print(String.fromCharCodes([69, 103, 121, 112, 116])); // Egypt

  print(city[1]); // ü
  print(city.codeUnitAt(1)); // 252
  print(city.codeUnits); // [90, 252, 114, 105, 99, 104]
  print(city.runes.toList()); // [90, 252, 114, 105, 99, 104]
  print(String.fromCharCode(252)); // ü
  print(String.fromCharCodes([90, 252, 114, 105, 99, 104])); // Zürich

  print(japanese[0]); // 日
  print(japanese.codeUnitAt(0)); // 26085
  print(japanese.codeUnits); // [26085, 26412, 35486]
  print(japanese.runes.toList()); // [26085, 26412, 35486]
  print(String.fromCharCode(35486)); // 語
  print(String.fromCharCodes([26085, 26412, 35486])); // 日本語
}
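Within a string, Unicode characters can be escaped by using \u followed by four hexadecimal digits, or \u{...} with one to six digits. The index operator [] on a string gives you the string representation of the UTF-16 code unit at that position. The code units are also accessible as integers through the codeUnitAt() method and the codeUnits property, while the runes property exposes the string's Unicode code points (also called runes). For the three strings above, both views give the same numbers because every character fits in a single code unit. The constructors String.fromCharCode() and String.fromCharCodes() can take UTF-16 code units or runes. They work in the following way: if the char-code value fits in 16 bits (a single UTF-16 code unit), it is copied literally. Otherwise, the resulting string has length 2 and its code units form a surrogate pair.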
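The surrogate-pair case is easier to see with a character that does not fit in 16 bits. The following sketch uses the emoji U+1F600 purely as an illustrative choice; any code point above U+FFFF should behave the same way.

```dart
// U+1F600 needs the braced escape form because it has five hex digits.
String smiley = '\u{1F600}';

void main() {
  print(smiley.length); // 2 (UTF-16 code units)
  print(smiley.runes.length); // 1 (Unicode code point)
  print(smiley.codeUnits); // [55357, 56832]
  print(smiley.runes.toList()); // [128512]

  // A single rune above 0xFFFF becomes a string of length 2.
  String fromRune = String.fromCharCode(0x1F600);
  print(fromRune.length); // 2
  print(fromRune == smiley); // true

  // Passing the two surrogate code units directly gives the same string.
  print(String.fromCharCodes([0xD83D, 0xDE00]) == smiley); // true
}
```

Note that smiley[0] would return a string holding only the high surrogate, which is not a complete character on its own; iterating over runes avoids splitting a pair.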
The dart:convert library contains a UTF-8 encoder/decoder that transforms between strings and bytes. The utf8encoding.dart file shows how you can use these methods, as shown in the following code:
import 'dart:convert' show utf8;

String str = "Acción"; // Spanish for 'Action'

void main() {
  List<int> encoded = utf8.encode(str);
  print(encoded); // [65, 99, 99, 105, 195, 179, 110]

  // The UTF-8 code units are reinterpreted as
  // Latin-1 code points (a subset of Unicode code points),
  // so the two bytes of ó show up as two separate characters.
  String latin1String = String.fromCharCodes(encoded);
  print(latin1String); // AcciÃ³n
  print(latin1String.codeUnits); // [65, 99, 99, 105, 195, 179, 110]

  var string = utf8.decode(encoded);
  print(string); // Acción
}
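As a further sketch, the byte count of a UTF-8 encoding can differ from the string's length, and decoding the bytes with the wrong codec or from a truncated buffer gives different results. The example below assumes a current Dart SDK, where the codecs are exposed as the lowercase constants utf8 and latin1; the truncation point is chosen only to cut the two-byte sequence for ó in half.

```dart
import 'dart:convert' show latin1, utf8;

String word = 'Acción';

void main() {
  List<int> bytes = utf8.encode(word);
  print(word.length); // 6 (code units)
  print(bytes.length); // 7 (ó takes two bytes in UTF-8)

  // Decoding UTF-8 bytes as Latin-1 treats each byte as one character.
  print(latin1.decode(bytes)); // AcciÃ³n

  // Keep only the first byte of the two-byte sequence for ó.
  List<int> truncated = bytes.sublist(0, 5);
  try {
    utf8.decode(truncated);
  } on FormatException catch (e) {
    print('Malformed input: ${e.message}');
  }

  // With allowMalformed, the incomplete sequence is replaced by U+FFFD
  // instead of raising an exception.
  String lenient = utf8.decode(truncated, allowMalformed: true);
  print(lenient.contains('\uFFFD')); // true
}
```

Whether to use allowMalformed depends on the application: strict decoding surfaces corrupted data early, while the lenient mode keeps going at the cost of losing the damaged characters.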