Many modern CPUs and GPUs provide Single Instruction, Multiple Data (SIMD) support. With the help of special 128-bit registers, four 32-bit data values (integers or floats) can be processed in parallel. This provides a potential speedup of up to four times for image processing, 3D graphics, audio processing, and other numeric computation algorithms. Machine-learning algorithms that use a Gaussian Mixture Model (GMM), such as those for automatic speech recognition, also benefit from SIMD.
Dart lets you work with this feature by using the special SIMD types from the typed_data library. It offers the following four types:
Int32x4, which represents four 32-bit integer values
Float32x4, which represents four single-precision floating-point values
Int32x4List, a list structure to contain Int32x4 values
Float32x4List, a list structure to contain Float32x4 values
Let's see some examples of SIMD operations in simd.dart; the different types are highlighted in the following snippet:
import 'dart:typed_data';

void main() {
  var a = new Float32x4(14.1, 6.7, 56.3, 78.41);
  var b = new Float32x4(12.3, 5.4, 81.7, 13.43);
  Float32x4 sum = new Float32x4.zero();
  // print(sum); // [0.000000, 0.000000, 0.000000, 0.000000]
  sum = a + b;
  print(sum); // [26.400002, 12.100000, 138.000000, 91.840004]
  print(sum.z); // 138.0
  // b.y = 3.14; // --> NoSuchMethodError
  b = b.withY(3.14);
  print(b); // [12.300000, 3.140000, 81.699997, 13.430000]
  b = b.shuffle(Float32x4.WYXZ); // named Float32x4.wyxz in Dart 2 and later
  print(b); // [13.430000, 3.140000, 12.300000, 81.699997]
  // a < b; // There is no such operator in Float32x4
  Int32x4 mask = a.greaterThan(b); // Create selection mask.
  Float32x4 c = mask.select(a, b); // Select.
  print(c); // [14.100000, 6.700000, 56.299999, 81.699997]
  // selectively applying an operation:
  Float32x4 v = new Float32x4(22.0, 33.0, 44.0, 55.0);
  // mask = [0xFFFFFFFF, 0xFFFFFFFF, 0x0, 0xFFFFFFFF]
  mask = new Int32x4.bool(true, true, false, true);
  // r = [0.0, 0.0, 0.0, 0.0]
  Float32x4 r = v - v;
  // take r where the mask is set, keep v elsewhere
  v = mask.select(r, v);
  print(v); // [0.000000, 0.000000, 44.000000, 0.000000]
}
The Float32x4 object offers the standard set of arithmetic operations and more. A Float32x4 object is in fact an immutable object with operations that create new immutable Float32x4 objects. The Int32x4 object is more limited, being useful for comparison, branching, and selection. In code that is optimized for these types, the values are mapped directly to SIMD registers, and operations on them compile into a single SIMD instruction with no overhead. You can think of an SIMD value as a horizontal compartment being subdivided into four lanes, respectively called x, y, z, and w, as shown in the following screenshot:
An operation on two SIMD values happens on all the lanes simultaneously. With .x and the other lane getters, you can read the values of the individual lanes, but be aware that this is slow. Because an SIMD value is immutable, a b.y = value statement is illegal. However, the withX, withY, withZ, and withW methods return a copy with that one lane replaced, although again this is slow. Reordering the values within one SIMD value is done with the shuffle methods, where the lane order (WYXZ) indicates the new order. An SIMD instance contains four numbers, so comparisons such as < or >= that return a single Boolean cannot be defined. If you want to make c equal to the higher values of a and b, first you have to create a mask with the greaterThan operation, and then perform a select operation on it. An analogous masking technique is used if you want to perform an operation on some of the lanes only.
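The mask-and-select pattern described above can be wrapped in small helpers. The following is a minimal sketch only: laneMax and doubleWhere are hypothetical names introduced for illustration and are not part of typed_data. Only greaterThan, select, and Int32x4.bool come from the library.

```dart
import 'dart:typed_data';

// Hypothetical helper: lane-wise maximum built from a mask and a select.
Float32x4 laneMax(Float32x4 a, Float32x4 b) {
  Int32x4 mask = a.greaterThan(b); // all bits set in the lanes where a > b
  return mask.select(a, b);        // a where the mask is set, b elsewhere
}

// Hypothetical helper: doubles only the lanes flagged true, keeps the rest.
Float32x4 doubleWhere(Float32x4 v, bool x, bool y, bool z, bool w) {
  Int32x4 mask = new Int32x4.bool(x, y, z, w);
  return mask.select(v + v, v);
}

void main() {
  var a = new Float32x4(1.0, 8.0, 3.0, 6.0);
  var b = new Float32x4(5.0, 2.0, 7.0, 4.0);
  var m = laneMax(a, b);
  print([m.x, m.y, m.z, m.w]); // [5.0, 8.0, 7.0, 6.0]
  var d = doubleWhere(a, true, false, true, false);
  print([d.x, d.y, d.z, d.w]); // [2.0, 8.0, 6.0, 6.0]
}
```

Note that doubleWhere computes v + v for all four lanes and only then discards the unwanted results; with SIMD, this is usually cheaper than branching per lane.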
At the time of writing, you can get this performance acceleration on all IA32/X64 platforms, and on ARM only if the processor supports NEON technology; an implementation for JavaScript is pending. Thanks to the work of John McCutchan, Dart was the first web technology to use SIMD processing.
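The list types mentioned at the start of this section are not exercised in simd.dart. As a rough sketch of how a Float32x4List might be used in a numeric loop, the following adds up all the floats it holds, four lanes at a time. The sumAll function is a hypothetical helper written for this illustration, and any actual speedup depends on the platform support described above.

```dart
import 'dart:typed_data';

// Hypothetical helper: sums every float stored in a Float32x4List.
double sumAll(Float32x4List values) {
  Float32x4 acc = new Float32x4.zero();
  for (int i = 0; i < values.length; i++) {
    acc += values[i]; // one addition covers all four lanes
  }
  // Reading individual lanes is slow, so it is done once, after the loop.
  return acc.x + acc.y + acc.z + acc.w;
}

void main() {
  var values = new Float32x4List(2); // room for 2 x 4 = 8 floats
  values[0] = new Float32x4(1.0, 2.0, 3.0, 4.0);
  values[1] = new Float32x4(5.0, 6.0, 7.0, 8.0);
  print(sumAll(values)); // 36.0
}
```

The accumulator stays in SIMD form for the whole loop, and the four lanes are combined into a single double only at the end.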