Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Using SIMD for enhanced performance

A lot of modern CPUs and GPUs provide Single Instruction Multiple Data (SIMD) support. Four 32-bit data values (integers or floats) can be processed in parallel with the help of 128-bit special registers. This provides a potential speedup of 400 percent for image processing, 3D graphics, audio processing, and other numeric computation algorithms. Also, machine-learning algorithms (such as for automatic speech recognition) that use a Gauss Mixture Model (GMM) benefit from SIMD.

How to do it…

Dart lets you work with this feature by using the special SIMD x types from the typed_data library. It offers the following four types:

Int32x4, which represents four 32-bit integer values
Float32x4, which represents four single-precision floating point values
List structures to contain the 32-bit integer values, such as Int32x4List
Float32x4List, list structure to contain the 32-bit floating point values

Let's see some examples of SIMD operations in simd.dart; the different types are highlighted in the following snippet:

import 'dart:typed_data';

void main() {
  var a = new Float32x4(14.1, 6.7, 56.3, 78.41);
  var b = new Float32x4(12.3, 5.4, 81.7, 13.43);
  Float32x4 sum = new Float32x4.zero(); //
  print(sum); // [0.000000, 0.000000, 0.000000, 0.000000]
  sum = a + b;
  print(sum); // [26.400002, 12.100000, 138.000000, 91.840004]
  print(sum.z); // 138.0
  // b.y = 3.14;  // --> NoSuchMethodError
  b = b.withY(3.14);
  print(b); // [12.300000, 3.140000, 81.699997, 13.430000]
  b = b.shuffle(Float32x4.WYXZ);
  print(b); // [13.430000, 3.140000, 12.300000, 81.699997]
  // a < b; // There is no such operator in Float32x4
  Int32x4 mask = a.greaterThan(b);  // Create selection mask.
  Float32x4 c = mask.select(a, b);   // Select.
  print(c); // [14.100000, 6.700000, 56.299999, 81.699997]
  // selectively applying an operation:
  Float32x4 v = new Float32x4(22.0, 33.0, 44.0, 55.0);
  // mask = [0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0]
  mask = new Int32x4.bool(true, true, false, true);
  // r = [4.0, 9.0, 16.0, 25.0].
  Float32x4 r = v - v;
  v = mask.select(r, v);
  print(v); // [0.000000, 0.000000, 44.000000, 0.000000]
}

Tip

If your application needs a list of Float32 objects and you can deploy them on an SIMD platform, then be sure to use Float32x4List instead of List<Float32x4> to get better performance.

How it works...

The Float32x4 object offers the standard set of arithmetic operations and more. A Float32x4 object is in fact an immutable object with operations that create new immutable Float32x4 objects. The Int32x4 object is more limited, being useful for comparison, branching, and selection. In code that is optimized for these types, the values are mapped directly to SIMD registers, and operations on them compile into a single SIMD instruction with no overhead. You can think of an SIMD value as a horizontal compartment being subdivided into four lanes, respectively called x, y, z, and w, as shown in the following screenshot:

The SIMD architecture

An operation on two SIMD values happens on all the lanes simultaneously. With .x and other values, you can read the values of the individual lanes, but attention, this is slow. Because an SIMD value is immutable, a b.y = value statement is illegal. However, the withX methods let you do this, but again this is slow. Reordering values in one SIMD is done with a number of shuffle methods, where the lane order (WYXZ) indicates the new order. An SIMD instance contains four numbers, so comparisons such as < or >= cannot be defined. If you want to make c equal to the higher values of a and b, first you have to create a mask with the greaterThan operation, and then perform a select operation on it. An analogous masking technique is used if you want to perform an operation on some of the lanes only.

At this time, you can get this performance acceleration on all IA32/X64 platforms, and on ARM only if the processor supports NEON technology, and its implementation is pending for JavaScript. Thanks to the work of John McCutchan, Dart was the first web technology to use SIMD processing.

Table of Contents for
Using SIMD for enhanced performance

Using SIMD for enhanced performance

How to do it…

Tip

How it works...

See also

Table of Contents for Using SIMD for enhanced performance

Create new playlist

Sign In

Sign Up

Using SIMD for enhanced performance

How to do it…

Tip

How it works...

See also

Table of Contents for
Using SIMD for enhanced performance