Appendix D

NEON Intrinsics and Instructions

This appendix is a reference for the NEON engine. It lists the data types, lane types, assembly instructions, and intrinsic naming conventions.

DATA TYPES

Table D-1 lists the different data types supported on the NEON engine, and the corresponding C data types.

TABLE D-1: NEON Data Types

DATA TYPE           D-REGISTER (64 BITS)   Q-REGISTER (128 BITS)
Signed integers     int8x8_t               int8x16_t
                    int16x4_t              int16x8_t
                    int32x2_t              int32x4_t
                    int64x1_t              int64x2_t
Unsigned integers   uint8x8_t              uint8x16_t
                    uint16x4_t             uint16x8_t
                    uint32x2_t             uint32x4_t
                    uint64x1_t             uint64x2_t
Floating-point      float16x4_t            float16x8_t
                    float32x2_t            float32x4_t
Polynomial          poly8x8_t              poly8x16_t
                    poly16x4_t             poly16x8_t
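
Each type name in Table D-1 encodes the lane width in bits followed by the lane count, so the product is always 64 for a D register and 128 for a Q register. The following is a minimal sketch of that relationship. The helper lane_count is hypothetical and is not part of arm_neon.h, and the size checks on the real vector types are compiled only when the compiler reports NEON support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
/* On a NEON target the vector types occupy exactly one D or Q register. */
_Static_assert(sizeof(int16x4_t) == 8, "int16x4_t fills a 64-bit D register");
_Static_assert(sizeof(int16x8_t) == 16, "int16x8_t fills a 128-bit Q register");
_Static_assert(sizeof(float32x2_t) == 8, "float32x2_t fills a 64-bit D register");
_Static_assert(sizeof(uint8x16_t) == 16, "uint8x16_t fills a 128-bit Q register");
#endif

/* Hypothetical helper: number of lanes of lane_bits each that fit in a
 * register of register_bits (64 for D, 128 for Q). */
static size_t lane_count(size_t register_bits, size_t lane_bits) {
    assert(lane_bits != 0 && register_bits % lane_bits == 0);
    return register_bits / lane_bits;
}
```

For example, lane_count(64, 16) gives the 4 in int16x4_t, and lane_count(128, 8) gives the 16 in int8x16_t.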

LANE TYPES

Table D-2 lists the lane types in each class, and the number of types each class contains.

TABLE D-2: Data Lane Types

CLASS COUNT TYPES
int 6 int8, int16, int32, uint8, uint16, uint32
int/64 8 int8, int16, int32, int64, uint8, uint16, uint32, uint64
sint 3 int8, int16, int32
sint16/32 2 int16, int32
int32 2 int32, uint32
8-bit 3 int8, uint8, poly8
int/poly8 7 int8, int16, int32, uint8, uint16, uint32, poly8
int/64/poly 10 int8, int16, int32, int64, uint8, uint16, uint32, uint64, poly8, poly16
arith 7 int8, int16, int32, uint8, uint16, uint32, float32
arith/64 9 int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32
arith/poly8 8 int8, int16, int32, uint8, uint16, uint32, poly8, float32
floating 1 float32
any 11 int8, int16, int32, int64, uint8, uint16, uint32, uint64, poly8, poly16, float32

ASSEMBLY INSTRUCTIONS

Table D-3 contains a list of NEON instructions, as well as a brief description of each instruction.

TABLE D-3: NEON Instructions

INSTRUCTION DESCRIPTION
VABA Absolute difference and Accumulate
VABD Absolute difference
VABS Absolute Value
VACGE Absolute Compare Greater Than or Equal
VACGT Absolute Compare Greater Than
VACLE Absolute Compare Less Than or Equal
VACLT Absolute Compare Less Than
VADD Add
VADDHN Add, Select High Half
VAND Logical AND
VBIC Bitwise Bit Clear
VBIF Bitwise Insert if False
VBIT Bitwise Insert if True
VBSL Bitwise Select
VCEQ Compare Equal
VCGE Compare Greater Than or Equal
VCGT Compare Greater Than
VCLE Compare Less Than or Equal
VCLS Count Leading Sign bits
VCLT Compare Less Than
VCLZ Count Leading Zeroes
VCNT Count set bits
VCVT Convert between different number formats
VDUP Duplicate scalar to all lanes of vector
VEOR Bitwise Exclusive OR
VEXT Extract
VFMA Fused Multiply and Accumulate
VFMS Fused Multiply and Subtract
VHADD Halving Add
VHSUB Halving Subtract
VLD Vector Load
VMAX Maximum
VMIN Minimum
VMLA Multiply and Accumulate
VMLS Multiply and Subtract
VMOV Move
VMOVL Move Long
VMOVN Move Narrow
VMUL Multiply
VMVN Move NOT (bitwise inverse)
VNEG Negate
VORN Bitwise OR NOT
VORR Bitwise OR
VPADAL Pairwise Add and Accumulate
VPADD Pairwise Add
VPMAX Pairwise Maximum
VPMIN Pairwise Minimum
VQABS Absolute Value, Saturate
VQADD Add, Saturate
VQDMLAL Saturating Doubling Multiply Accumulate Long
VQDMLSL Saturating Doubling Multiply Subtract Long
VQDMULL Saturating Doubling Multiply Long
VQDMULH Saturating Doubling Multiply returning High half
VQMOVN Saturating Move and Narrow
VQNEG Negate, Saturate
VQRDMULH Saturating Rounding Doubling Multiply returning High half
VQRSHL Shift Left, Round, Saturate
VQRSHRN Shift Right, Round, Saturate, and Narrow
VQSHL Shift Left, Saturate
VQSHRN Shift Right, Saturate, and Narrow
VQSUB Subtract, Saturate
VRADDHN Add, Select High Half, Round
VRECPE Reciprocal Estimate
VRECPS Reciprocal Step
VREV Reverse Elements
VRHADD Halving Add, Round
VRSHR Shift Right and Round
VRSQRTE Reciprocal Square Root Estimate
VRSQRTS Reciprocal Square Root Step
VRSRA Shift Right, Round and Accumulate
VRSUBHN Subtract, Select High Half, Round
VSHL Shift Left
VSHR Shift Right
VSLI Shift Left and Insert
VSRA Shift Right, Accumulate
VSRI Shift Right and Insert
VST Vector Store
VSUB Subtract
VSUBHN Subtract, Select High Half
VSWP Swap Vectors
VTBL Vector Table Lookup
VTBX Vector Table Extension
VTRN Vector Transpose
VTST Test Bits
VUZP Vector Unzip
VZIP Vector Zip
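
The one-line descriptions in Table D-3 are easier to compare with concrete arithmetic. The sketch below models a single unsigned 8-bit lane in plain C for five of the listed instructions. The lane_* function names are hypothetical and exist only for illustration. A real NEON instruction applies the same operation to every lane of the vector at once.

```c
#include <assert.h>
#include <stdint.h>

/* VADD.I8: plain add, the result wraps modulo 256. */
static uint8_t lane_vadd_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* VQADD.U8: saturating add, the result is clamped to 255. */
static uint8_t lane_vqadd_u8(uint8_t a, uint8_t b) {
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* VHADD.U8: halving add, (a + b) / 2 rounded down, no intermediate overflow. */
static uint8_t lane_vhadd_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + b) >> 1);
}

/* VRHADD.U8: rounding halving add, (a + b + 1) / 2. */
static uint8_t lane_vrhadd_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + b + 1u) >> 1);
}

/* VABD.U8: absolute difference. */
static uint8_t lane_vabd_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a > b ? a - b : b - a);
}

/* Hand-computed checks that contrast the variants on the same inputs. */
static void check_lane_models(void) {
    assert(lane_vadd_u8(200, 100) == 44);   /* 300 wraps to 44 */
    assert(lane_vqadd_u8(200, 100) == 255); /* 300 saturates to 255 */
    assert(lane_vhadd_u8(3, 4) == 3);       /* 7 / 2 rounded down */
    assert(lane_vrhadd_u8(3, 4) == 4);      /* (7 + 1) / 2 */
    assert(lane_vabd_u8(3, 10) == 7);
}
```

The Q and R prefixes in the table follow the same pattern throughout: Q clamps the result instead of letting it wrap, and R adds a rounding term before a shift or halving step.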

INTRINSIC NAMING CONVENTIONS

Intrinsics let you issue NEON instructions from C without writing assembly. Intrinsic names are built using the following structure:

v[q][r]name[u][n][q][_lane][_n][_result]_type

where:

  • The first q (before name) indicates a saturating operation.
  • r indicates a rounding operation.
  • name is the descriptive name of the operation.
  • u indicates signed-to-unsigned saturation.
  • n indicates a narrowing operation.
  • The second q (after name) indicates an operation on 128-bit vectors.
  • _lane indicates a scalar operand taken from the lane of a vector.
  • _n indicates a scalar operand supplied as an argument.
  • result is the result type in short form.
  • type is the argument type in short form.

For example, vmul_s16 multiplies two vectors of signed 16-bit values and is equivalent to VMUL.I16. Some examples in C include:

uint32x4_t vec128 = vld1q_u32(i);          // Load four 32-bit values
uint8x8_t vadd_u8(uint8x8_t, uint8x8_t);   // Add two vectors of eight 8-bit lanes
int8x16_t vaddq_s8(int8x16_t, int8x16_t);  // Add two vectors of sixteen 8-bit lanes (q here means 128-bit, not saturating)
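
To show how the position of q changes the meaning of an intrinsic, the sketch below wraps vadd_u8, vqadd_u8, and vaddq_u8 in small array functions. The wrapper names (add_wrap_u8x8 and so on) are hypothetical. The scalar fallback is an assumption added so that the example also compiles on targets without NEON, and it is meant to mirror the behavior of the intrinsics rather than replace them.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0
#endif

/* vadd_u8: eight 8-bit lanes in a D register, results wrap modulo 256. */
static void add_wrap_u8x8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8]) {
#if USE_NEON
    vst1_u8(out, vadd_u8(vld1_u8(a), vld1_u8(b)));
#else
    for (int i = 0; i < 8; ++i) out[i] = (uint8_t)(a[i] + b[i]);
#endif
}

/* vqadd_u8: leading q, so each lane saturates at 255 instead of wrapping. */
static void add_sat_u8x8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8]) {
#if USE_NEON
    vst1_u8(out, vqadd_u8(vld1_u8(a), vld1_u8(b)));
#else
    for (int i = 0; i < 8; ++i) {
        unsigned sum = (unsigned)a[i] + b[i];
        out[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
#endif
}

/* vaddq_u8: trailing q, so sixteen lanes in a Q register, still wrapping. */
static void add_wrap_u8x16(const uint8_t a[16], const uint8_t b[16], uint8_t out[16]) {
#if USE_NEON
    vst1q_u8(out, vaddq_u8(vld1q_u8(a), vld1q_u8(b)));
#else
    for (int i = 0; i < 16; ++i) out[i] = (uint8_t)(a[i] + b[i]);
#endif
}

/* Hand-computed check: 250 + 10 wraps to 4 but saturates to 255. */
static void check_naming_examples(void) {
    const uint8_t a[8] = {250, 1, 2, 3, 4, 5, 6, 7};
    const uint8_t b[8] = {10, 1, 1, 1, 1, 1, 1, 1};
    uint8_t out[8];
    add_wrap_u8x8(a, b, out);
    assert(out[0] == 4 && out[1] == 2);
    add_sat_u8x8(a, b, out);
    assert(out[0] == 255 && out[1] == 2);
}
```

On an ARMv7 target, GCC and Clang typically need -mfpu=neon for the intrinsic path to be selected. On AArch64, NEON is available by default.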