Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Appendix D

NEON Intrinsics and Instructions

This appendix contains information on, and a list of instructions used with, the NEON engine. Data types, lane types, and intrinsics are listed.

DATA TYPES

Table D-1 lists the different data types supported on the NEON engine, and the corresponding C data types.

TABLE D-1: NEON Data Types

DATA TYPE	D-REGISTER (64 BITS)	Q-REGISTER (128 BITS)
Signed integers	int8x8_t	int8x16_t
	int16x4_t	int16x8_t
	int32x2_t	int32x4_t
	int64x1_t	int64x2_t
Unsigned integers	uint8x8_t	uint8x16_t
	uint16x4_t	uint16x8_t
	uint32x2_t	uint32x4_t
	uint64x1_t	uint64x2_t
Floating-point	float16x4_t	float16x8_t
	float32x2_t	float32x4_t
Polynomial	poly8x8_t	poly8x16_t
	poly16x4_t	poly16x8_t

LANE TYPES

Table D-2 lists the different lane types per class, and the amount of possible types for each class.

TABLE D-2: Data Lane Types

CLASS	COUNT	TYPES
int	6	int8, int16, int32, uint8, uint16, uint32
int/64	8	int8, int16, int32, int64, uint8, uint16, uint32, uint64
sint	3	int8, int16, int32
sint16/32	2	int16, int32
int32	2	int32, uint32
8-bit	3	int8, uint8, poly8
int/poly8	7	int8, int16, int32, uint8, uint16, uint32, poly8
int/64/poly	10	int8, int16, int32, int64, uint8, uint16, uint32, uint64, poly8, poly16
arith	7	int8, int16, int32, uint8, uint16, uint32, float32
arith/64	9	int8, int16, int32, int64, uint8, uint16, uint32, uint64, float32
arith/poly8	8	int8, int16, int32, uint8, uint16, uint32, poly8, float32
floating	1	float32
any	11	int8, int16, int32, int64, uint8, uint16, uint32, uint64, poly8, poly16, float32

ASSEMBLY INSTRUCTIONS

Table D-3 contains a list of NEON instructions, as well as a brief description of each instruction.

TABLE D-3: NEON Instructions

INSTRUCTION	DESCRIPTION
VABA	Absolute difference and Accumulate
VABD	Absolute difference
VABS	Absolute Value
VACGE	Absolute Compare Greater Than or Equal
VACGT	Absolute Compare Greater Than
VACLE	Absolute Compare Less Than or Equal
VACLT	Absolute Compare Less Than
VADD	Add
VADDHN	Add, Select High Half
VAND	Logical AND
VBIC	Bitwise Bit Clear
VBIF	Bitwise Insert if False
VBIT	Bitwise Insert if True
VBSL	Bitwise Select
VCEQ	Compare Equal
VCGE	Compare Greater Than or Equal
VCGT	Compare Greater Than
VCLE	Compare Less Than or Equal
VCLS	Count Leading Sign bits
VCLT	Compare Less Than
VCLZ	Count Leading Zeroes
VCNT	Count set bits
VCVT	Convert between different number formats
VDUP	Duplicate scalar to all lanes of vector
VEOR	Bitwise Exclusive OR
VEXT	Extract
VFMA	Fused Multiply and Accumulate
VFMS	Fused Multiply and Subtract
VHADD	Halving Add
VHSUB	Halving Subtract
VLD	Vector Load
VMAX	Maximum
VMIN	Minimum
VMLA	Multiply and Accumulate
VMLS	Multiply and Subtract
VMOV	Move
VMOVL	Move Long
VMOVN	Move Narrow
VMUL	Multiply
VMVN	Move Negative
VNEG	Negate
VORN	Bitwise OR NOT
VORR	Bitwise OR
VPADAL	Pairwise Add and Accumulate
VPADD	Pairwise Add
VPMAX	Pairwise Maximum
VPMIN	Pairwise Minimum
VQABS	Absolute Value, Saturate
VQADD	Add, Saturate
VQDMLAL	Saturating Double Multiply Accumulate
VQDMLSL	Saturating Double Multiply and Subtract
VQDMUL	Saturating Double Multiply
VQDMULH	Saturating Double Multiply returning High half
VQMOVN	Saturating Move
VQNEG	Negate, Saturate
VQRDMULH	Saturating Double Multiply returning High half
VQRSHL	Shift left, Round, Saturate
VQRSHR	Shift Right, Round, Saturate
VQSHL	Shift Left, Saturate
VQSHR	Shift Right, Saturate
VQSUB	Subtract, Saturate
VRADDH	Add, Select High Half, Round
VRECPE	Reciprocal Estimate
VRECPS	Reciprocal Step
VREV	Reverse Elements
VRHADD	Halving Add, Round
VRSHR	Shift Right and Round
VRSQRTE	Reciprocal Square Root Estimate
VRSQRTS	Reciprocal Square Root Step
VRSRA	Shift Right, Round and Accumulate
VRSUBH	Subtract, select High half, Round
VSHL	Shift Left
VSHR	Shift Right
VSLI	Shift Left and Insert
VSRA	Shift Right, Accumulate
VSRI	Shift Right and Insert
VST	Vector Store
VSUB	Subtract
VSUBH	Subtract, Select High half
VSWP	Swap Vectors
VTBL	Vector Table Lookup
VTBX	Vector Table Extension
VTRN	Vector Transpose
VTST	Test Bits
VUZP	Vector Unzip
VZIP	Vector Zip

INTRINSIC NAMING CONVENTIONS

Intrinsics provide an elegant way to write NEON instructions using C. NEON intrinsics are created using the following structure:

v[q][r]name[u][n][q][_lane][_n][_result]_type

where:

q indicates a saturating operation.
r indicates a rounding operation.
name is the descriptive name of the operation.
u indicates signed-to-unsigned saturation.
n indicates a narrowing operation.
q indicates an operation on 128-bit vectors.
_n indicates a scalar operand supplied as an argument.
_lane indicates a scalar operand taken from the lane of a vector.
result is the result type in short form.

For example, vmul_s16 multiplies two vectors of signed 16-bit values and is equivalent to VMUL.I16. Some examples in C include:

uint32x4_t vec128 = vld1q_u32(i); // Load 4 32-bit values
uint8x8_t vadd_u8 (uint8x8_t,
    uint8x8_t); //Add two lanes
int8x16_t vaddq_s8 (int8x16_t, int8x16_t); //Saturating add two lanes

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Appendix D: NEON Intrinsics and Instructions

Create new playlist

Sign In

Sign Up

DATA TYPES

LANE TYPES

ASSEMBLY INSTRUCTIONS

INTRINSIC NAMING CONVENTIONS

Table of Contents for
Appendix D: NEON Intrinsics and Instructions