COMPILER INTRINSICS AND FUNCTIONAL EQUIVALENTS

MOVSS

__m128 _mm_load_ss(float * p)

Loads an SP FP value into the low word and clears the upper three words.

void _mm_store_ss(float * p, __m128 a)

Stores the lower SP FP value of a at address p.

__m128 _mm_move_ss(__m128 a, __m128 b)

Sets the low word to the low SP FP value of b. The upper three SP FP values are passed through from a.

MOVUPS

__m128 _mm_loadu_ps(float * p)

Loads four SP FP values. The address need not be 16-byte aligned.

void _mm_storeu_ps(float * p, __m128 a)

Stores four SP FP values. The address need not be 16-byte aligned.

MULSS

__m128 _mm_mul_ss(__m128 a, __m128 b)

Multiplies the lower SP FP values of a and b; the upper three SP FP values are passed through from a.

ORPS

__m128 _mm_or_ps(__m128 a, __m128 b)

Computes the bitwise OR of the four SP FP values of a and b.

PACKSSWB

__m64 _m_packsswb(__m64 m1, __m64 m2)

__m64 _mm_packs_pi16(__m64 m1, __m64 m2)

Pack the four 16-bit values from m1 into the lower four 8-bit values of the result with signed saturation, and pack the four 16-bit values from m2 into the upper four 8-bit values of the result with signed saturation.

PACKSSDW

__m64 _m_packssdw(__m64 m1, __m64 m2)

__m64 _mm_packs_pi32(__m64 m1, __m64 m2)

Pack the two 32-bit values from m1 into the lower two 16-bit values of the result with signed saturation, and pack the two 32-bit values from m2 into the upper two 16-bit values of the result with signed saturation.

PACKUSWB

__m64 _m_packuswb(__m64 m1, __m64 m2)

__m64 _mm_packs_pu16(__m64 m1, __m64 m2)

Pack the four 16-bit values from m1 into the lower four 8-bit values of the result with unsigned saturation, and pack the four 16-bit values from m2 into the upper four 8-bit values of the result with unsigned saturation.

PADDB

__m64 _m_paddb(__m64 m1, __m64 m2)

__m64 _mm_add_pi8(__m64 m1, __m64 m2)

Add the eight 8-bit values in m1 to the eight 8-bit values in m2.

PADDW

__m64 _m_paddw(__m64 m1, __m64 m2)

__m64 _mm_add_pi16(__m64 m1, __m64 m2)

Add the four 16-bit values in m1 to the four 16-bit values in m2.

PADDD

__m64 _m_paddd(__m64 m1, __m64 m2)

__m64 _mm_add_pi32(__m64 m1, __m64 m2)

Add the two 32-bit values in m1 to the two 32-bit values in m2.

PADDSB

__m64 _m_paddsb(__m64 m1, __m64 m2)

__m64 _mm_adds_pi8(__m64 m1, __m64 m2)

Add the eight signed 8-bit values in m1 to the eight signed 8-bit values in m2 and saturate.

PADDSW

__m64 _m_paddsw(__m64 m1, __m64 m2)

__m64 _mm_adds_pi16(__m64 m1, __m64 m2)

Add the four signed 16-bit values in m1 to the four signed 16-bit values in m2 and saturate.

PADDUSB

__m64 _m_paddusb(__m64 m1, __m64 m2)

__m64 _mm_adds_pu8(__m64 m1, __m64 m2)

Add the eight unsigned 8-bit values in m1 to the eight unsigned 8-bit values in m2 and saturate.

PADDUSW

__m64 _m_paddusw(__m64 m1, __m64 m2)

__m64 _mm_adds_pu16(__m64 m1, __m64 m2)

Add the four unsigned 16-bit values in m1 to the four unsigned 16-bit values in m2 and saturate.

PAND

__m64 _m_pand(__m64 m1, __m64 m2)

__m64 _mm_and_si64(__m64 m1, __m64 m2)

Perform a bitwise AND of the 64-bit value in m1 with the 64-bit value in m2.

Table C-1. Simple Intrinsics

Mnemonic

Intrinsic

Description
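The SSE entries in this table (MOVSS, MOVUPS, MULSS, ORPS) can be illustrated with a minimal sketch, assuming an x86 target whose compiler provides `<xmmintrin.h>` (for example GCC or Clang). The helper names `scale_low_element`, `replace_low_element`, `force_negative` and `low_element` are hypothetical and used only for illustration, and `_mm_set1_ps` is an additional SSE intrinsic that does not appear in this part of the table.

```c
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_load_ss, ... */

/* Hypothetical helper: multiplies only the lowest of the four floats at src
   by scale (MULSS) and writes all four to dst. Neither pointer needs to be
   16-byte aligned because the unaligned MOVUPS forms are used. */
void scale_low_element(const float *src, float scale, float *dst)
{
    __m128 v = _mm_loadu_ps(src);   /* MOVUPS: load four floats */
    __m128 s = _mm_load_ss(&scale); /* MOVSS: {scale, 0, 0, 0} */
    __m128 r = _mm_mul_ss(v, s);    /* MULSS: low = src[0] * scale; upper three from v */
    _mm_storeu_ps(dst, r);          /* MOVUPS: store four floats */
}

/* Hypothetical helper: replaces the lowest float with value and keeps the
   upper three (register form of MOVSS). */
void replace_low_element(const float *src, float value, float *dst)
{
    __m128 v = _mm_loadu_ps(src);
    __m128 s = _mm_load_ss(&value);
    _mm_storeu_ps(dst, _mm_move_ss(v, s)); /* low from s, upper three from v */
}

/* Hypothetical helper: sets the sign bit of all four floats (ORPS), so every
   lane becomes negative with its magnitude unchanged (0.0 becomes -0.0). */
void force_negative(const float *src, float *dst)
{
    __m128 sign = _mm_set1_ps(-0.0f); /* only the sign bit is set in each lane */
    _mm_storeu_ps(dst, _mm_or_ps(_mm_loadu_ps(src), sign));
}

/* Hypothetical helper: returns the lowest of the four floats at src
   (store form of MOVSS). */
float low_element(const float *src)
{
    float out;
    _mm_store_ss(&out, _mm_loadu_ps(src));
    return out;
}
```

For example, with `float in[5] = {0, 5, 6, 7, 8}`, calling `scale_low_element(in + 1, 10.0f, out)` leaves `out` as {50, 6, 7, 8}: only the low lane is multiplied, and the source pointer `in + 1` does not have to be 16-byte aligned.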
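The three pack entries (PACKSSWB, PACKSSDW, PACKUSWB) differ only in element width and saturation mode. The sketch below assumes an x86 target with `<mmintrin.h>` (GCC or Clang); the wrapper names `pack_i16_to_i8`, `pack_i32_to_i16` and `pack_i16_to_u8` are hypothetical. Each wrapper calls `_mm_empty()` (EMMS) before returning, since MMX registers alias the x87 floating-point stack.

```c
#include <mmintrin.h> /* MMX intrinsics: __m64, _mm_packs_*, _mm_empty */
#include <stdint.h>
#include <string.h>

/* Each hypothetical wrapper copies two 8-byte inputs into __m64 values
   (element 0 becomes the least significant element on little-endian x86),
   applies one pack intrinsic, copies the result out, then issues EMMS. */

/* PACKSSWB: signed 16-bit -> signed 8-bit; a fills out[0..3], b fills out[4..7]. */
void pack_i16_to_i8(const int16_t a[4], const int16_t b[4], int8_t out[8])
{
    __m64 m1, m2, r;
    memcpy(&m1, a, sizeof m1);
    memcpy(&m2, b, sizeof m2);
    r = _mm_packs_pi16(m1, m2);
    memcpy(out, &r, sizeof r);
    _mm_empty();
}

/* PACKSSDW: signed 32-bit -> signed 16-bit; a fills out[0..1], b fills out[2..3]. */
void pack_i32_to_i16(const int32_t a[2], const int32_t b[2], int16_t out[4])
{
    __m64 m1, m2, r;
    memcpy(&m1, a, sizeof m1);
    memcpy(&m2, b, sizeof m2);
    r = _mm_packs_pi32(m1, m2);
    memcpy(out, &r, sizeof r);
    _mm_empty();
}

/* PACKUSWB: signed 16-bit -> unsigned 8-bit; a fills out[0..3], b fills out[4..7]. */
void pack_i16_to_u8(const int16_t a[4], const int16_t b[4], uint8_t out[8])
{
    __m64 m1, m2, r;
    memcpy(&m1, a, sizeof m1);
    memcpy(&m2, b, sizeof m2);
    r = _mm_packs_pu16(m1, m2);
    memcpy(out, &r, sizeof r);
    _mm_empty();
}
```

For instance, packing {-200, -5, 5, 200} with PACKSSWB yields -128, -5, 5, 127, while PACKUSWB yields 0, 0, 5, 200: under unsigned saturation every negative input clamps to 0.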
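The PADD entries differ in whether overflow wraps around (PADDB, PADDW, PADDD) or saturates (PADDSB, PADDSW for signed, PADDUSB, PADDUSW for unsigned), and PAND is a plain 64-bit AND. A minimal sketch, assuming an x86 target with `<mmintrin.h>` (GCC or Clang), generates one hypothetical wrapper per intrinsic with a macro; the `mmx_*` names and the `DEFINE_MMX_BINOP` macro are illustrative only. The `_m_*` names listed in the table (for example `_m_paddb`) are alternative names for the same operations.

```c
#include <mmintrin.h> /* MMX intrinsics: __m64, _mm_add_*, _mm_adds_*, _mm_and_si64 */
#include <stdint.h>   /* fixed-width types for the callers' 8-byte buffers */
#include <string.h>

/* Defines a hypothetical wrapper NAME that applies the two-operand MMX
   intrinsic INTRIN to two 8-byte buffers and writes the 8-byte result. */
#define DEFINE_MMX_BINOP(NAME, INTRIN)                  \
    void NAME(const void *a, const void *b, void *out)  \
    {                                                   \
        __m64 m1, m2, r;                                \
        memcpy(&m1, a, sizeof m1);                      \
        memcpy(&m2, b, sizeof m2);                      \
        r = INTRIN(m1, m2);                             \
        memcpy(out, &r, sizeof r);                      \
        _mm_empty(); /* EMMS before any x87 FP code */  \
    }

DEFINE_MMX_BINOP(mmx_add_pi8, _mm_add_pi8)     /* PADDB: wraps modulo 2^8 */
DEFINE_MMX_BINOP(mmx_add_pi16, _mm_add_pi16)   /* PADDW: wraps modulo 2^16 */
DEFINE_MMX_BINOP(mmx_add_pi32, _mm_add_pi32)   /* PADDD: wraps modulo 2^32 */
DEFINE_MMX_BINOP(mmx_adds_pi8, _mm_adds_pi8)   /* PADDSB: signed saturation */
DEFINE_MMX_BINOP(mmx_adds_pi16, _mm_adds_pi16) /* PADDSW: signed saturation */
DEFINE_MMX_BINOP(mmx_adds_pu8, _mm_adds_pu8)   /* PADDUSB: unsigned saturation */
DEFINE_MMX_BINOP(mmx_adds_pu16, _mm_adds_pu16) /* PADDUSW: unsigned saturation */
DEFINE_MMX_BINOP(mmx_and_si64, _mm_and_si64)   /* PAND: bitwise AND of 64 bits */
```

For example, adding the unsigned bytes 250 and 10 gives 4 with `mmx_add_pi8` (260 wraps modulo 256) but 255 with `mmx_adds_pu8`, which clamps to the largest unsigned 8-bit value.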
