C-5
COMPILER INTRINSICS AND FUNCTIONAL EQUIVALENTS
MOVSS
__m128 _mm_load_ss(float * p)
Loads an SP FP value into the low word
and clears the upper three words.
void _mm_store_ss(float * p, __m128 a)
Stores the lower SP FP value of a at address p.
__m128 _mm_move_ss(__m128 a, __m128 b)
Sets the low word to the SP FP value of b.
The upper 3 SP FP values are passed
through from a.
MOVUPS
__m128 _mm_loadu_ps(float * p)
Loads four SP FP values. The address
need not be 16-byte-aligned.
void _mm_storeu_ps(float * p, __m128 a)
Stores four SP FP values. The address
need not be 16-byte-aligned.
MULSS
__m128 _mm_mul_ss(__m128 a, __m128 b)
Multiplies the lower SP FP values of a and
b; the upper three SP FP values are
passed through from a.
ORPS
__m128 _mm_or_ps(__m128 a, __m128 b)
Computes the bitwise OR of the four SP FP
values of a and b.
PACKSSWB
__m64 _m_packsswb(__m64 m1, __m64 m2)
__m64 _mm_packs_pi16(__m64 m1, __m64 m2)
Pack the four 16-bit values from m1 into the
lower four 8-bit values of the result with
signed saturation, and pack the four 16-bit
values from m2 into the upper four 8-bit
values of the result with signed saturation.
PACKSSDW
__m64 _m_packssdw(__m64 m1, __m64 m2)
__m64 _mm_packs_pi32(__m64 m1, __m64 m2)
Pack the two 32-bit values from m1 into the
lower two 16-bit values of the result with
signed saturation, and pack the two 32-bit
values from m2 into the upper two 16-bit
values of the result with signed saturation.
PACKUSWB
__m64 _m_packuswb(__m64 m1, __m64 m2)
__m64 _mm_packs_pu16(__m64 m1, __m64 m2)
Pack the four 16-bit values from m1 into the
lower four 8-bit values of the result with
unsigned saturation, and pack the four
16-bit values from m2 into the upper four 8-bit
values of the result with unsigned
saturation.
PADDB
__m64 _m_paddb(__m64 m1, __m64 m2)
__m64 _mm_add_pi8(__m64 m1, __m64 m2)
Add the eight 8-bit values in m1 to the eight
8-bit values in m2.
PADDW
__m64 _m_paddw(__m64 m1, __m64 m2)
__m64 _mm_add_pi16(__m64 m1, __m64 m2)
Add the four 16-bit values in m1 to the four
16-bit values in m2.
PADDD
__m64 _m_paddd(__m64 m1, __m64 m2)
__m64 _mm_add_pi32(__m64 m1, __m64 m2)
Add the two 32-bit values in m1 to the two
32-bit values in m2.
PADDSB
__m64 _m_paddsb(__m64 m1, __m64 m2)
__m64 _mm_adds_pi8(__m64 m1, __m64 m2)
Add the eight signed 8-bit values in m1 to
the eight signed 8-bit values in m2 and
saturate.
PADDSW
__m64 _m_paddsw(__m64 m1, __m64 m2)
__m64 _mm_adds_pi16(__m64 m1, __m64 m2)
Add the four signed 16-bit values in m1 to
the four signed 16-bit values in m2 and
saturate.
PADDUSB
__m64 _m_paddusb(__m64 m1, __m64 m2)
__m64 _mm_adds_pu8(__m64 m1, __m64 m2)
Add the eight unsigned 8-bit values in m1
to the eight unsigned 8-bit values in m2 and
saturate.
PADDUSW
__m64 _m_paddusw(__m64 m1, __m64 m2)
__m64 _mm_adds_pu16(__m64 m1, __m64 m2)
Add the four unsigned 16-bit values in m1
to the four unsigned 16-bit values in m2
and saturate.
PAND
__m64 _m_pand(__m64 m1, __m64 m2)
__m64 _mm_and_si64(__m64 m1, __m64 m2)
Perform a bitwise AND of the 64-bit value in
m1 with the 64-bit value in m2.
Table C-1. Simple Intrinsics
Mnemonic    Intrinsic    Description
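The SSE entries in Table C-1 (MOVSS, MOVUPS, MULSS, ORPS) can be exercised together in a few lines. This is a minimal sketch. It assumes GCC or Clang on x86-64, where SSE is always available, and the <xmmintrin.h> header. The function names scale_first, replace_first, first_product and negate_abs4 are hypothetical. _mm_set1_ps is declared in the same header but is not part of this table.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load_ss, _mm_mul_ss, ... */

/* Hypothetical helper: writes {src[0]*k, src[1], src[2], src[3]} to dst.
   MOVUPS loads/stores need no 16-byte alignment; MOVSS loads k into the
   low lane and zeroes the other three; MULSS touches only the low lane. */
void scale_first(const float src[4], float k, float dst[4])
{
    __m128 v = _mm_loadu_ps(src);
    __m128 s = _mm_load_ss(&k);            /* {k, 0, 0, 0} */
    _mm_storeu_ps(dst, _mm_mul_ss(v, s));  /* upper three lanes come from v */
}

/* Hypothetical helper: writes {x, a[1], a[2], a[3]} to dst (MOVSS reg, reg). */
void replace_first(const float a[4], float x, float dst[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vx = _mm_load_ss(&x);
    _mm_storeu_ps(dst, _mm_move_ss(va, vx));
}

/* Hypothetical helper: returns a[0]*b[0] through a scalar MOVSS store. */
float first_product(const float a[4], const float b[4])
{
    float r;
    _mm_store_ss(&r, _mm_mul_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    return r;
}

/* Hypothetical helper: writes -|x| for each lane. -0.0f has only the sign
   bit set, so ORPS with it forces every sign bit to 1. */
void negate_abs4(const float src[4], float dst[4])
{
    __m128 sign = _mm_set1_ps(-0.0f);
    _mm_storeu_ps(dst, _mm_or_ps(_mm_loadu_ps(src), sign));
}
```

The ORPS example works because ORPS operates on raw bits, not on floating-point values.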
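The three pack intrinsics in Table C-1 differ only in element width and saturation mode. This is a minimal sketch under the following assumptions:

- GCC or Clang on x86 with <mmintrin.h>. MSVC does not support the MMX intrinsics when targeting x64.
- A little-endian layout, so element 0 occupies the lowest address.

pack_s16_s8, pack_s16_u8 and pack_s32_s16 are hypothetical names. _mm_setr_pi16 and _mm_setr_pi32 come from the same header but are not listed in this table. Each function ends with _mm_empty() (EMMS) so that later x87 floating-point code is not affected by the MMX state.

```c
#include <mmintrin.h>  /* MMX: __m64, _mm_packs_pi16, _mm_packs_pi32, _mm_packs_pu16 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (PACKSSWB): lo[0..3] -> bytes 0..3, hi[0..3] -> bytes 4..7,
   each signed word clamped to -128..127. */
void pack_s16_s8(const int16_t lo[4], const int16_t hi[4], int8_t out[8])
{
    __m64 r = _mm_packs_pi16(_mm_setr_pi16(lo[0], lo[1], lo[2], lo[3]),
                             _mm_setr_pi16(hi[0], hi[1], hi[2], hi[3]));
    memcpy(out, &r, sizeof r);
    _mm_empty();  /* EMMS: leave MMX state before any x87 code runs */
}

/* Hypothetical helper (PACKUSWB): same layout, each signed word clamped to 0..255. */
void pack_s16_u8(const int16_t lo[4], const int16_t hi[4], uint8_t out[8])
{
    __m64 r = _mm_packs_pu16(_mm_setr_pi16(lo[0], lo[1], lo[2], lo[3]),
                             _mm_setr_pi16(hi[0], hi[1], hi[2], hi[3]));
    memcpy(out, &r, sizeof r);
    _mm_empty();
}

/* Hypothetical helper (PACKSSDW): lo[0..1] -> words 0..1, hi[0..1] -> words 2..3,
   each signed doubleword clamped to -32768..32767. */
void pack_s32_s16(const int32_t lo[2], const int32_t hi[2], int16_t out[4])
{
    __m64 r = _mm_packs_pi32(_mm_setr_pi32(lo[0], lo[1]),
                             _mm_setr_pi32(hi[0], hi[1]));
    memcpy(out, &r, sizeof r);
    _mm_empty();
}
```

Consider lo = {-200, -5, 100, 300} and hi = {127, 128, -128, -129}:

- PACKSSWB yields {-128, -5, 100, 127, 127, 127, -128, -128}.
- PACKUSWB yields {0, 0, 100, 255, 127, 128, 0, 0}.

PACKUSWB still reads its inputs as signed words, so negative values clamp to 0.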
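The PADD entries in Table C-1 differ in how an out-of-range sum is handled:

- The plain forms (PADDB, PADDW, PADDD) wrap around modulo 2^n.
- The S forms clamp to the signed range.
- The US forms clamp to the unsigned range.

This is a minimal sketch. It assumes GCC or Clang on x86 with <mmintrin.h> and a little-endian layout. load64 and store64 are hypothetical helpers that copy 8 bytes with memcpy. add_u8x8, add_16x4 and add_and_32x2 are hypothetical names. PAND is shown as a mask applied to the PADDD result.

```c
#include <mmintrin.h>  /* MMX: __m64, _mm_add_pi8, _mm_adds_pu16, _mm_and_si64, ... */
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: move 8 bytes between memory and an __m64. */
static __m64 load64(const void *p) { __m64 m; memcpy(&m, p, sizeof m); return m; }
static void store64(void *p, __m64 m) { memcpy(p, &m, sizeof m); }

/* Byte lanes: wrap-around (PADDB), signed saturation (PADDSB),
   unsigned saturation (PADDUSB). The same bits are read as int8 or uint8. */
void add_u8x8(const uint8_t a[8], const uint8_t b[8],
              uint8_t wrap[8], int8_t ssat[8], uint8_t usat[8])
{
    __m64 ma = load64(a), mb = load64(b);
    store64(wrap, _mm_add_pi8(ma, mb));
    store64(ssat, _mm_adds_pi8(ma, mb));
    store64(usat, _mm_adds_pu8(ma, mb));
    _mm_empty();  /* EMMS before returning to code that may use x87 */
}

/* Word lanes: PADDW, PADDSW, PADDUSW. */
void add_16x4(const int16_t a[4], const int16_t b[4],
              int16_t wrap[4], int16_t ssat[4], uint16_t usat[4])
{
    __m64 ma = load64(a), mb = load64(b);
    store64(wrap, _mm_add_pi16(ma, mb));
    store64(ssat, _mm_adds_pi16(ma, mb));
    store64(usat, _mm_adds_pu16(ma, mb));
    _mm_empty();
}

/* Doubleword lanes (PADDD), then keep only the bits selected by mask (PAND). */
void add_and_32x2(const int32_t a[2], const int32_t b[2],
                  const uint32_t mask[2], uint32_t out[2])
{
    __m64 sum = _mm_add_pi32(load64(a), load64(b));
    store64(out, _mm_and_si64(sum, load64(mask)));
    _mm_empty();
}
```

For example, the byte sum 200 + 100 gives a different result under each instruction:

- PADDB gives 44 (300 modulo 256).
- PADDSB also gives 44, because 200 is -56 as a signed byte.
- PADDUSB gives 255.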