PROGRAMMING WITH THE STREAMING SIMD EXTENSIONS

The MOVLPS (Move unaligned, low packed, single-precision, floating-point) instruction transfers 64 bits of packed data from memory to the lower two fields of a SIMD floating-point register or vice versa; the upper two fields of a destination register are left unchanged.

The MOVMSKPS (Move mask packed, single-precision, floating-point) instruction transfers the most significant bit (the sign bit) of each of the four packed, single-precision, floating-point numbers to an IA integer register. The resulting 4-bit value can then be used as a condition for branching.
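
The sign-bit gather described above can be modeled in portable C. This is only an illustrative sketch: the `xmm_t` type and the function names are hypothetical, not part of any Intel interface, and a register is represented as four floats with field 0 as the least significant.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit SIMD register: four single-precision
   fields, with f[0] as the least significant field. */
typedef struct { float f[4]; } xmm_t;

/* MOVMSKPS model: bit i of the result is the sign bit (bit 31) of field i. */
int movmskps_model(xmm_t src)
{
    int mask = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t bits;
        memcpy(&bits, &src.f[i], sizeof bits);  /* view the float as raw bits */
        mask |= (int)(bits >> 31) << i;
    }
    return mask;
}

/* Example of branching on the mask: nonzero if any field has its sign bit set. */
int any_sign_bit_set(xmm_t v)
{
    return movmskps_model(v) != 0;
}
```

For the register contents {1.0, -2.0, 3.0, -4.0} (field 0 first), the model returns binary 1010, because fields 1 and 3 are negative.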

The MOVSS (Move scalar single-precision, floating-point) instruction transfers the least significant 32 bits between memory and a SIMD floating-point register, or between two registers.
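
The partial-register moves in this group can be sketched in portable C. The `xmm_t` type and the function names below are hypothetical. The handling of the untouched fields (preserved by MOVLPS and by the register form of MOVSS, cleared by the memory-load form of MOVSS) follows the instruction definitions rather than the summary text in this section.

```c
/* Hypothetical model of one 128-bit SIMD register: four single-precision
   fields, with f[0] as the least significant field. */
typedef struct { float f[4]; } xmm_t;

/* MOVLPS, memory-to-register form: load 64 bits into the lower two fields.
   The upper two fields of the destination keep their old values. */
xmm_t movlps_load_model(xmm_t dst, const float mem[2])
{
    dst.f[0] = mem[0];
    dst.f[1] = mem[1];
    return dst;
}

/* MOVSS, memory-to-register form: load one scalar into field 0 and
   clear the upper three fields. */
xmm_t movss_load_model(const float *mem)
{
    xmm_t dst = { { *mem, 0.0f, 0.0f, 0.0f } };
    return dst;
}

/* MOVSS, register-to-register form: copy field 0 only; the upper three
   fields of the destination are unchanged. */
xmm_t movss_reg_model(xmm_t dst, xmm_t src)
{
    dst.f[0] = src.f[0];
    return dst;
}

/* MOVSS, register-to-memory form: store field 0. */
void movss_store_model(float *mem, xmm_t src)
{
    *mem = src.f[0];
}
```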

9.3.2. Arithmetic Instructions

9.3.2.1. PACKED/SCALAR ADDITION AND SUBTRACTION

The ADDPS (Add packed, single-precision, floating-point) and SUBPS (Subtract packed, single-precision, floating-point) instructions add or subtract four pairs of packed, single-precision, floating-point operands.

The ADDSS (Add scalar single-precision, floating-point) and SUBSS (Subtract scalar single-precision, floating-point) instructions add or subtract the least significant pair of packed, single-precision, floating-point operands; the upper three fields are passed through unchanged from the first source operand, which is also the destination.
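
The difference between the packed and scalar forms can be seen in a small portable C model. The `xmm_t` type and the `*_model` function names are hypothetical; in each function `dst` stands for the first source operand, which also receives the result.

```c
/* Hypothetical model of one 128-bit SIMD register: four single-precision
   fields, with f[0] as the least significant field. */
typedef struct { float f[4]; } xmm_t;

/* ADDPS model: add all four pairs of fields. */
xmm_t addps_model(xmm_t dst, xmm_t src)
{
    for (int i = 0; i < 4; i++)
        dst.f[i] += src.f[i];
    return dst;
}

/* SUBPS model: subtract all four pairs of fields (dst - src). */
xmm_t subps_model(xmm_t dst, xmm_t src)
{
    for (int i = 0; i < 4; i++)
        dst.f[i] -= src.f[i];
    return dst;
}

/* ADDSS model: add only field 0; fields 1 to 3 of dst pass through. */
xmm_t addss_model(xmm_t dst, xmm_t src)
{
    dst.f[0] += src.f[0];
    return dst;
}

/* SUBSS model: subtract only field 0; fields 1 to 3 of dst pass through. */
xmm_t subss_model(xmm_t dst, xmm_t src)
{
    dst.f[0] -= src.f[0];
    return dst;
}
```

With dst = {1, 2, 3, 4} and src = {10, 20, 30, 40}, the packed add yields {11, 22, 33, 44}, while the scalar add yields {11, 2, 3, 4}.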

9.3.2.2. PACKED/SCALAR MULTIPLICATION AND DIVISION

The MULPS (Multiply packed, single-precision, floating-point) instruction multiplies four pairs of packed, single-precision, floating-point operands.

The MULSS (Multiply scalar single-precision, floating-point) instruction multiplies the least significant pair of packed, single-precision, floating-point operands; the upper three fields are passed through unchanged from the first source operand, which is also the destination.

The DIVPS (Divide packed, single-precision, floating-point) instruction divides four pairs of packed, single-precision, floating-point operands.

The DIVSS (Divide scalar single-precision, floating-point) instruction divides the least significant pair of packed, single-precision, floating-point operands; the upper three fields are passed through unchanged from the first source operand, which is also the destination.
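
A portable C model of the multiply and divide forms is sketched below. The `xmm_t` type and the `*_model` names are hypothetical; `dst` is the first source operand (the dividend for division) and also receives the result.

```c
/* Hypothetical model of one 128-bit SIMD register: four single-precision
   fields, with f[0] as the least significant field. */
typedef struct { float f[4]; } xmm_t;

/* MULPS model: multiply all four pairs of fields. */
xmm_t mulps_model(xmm_t dst, xmm_t src)
{
    for (int i = 0; i < 4; i++)
        dst.f[i] *= src.f[i];
    return dst;
}

/* DIVPS model: divide all four pairs of fields (dst / src). */
xmm_t divps_model(xmm_t dst, xmm_t src)
{
    for (int i = 0; i < 4; i++)
        dst.f[i] /= src.f[i];
    return dst;
}

/* MULSS model: multiply only field 0; fields 1 to 3 of dst pass through. */
xmm_t mulss_model(xmm_t dst, xmm_t src)
{
    dst.f[0] *= src.f[0];
    return dst;
}

/* DIVSS model: divide only field 0; fields 1 to 3 of src are never read. */
xmm_t divss_model(xmm_t dst, xmm_t src)
{
    dst.f[0] /= src.f[0];
    return dst;
}
```

Because the scalar divide reads only field 0 of the divisor, the contents of the divisor's upper three fields, zeros included, take no part in the operation. For example, dividing {2, 4, 6, 8} by {4, 0, 0, 0} with the scalar model gives {0.5, 4, 6, 8}.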

9.3.2.3. PACKED/SCALAR SQUARE ROOT

The SQRTPS (Square root packed, single-precision, floating-point) instruction computes the square roots of the four packed, single-precision, floating-point numbers in the source operand and writes the results to the destination register.
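
The field-by-field behavior can be sketched in portable C. The `xmm_t` type and both function names are hypothetical. The scalar root here is a plain Newton iteration used only as a stand-in so that the sketch needs no math library; it is not how the hardware computes the result, it is not guaranteed to match the instruction's rounding, and it assumes non-negative, finite inputs in the normal range.

```c
/* Hypothetical model of one 128-bit SIMD register: four single-precision
   fields, with f[0] as the least significant field. */
typedef struct { float f[4]; } xmm_t;

/* Stand-in scalar square root (assumption: a is non-negative and finite).
   Newton iteration x <- (x + a/x) / 2, started at or above the root. */
float sqrt_newton_model(float a)
{
    if (a == 0.0f)
        return 0.0f;
    float x = a > 1.0f ? a : 1.0f;
    for (int i = 0; i < 100; i++)
        x = 0.5f * (x + a / x);
    return x;
}

/* SQRTPS model: each destination field receives the square root of the
   corresponding source field; no field of the old destination survives. */
xmm_t sqrtps_model(xmm_t src)
{
    xmm_t dst;
    for (int i = 0; i < 4; i++)
        dst.f[i] = sqrt_newton_model(src.f[i]);
    return dst;
}
```

Applied to {4, 9, 16, 0.25}, the model produces values close to {2, 3, 4, 0.5}.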
