PROGRAMMING WITH THE STREAMING SIMD EXTENSIONS
For full details on determining whether a processor supports the Streaming SIMD Extensions,
please refer to the Intel Processor Identification and the CPUID Instruction Application Note
(AP-485), order number 241618-008.
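As a minimal sketch of the processor-side check, assuming a GCC- or Clang-compatible compiler: CPUID leaf 1 reports SSE support in bit 25 of EDX and MMX support in bit 23. The helper names below (edx_reports_sse, edx_reports_mmx, cpu_has_sse) are hypothetical, and __get_cpuid is the wrapper from the GCC/Clang <cpuid.h> header, not something defined in this manual. A set CPUID bit shows only that the processor implements the extensions; it does not show that the operating system has enabled them, which this sketch does not check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature flags reported in EDX by CPUID leaf 1 (EAX = 1). */
#define CPUID1_EDX_MMX (UINT32_C(1) << 23)
#define CPUID1_EDX_SSE (UINT32_C(1) << 25)

/* Hypothetical helper: true if an EDX value from CPUID leaf 1 reports SSE. */
static bool edx_reports_sse(uint32_t edx)
{
    return (edx & CPUID1_EDX_SSE) != 0;
}

/* Hypothetical helper: true if an EDX value from CPUID leaf 1 reports MMX. */
static bool edx_reports_mmx(uint32_t edx)
{
    return (edx & CPUID1_EDX_MMX) != 0;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* Queries the running processor (assumes GCC/Clang <cpuid.h>).
   Returns false if CPUID leaf 1 is not available. */
static bool cpu_has_sse(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_reports_sse(edx);
}
#endif
```

Separating the bit decoding from the CPUID query keeps the decoding testable on any host; a program would typically call cpu_has_sse() once at startup and select an SSE or non-SSE code path.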
9.5.2. Interfacing with Streaming SIMD Extensions Procedures and Functions
The Streaming SIMD Extensions allow direct access to all SIMD floating-point registers. All
existing interface conventions that apply to the general-purpose registers (for example, EAX and
EBX) also apply to the SIMD floating-point registers.
An efficient interface to Streaming SIMD Extensions routines might pass parameters and return
values through the SIMD floating-point registers, or through a combination of memory locations
(via the stack) and SIMD floating-point registers. The three common IA-32 calling conventions
(cdecl, stdcall, and fastcall) have been extended to support the new register set for the
Streaming SIMD Extensions in the following ways:
• The first three __m128 parameters are passed in registers xmm0, xmm1, and xmm2 (register
  arguments). Additional __m128 parameters are passed on the stack as usual.
• __m128 return values are passed in xmm0.
• Registers xmm0 through xmm7 are caller-save.
The caller must reserve space in the argument block where the first three __m128 parameters
would normally appear. The caller generally leaves these locations empty, but the callee can use
them as homes for the xmm0, xmm1, and xmm2 registers if necessary.
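The following C sketch shows a function whose signature exercises the convention above. It assumes the <xmmintrin.h> intrinsics header; the function name madd_add_ps is hypothetical. Under the extended convention, a, b, and c would arrive in xmm0, xmm1, and xmm2, d would be passed on the stack (with slots still reserved for the first three), and the result would come back in xmm0. The C source itself does not change with the calling convention; the compiler and target ABI decide the placement, and other ABIs differ (for example, the x86-64 System V ABI passes the first eight __m128 arguments in xmm0 through xmm7).

```c
#include <xmmintrin.h>

/* Computes (a * b) + c + d on four packed single-precision values.
   Under the extended IA-32 convention described in the text:
     a, b, c -> xmm0, xmm1, xmm2 (stack slots reserved but normally unused)
     d       -> passed on the stack
     result  -> returned in xmm0
   The actual placement is chosen by the compiler for the target ABI. */
__m128 madd_add_ps(__m128 a, __m128 b, __m128 c, __m128 d)
{
    __m128 product = _mm_mul_ps(a, b);
    return _mm_add_ps(_mm_add_ps(product, c), d);
}
```

Because xmm0 through xmm7 are caller-save, a caller that needs a value held in an XMM register to survive a call such as this one must save it before the call.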
New versions of the stdarg.h and varargs.h headers are provided with the Intel C/C++ compiler.
These new implementations support variable argument lists containing __m128 data (that is, lists
where padding may have been inserted as required for aligned parameters, as described above).
The new convention requires that functions with variable argument lists be prototyped before
they are called, and that, for this case only, the caller also fill the stack locations reserved
for the data in registers xmm0, xmm1, and xmm2. Callers of non-prototyped functions with
variable argument lists containing __m128 data must pass those parameters both on the stack and
in registers.
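A variable argument list containing __m128 data can be read with va_arg, as in the following sketch. It assumes a compiler whose <stdarg.h> accepts __m128 as a va_arg type (modern GCC and Clang on x86-64 do); it does not use the Intel headers the text describes, and sum_ps is a hypothetical name. As the text requires, the function is prototyped before any call, so the compiler knows it is variadic.

```c
#include <stdarg.h>
#include <xmmintrin.h>

/* Prototype visible before any call, as the variadic convention requires. */
__m128 sum_ps(int count, ...);

/* Sums `count` __m128 arguments passed through the variable argument list. */
__m128 sum_ps(int count, ...)
{
    va_list ap;
    __m128 acc = _mm_setzero_ps();

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        acc = _mm_add_ps(acc, va_arg(ap, __m128));
    va_end(ap);

    return acc;
}
```

Calling sum_ps with four vectors passes more than three __m128 values, which is the case where the text's register and stack rules both come into play.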
9.5.3. Writing Code with MMX, Floating-Point, and Streaming SIMD Extensions
The SIMD floating-point registers are separate from the x87-FP/MMX registers. An application
can therefore use Streaming SIMD Extensions instructions together with MMX instructions, or
together with x87-FP instructions, without any penalty. An application can use x87-FP
instructions for operations that need double or extended precision arithmetic, or to access any
of the x87-FP trigonometric instructions.
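To illustrate mixing the two units, the sketch below adds packed single-precision values with SSE intrinsics and then reduces the lanes in long double. It assumes an x86 GCC or Clang target where long double is the 80-bit x87 extended format, so the reduction is likely to use x87 instructions; the compiler, not the source, makes that choice. The name sum_lanes_extended is hypothetical.

```c
#include <xmmintrin.h>

/* Adds two vectors of four floats with SSE, then sums the four lanes in
   long double. On x86 GCC/Clang, long double is typically the x87 80-bit
   extended format, so the reduction runs on the x87 unit while the packed
   addition runs on the SSE unit. */
long double sum_lanes_extended(__m128 a, __m128 b)
{
    float lanes[4];
    _mm_storeu_ps(lanes, _mm_add_ps(a, b));   /* SSE: packed single precision */

    long double total = 0.0L;                  /* x87: extended precision */
    for (int i = 0; i < 4; ++i)
        total += lanes[i];
    return total;
}
```

No EMMS is needed here: the code mixes SSE with x87 only, and the shared-register restriction applies to MMX and x87 code.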
The restrictions on the simultaneous use of x87-FP and MMX instructions continue to exist,
because they share the same architectural registers. The user still needs to perform an EMMS
instruction when switching from MMX code to x87-FP code. However, the EMMS instruc-