Floating-Point Control and Status Register) instruction stores the Streaming SIMD Extensions
control and status word to memory.
The FXSAVE instruction saves the x87 FP and MMX state, together with the SIMD floating-point
state, to memory. Unlike FSAVE, FXSAVE does not reinitialize (clear) the x87 FP state after
saving it. The FXRSTOR instruction reloads the x87 FP, MMX, and SIMD floating-point state from
memory.
9.3.9. Cacheability Control Instructions
Data referenced by a program can exhibit temporal locality (the data will be used again soon) or
spatial locality (the data lies in adjacent locations, e.g., in the same cache line). Some
multimedia data, such as the display list in a 3D graphics application, is referenced once and not
reused in the immediate future. We refer to such data as non-temporal data. Because this data will
not be reused, the programmer does not want it to overwrite the application's cached code and data.
The cacheability control instructions let the programmer control caching so that non-temporal
accesses cause as little cache pollution as possible.
In addition, the execution engine must be supplied with data quickly enough that it does not stall
waiting for memory. The Streaming SIMD Extensions let the programmer prefetch data (PREFETCHh
instructions) long before it is used. The prefetch instructions are not architectural: they do not
update any architectural state, and their behavior is specific to each implementation. A programmer
may therefore have to tune an application for each implementation to benefit from them. Because
these instructions merely provide a hint to the hardware, they never generate exceptions or faults.
Excessive use of prefetch instructions can degrade performance, because each prefetch occupies
processor and memory-bus resources that demand accesses would otherwise use.
The following three instructions provide programmatic control for minimizing cache pollution
when writing data to memory from either the MMX registers or the SIMD floating-point registers.
The MASKMOVQ (Non-temporal byte mask store of packed integer in an MMX register)
instruction stores data from an MMX register to the location specified by DS:EDI. The most
significant bit in each byte of the second (mask) MMX register selects, byte by byte, whether the
corresponding byte of the first register is written. The instruction is implicitly weakly ordered,
with all of the characteristics of the WC memory type: successive non-temporal stores may not
reach memory in program order; they do not write-allocate (i.e., the processor does not fetch the
corresponding cache line into the cache hierarchy before performing the store); they may be
write-combined or collapsed; and they minimize cache pollution.
The MOVNTQ (Non-temporal store of packed integer in an MMX register) instruction
stores data from an MMX register to memory. The instruction is implicitly weakly ordered,
does not write-allocate, and minimizes cache pollution.
The MOVNTPS (Non-temporal store of packed single-precision floating-point values)
instruction stores data from a SIMD floating-point register to memory. The memory
address must be aligned on a 16-byte boundary; if it is not, a general-protection
exception (#GP) occurs. The instruction is implicitly weakly ordered, does not write-allocate,
and minimizes cache pollution.