INTRODUCTION TO THE INTEL ARCHITECTURE
KByte instruction cache and an 8-KByte data cache, both closely coupled to the pipeline. The
L2 cache is a 256-KByte, 512-KByte, or 1-MByte static RAM that is coupled to the core
processor through a full clock-speed 64-bit cache bus.
The centerpiece of the P6 Family processor microarchitecture is an innovative out-of-order
execution mechanism called dynamic execution. Dynamic execution incorporates three data-
processing concepts:
• Deep branch prediction.
• Dynamic data flow analysis.
• Speculative execution.
Branch prediction is a concept found in most mainframe and high-speed microprocessor
architectures. It allows the processor to decode instructions beyond branches to keep the
instruction pipeline full. In the P6 Family processors, the instruction fetch/decode unit uses a
highly optimized branch prediction algorithm to predict the direction of the instruction stream
through multiple levels of branches, procedure calls, and returns.
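The idea of predicting a branch from its recent behavior can be illustrated with a generic
two-bit saturating counter, a common textbook scheme. This is only a sketch: the text does not
specify the P6 Family algorithm, and the class name, table size, and indexing by address are
assumptions made for illustration.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating-counter predictor (illustrative, not the P6 algorithm)."""

    def __init__(self, table_size=16):
        # Counter states: 0 and 1 predict "not taken"; 2 and 3 predict "taken".
        self.table_size = table_size
        self.counters = [1] * table_size  # assumed start state: weakly not taken

    def _index(self, branch_address):
        # Hypothetical indexing: low bits of the branch address select a counter.
        return branch_address % self.table_size

    def predict(self, branch_address):
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def run_loop_branch(predictor, address, iterations):
    """Simulate a loop-closing branch: taken on every pass except the last.

    Returns the number of correct predictions.
    """
    correct = 0
    for n in range(iterations):
        taken = n < iterations - 1
        if predictor.predict(address) == taken:
            correct += 1
        predictor.update(address, taken)
    return correct
```

With a fresh predictor, a 10-iteration loop is mispredicted on its first pass and on its exit.
Running the same loop again mispredicts only the exit, because the counter has stayed in a
"taken" state.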
Dynamic data flow analysis involves real-time analysis of the flow of data through the processor
to determine data and register dependencies and to detect opportunities for out-of-order
instruction execution. The P6 Family processors' dispatch/execute unit can simultaneously monitor
many instructions and execute these instructions in the order that optimizes the use of the
processor's multiple execution units, while maintaining data integrity. This out-of-order
execution keeps the execution units busy even when cache misses and data dependencies among
instructions occur.
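The scheduling idea described above can be sketched as a small simulation in which an
instruction issues as soon as the results it reads are available, regardless of program order.
The instruction format, latencies, register names, and the two-unit limit are all assumptions for
illustration. The sketch tracks only true (read-after-write) dependencies, assuming that
temporary registers remove other register conflicts. It is not a model of the actual
dispatch/execute unit.

```python
from collections import namedtuple

# Hypothetical instruction record: name, destination register, source registers, latency.
Instr = namedtuple("Instr", "name dest srcs latency")


def schedule(program, num_units=2):
    """Return {instruction name: cycle issued} for a dependency-driven schedule."""
    # For each instruction, find the earlier instructions that produce its sources.
    last_writer = {}
    deps = {}
    for ins in program:
        deps[ins.name] = [last_writer[r] for r in ins.srcs if r in last_writer]
        last_writer[ins.dest] = ins.name

    done_at = {}  # name -> cycle at which the result becomes available
    issued = {}
    pending = list(program)
    cycle = 0
    while pending:
        free = num_units  # units are assumed free again every cycle
        for ins in list(pending):  # oldest first
            if free == 0:
                break
            ready = all(d in done_at and done_at[d] <= cycle for d in deps[ins.name])
            if ready:
                issued[ins.name] = cycle
                done_at[ins.name] = cycle + ins.latency
                pending.remove(ins)
                free -= 1
        cycle += 1
    return issued


# A slow load (standing in for a cache miss) followed by independent work.
program = [
    Instr("load", "r1", [], 4),
    Instr("add", "r2", ["r1"], 1),   # must wait for the load
    Instr("mul", "r3", ["r4"], 1),   # independent of the load
    Instr("sub", "r5", ["r3"], 1),   # depends only on mul
]
issue_cycles = schedule(program)
# issue_cycles == {"load": 0, "mul": 0, "sub": 1, "add": 4}
```

Here `mul` and `sub` issue while `add` is still waiting on the load, which is the sense in which
out-of-order execution keeps the execution units busy during a cache miss.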
Speculative execution refers to the processor's ability to execute instructions ahead of the
program counter but ultimately to commit the results in the order of the original instruction
stream. To make speculative execution possible, the P6 Family processors' microarchitecture
decouples the dispatching and executing of instructions from the commitment of results. The
processor's dispatch/execute unit uses data-flow analysis to execute all available instructions in
the instruction pool and store the results in temporary registers. The retirement unit then linearly
searches the instruction pool for completed instructions that no longer have data dependencies
with other instructions or unresolved branch predictions. When completed instructions are
found, the retirement unit commits the results of these instructions to memory and/or the IA
registers (the processor's eight general-purpose registers and eight floating-point unit data
registers) in the order they were originally issued and retires the instructions from the
instruction pool.
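The separation between out-of-order completion and in-order commitment can be sketched with a
simple queue that stands in for the instruction pool. The class and method names are
hypothetical, and the sketch simplifies the retirement search to a check at the head of the
queue. Branch misprediction recovery and memory writes are not modeled.

```python
from collections import deque


class InstructionPool:
    """Minimal in-order retirement sketch (illustrative, not the P6 retirement unit)."""

    def __init__(self):
        self.pool = deque()   # entries kept in original program order
        self.registers = {}   # committed (architecturally visible) register state

    def issue(self, name, dest):
        entry = {"name": name, "dest": dest, "value": None, "done": False}
        self.pool.append(entry)
        return entry

    def complete(self, entry, value):
        # Execution may finish in any order; the result is held in the entry,
        # which plays the role of a temporary register.
        entry["value"] = value
        entry["done"] = True

    def retire(self):
        # Commit only from the oldest entry onward, so results become visible
        # in original program order. Stop at the first unfinished instruction.
        retired = []
        while self.pool and self.pool[0]["done"]:
            entry = self.pool.popleft()
            self.registers[entry["dest"]] = entry["value"]
            retired.append(entry["name"])
        return retired
```

If the second and third instructions complete before the first, `retire()` returns nothing and
leaves the committed registers unchanged. Once the first instruction completes, all three retire
together in program order.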
Through deep branch prediction, dynamic data-flow analysis, and speculative execution,
dynamic execution removes the constraint of linear instruction sequencing between the
traditional fetch and execute phases of instruction execution. It allows instructions to be decoded
deep into multi-level branches to keep the instruction pipeline full. It promotes out-of-order
instruction execution to keep the processor's six instruction execution units running at full
capacity. And finally, it commits the results of executed instructions in original program order
to maintain data integrity and program coherency.
The following section describes the P6 Family processor microarchitecture in greater detail. The
Pentium® Pro processor architecture is the base architecture for the processors that followed it.