INTRODUCTION TO THE INTEL ARCHITECTURE

KByte instruction cache and an 8-KByte data cache, both closely coupled to the pipeline. The

L2 cache is a 256-KByte, 512-KByte, or 1-MByte static RAM that is coupled to the core

processor through a full clock-speed 64-bit cache bus.

The centerpiece of the P6 Family processor microarchitecture is an innovative out-of-order

execution mechanism called dynamic execution. Dynamic execution incorporates three data-

processing concepts:

Deep branch prediction.

Dynamic data flow analysis.

Speculative execution.

Branch prediction is a concept found in most mainframe and high-speed microprocessor

architectures. It allows the processor to decode instructions beyond branches to keep the

instruction pipeline full. In the P6 Family processors, the instruction fetch/decode unit uses a

highly optimized branch prediction algorithm to predict the direction of the instruction stream

through multiple levels of branches, procedure calls, and returns.
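
The text does not describe the P6 Family prediction algorithm itself, so the following is only a minimal sketch of a generic textbook scheme, the two-bit saturating counter, to illustrate how a predictor can learn the direction of a branch. The class name, table size, and address-based indexing are assumptions made for illustration and are not taken from the P6 design.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating-counter branch predictor (illustrative only)."""

    def __init__(self, table_size=16):
        # Counter states 0-1 predict "not taken"; states 2-3 predict "taken".
        self.table_size = table_size
        self.counters = [1] * table_size  # start at "weakly not taken"

    def _index(self, branch_address):
        # Hypothetical indexing: low bits of the branch address.
        return branch_address % self.table_size

    def predict(self, branch_address):
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(predictor, branch_address, outcomes):
    """Replay a list of actual outcomes and count correct predictions."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_address) == taken:
            correct += 1
        predictor.update(branch_address, taken)
    return correct


# A loop branch that is taken nine times, then falls through on loop exit.
loop_outcomes = [True] * 9 + [False]
hits = count_correct(TwoBitPredictor(), 0x40, loop_outcomes)
```

In this sketch the predictor misses only the first iteration and the loop exit, so the decoder could keep fetching along the predicted path for the rest of the loop.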

Dynamic data flow analysis involves real-time analysis of the flow of data through the processor

to determine data and register dependencies and to detect opportunities for out-of-order

instruction execution. The P6 Family processors' dispatch/execute unit can simultaneously monitor

many instructions and execute these instructions in the order that optimizes the use of the

processor's multiple execution units, while maintaining data integrity. This out-of-order

execution keeps the execution units busy even when cache misses and data dependencies among

instructions occur.
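
The idea of dependency-driven, out-of-order issue can be sketched as a toy scheduler. This is a simplified model under stated assumptions, not the P6 dispatch/execute unit: the `Instr` record, the `schedule` function, the two-unit limit, and the latencies are all hypothetical, only true (read-after-write) dependencies are tracked, and every latency is assumed to be at least one cycle.

```python
from collections import namedtuple

# Hypothetical instruction record: name, destination register,
# source registers, and latency in cycles.
Instr = namedtuple("Instr", "name dest srcs latency")


def schedule(program, units=2):
    """Return {name: issue cycle}, issuing any instruction whose inputs are ready."""
    # Link each instruction to the earlier instructions that produce its sources.
    producers = {}
    last_writer = {}
    for ins in program:
        producers[ins.name] = [last_writer[r] for r in ins.srcs if r in last_writer]
        last_writer[ins.dest] = ins.name

    issue_cycle = {}
    finish_cycle = {}
    cycle = 0
    while len(issue_cycle) < len(program):
        issued = 0
        for ins in program:  # scan oldest first
            if issued == units:
                break  # all execution units are busy this cycle
            if ins.name in issue_cycle:
                continue
            ready = all(
                p in finish_cycle and finish_cycle[p] <= cycle
                for p in producers[ins.name]
            )
            if ready:
                issue_cycle[ins.name] = cycle
                finish_cycle[ins.name] = cycle + ins.latency
                issued += 1
        cycle += 1
    return issue_cycle


program = [
    Instr("load", "r1", (), 4),           # long latency, e.g. a cache miss
    Instr("add", "r2", ("r1",), 1),       # must wait for the load
    Instr("mul", "r3", ("r4", "r5"), 1),  # independent of the load
    Instr("sub", "r6", ("r3", "r4"), 1),  # depends only on mul
]
order = schedule(program)
```

Here `mul` and `sub` issue while `add` is still waiting on the slow `load`, which is the sense in which out-of-order execution keeps the execution units busy despite a cache miss.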

Speculative execution refers to the processor's ability to execute instructions ahead of the

program counter but ultimately to commit the results in the order of the original instruction

stream. To make speculative execution possible, the P6 Family processors' microarchitecture

decouples the dispatching and executing of instructions from the commitment of results. The

processor's dispatch/execute unit uses data-flow analysis to execute all available instructions in

the instruction pool and store the results in temporary registers. The retirement unit then linearly

searches the instruction pool for completed instructions that no longer have data dependencies

with other instructions or unresolved branch predictions. When completed instructions are

found, the retirement unit commits the results of these instructions to memory and/or the IA

registers (the processor's eight general-purpose registers and eight floating-point unit data

registers) in the order they were originally issued and retires the instructions from the

instruction pool.
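
The separation between out-of-order completion and in-order commitment can be sketched as follows. This is a deliberately simplified model: the `PoolEntry` class and the `complete` and `retire` functions are hypothetical names, retirement is reduced to draining completed entries from the oldest end of the pool, and recovery from a mispredicted branch is not modeled.

```python
from collections import deque


class PoolEntry:
    """One instruction in a hypothetical instruction pool."""

    def __init__(self, name, dest):
        self.name = name
        self.dest = dest        # architectural register to write at retirement
        self.result = None      # held in temporary storage until then
        self.completed = False


def complete(entry, value):
    # Execution may finish in any order; the result stays in the pool entry.
    entry.result = value
    entry.completed = True


def retire(pool, registers):
    """Commit completed entries from the oldest end only, preserving program order."""
    retired = []
    while pool and pool[0].completed:
        entry = pool.popleft()
        registers[entry.dest] = entry.result
        retired.append(entry.name)
    return retired


registers = {"EAX": 0, "EBX": 0, "ECX": 0}
i1, i2, i3 = PoolEntry("i1", "EAX"), PoolEntry("i2", "EBX"), PoolEntry("i3", "ECX")
pool = deque([i1, i2, i3])  # program order: i1, i2, i3

complete(i3, 30)                       # the youngest instruction finishes first
first_pass = retire(pool, registers)   # nothing retires: i1 is still pending

complete(i1, 10)
second_pass = retire(pool, registers)  # i1 retires; i2 still blocks i3

complete(i2, 20)
third_pass = retire(pool, registers)   # i2 and i3 retire, in that order
```

Although `i3` finishes first, its result does not reach `ECX` until `i1` and `i2` have been committed, so the architectural registers only ever change in original program order.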

Through deep branch prediction, dynamic data-flow analysis, and speculative execution,

dynamic execution removes the constraint of linear instruction sequencing between the

traditional fetch and execute phases of instruction execution. It allows instructions to be decoded

deep into multi-level branches to keep the instruction pipeline full. It promotes out-of-order

instruction execution to keep the processor's six instruction execution units running at full

capacity. And finally, it commits the results of executed instructions in original program order

to maintain data integrity and program coherency.

The following section describes the P6 Family processor microarchitecture in greater detail. The

Pentium Pro processor architecture is the base architecture for the processors that followed it.
