INTRODUCTION TO THE INTEL ARCHITECTURE

2.5.2. Fetch/Decode Unit

The fetch/decode unit reads a stream of IA instructions from the L1 instruction cache and
decodes them into a series of micro-operations called micro-ops. This micro-op stream (still
in the order of the original instruction stream) is then sent to the instruction pool.

The instruction fetch unit fetches one 32-byte cache line per clock from the instruction cache. It
marks the beginning and end of the IA instructions in the cache lines and transmits 16 aligned
bytes to the decoder.

The instruction fetch unit computes the instruction pointer, based on inputs from the branch
target buffer, the exception/interrupt status, and branch-misprediction indications from the
integer execution units. The most important part of this process is the branch prediction
performed by the branch target buffer. Using an extension of Yeh's algorithm, the 512-entry
branch target buffer looks many instructions ahead of the retirement program counter. Within
this instruction window there may be numerous branches, procedure calls, and returns that must
be correctly predicted if the dispatch/execute unit is to do useful work.
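
To make the idea of history-based branch prediction concrete, the following Python sketch models a generic two-level adaptive predictor of the kind associated with Yeh's algorithm. It is an illustration only, not the processor's actual branch target buffer: the text states just the 512-entry size and that an extension of the algorithm is used. The 4-bit history length, the 2-bit saturating counters, the modulo indexing, and the names TwoLevelPredictor and run_pattern are all assumptions made for this example.

```python
class TwoLevelPredictor:
    """Generic two-level adaptive predictor (illustrative assumptions only).

    Level 1: a short outcome history per table entry.
    Level 2: a 2-bit saturating counter selected by that history.
    """

    def __init__(self, entries=512, history_bits=4):
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        # Level 1: one history shift register per entry, initially all "not taken".
        self.history = [0] * entries
        # Level 2: per-entry pattern tables of 2-bit counters,
        # initialised to 1 ("weakly not taken").
        self.counters = [[1] * (1 << history_bits) for _ in range(entries)]

    def _index(self, branch_address):
        # Assumed direct-mapped indexing; a real buffer would also use tags.
        return branch_address % self.entries

    def predict(self, branch_address):
        """Return True if the branch is predicted taken."""
        i = self._index(branch_address)
        return self.counters[i][self.history[i]] >= 2

    def update(self, branch_address, taken):
        """Train the counter for the current history, then shift the history."""
        i = self._index(branch_address)
        h = self.history[i]
        c = self.counters[i][h]
        self.counters[i][h] = min(3, c + 1) if taken else max(0, c - 1)
        self.history[i] = ((h << 1) | int(taken)) & self.history_mask


def run_pattern(predictor, branch_address, outcomes):
    """Feed a sequence of outcomes; return how many were predicted correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_address) == taken:
            correct += 1
        predictor.update(branch_address, taken)
    return correct


if __name__ == "__main__":
    predictor = TwoLevelPredictor()
    loop_branch = [True, True, True, False]  # taken three times, then falls through
    run_pattern(predictor, 0x1000, loop_branch * 2)  # warm-up
    print(run_pattern(predictor, 0x1000, loop_branch * 10), "of 40 correct")
```

In this model a repeating loop pattern becomes fully predictable once each distinct history value has been seen, which is the property that lets a predictor run well ahead of retirement.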

The instruction decoder contains three parallel decoders: two simple-instruction decoders and
one complex-instruction decoder. Each decoder converts an IA instruction into one or more
triadic micro-ops (two logical sources and one logical destination per micro-op). Micro-ops are
primitive instructions that are executed by the processor's six parallel execution units.

Many IA instructions are converted directly into single micro-ops by the simple-instruction
decoders, and some are decoded into one to four micro-ops by the complex-instruction decoder.
The more complex IA instructions are decoded into sequences of preprogrammed micro-ops obtained
from the microcode instruction sequencer. The instruction decoders also handle the decoding of
instruction prefixes and looping operations. The instruction decoder can generate up to six
micro-ops per clock cycle (one each for the simple-instruction decoders and four for the
complex-instruction decoder).
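
The six-micro-op limit can be illustrated with a small Python sketch that groups instructions into decode clocks. It assumes a simple in-order rule that the text does not state: in each clock the complex decoder takes the next instruction (one to four micro-ops), and each simple decoder takes one following instruction only if it decodes to a single micro-op. The function name decode_schedule is hypothetical, and instructions that need the microcode instruction sequencer are deliberately left out of the model.

```python
def decode_schedule(uop_counts):
    """Group instructions into clocks under an assumed 4-1-1 decode rule.

    uop_counts: micro-ops per instruction, in program order (each 1 to 4).
    Returns a list of clocks; each clock lists the micro-op counts of the
    instructions decoded in that clock.
    """
    for n in uop_counts:
        if not 1 <= n <= 4:
            raise ValueError("this sketch only models instructions of 1 to 4 micro-ops")

    clocks = []
    i = 0
    while i < len(uop_counts):
        # The complex decoder takes the next instruction, whatever its size.
        group = [uop_counts[i]]
        i += 1
        # Each of the two simple decoders takes one following instruction,
        # but only if that instruction is a single micro-op.
        while len(group) < 3 and i < len(uop_counts) and uop_counts[i] == 1:
            group.append(uop_counts[i])
            i += 1
        clocks.append(group)
    return clocks


if __name__ == "__main__":
    for clock, group in enumerate(decode_schedule([4, 1, 1, 1, 2, 2, 1])):
        print("clock", clock, "instructions", group, "micro-ops", sum(group))
```

Under this assumed rule, the best case is one four-micro-op instruction followed by two single-micro-op instructions, which yields the six micro-ops per clock given above; two adjacent multi-micro-op instructions cannot be decoded in the same clock.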

The IA's register set can cause resource stalls due to register dependencies. To solve this
problem, the processor provides 40 internal, general-purpose registers, which are used for the
actual computations. These registers can handle both integer and floating-point values. To
allocate the internal registers, the enqueued micro-ops from the instruction decoder are sent to
the register alias table unit, where references to the logical IA registers are converted into
internal physical register references.

In the final step of the decoding process, the allocator in the register alias table unit adds status
bits and flags to the micro-ops to prepare them for out-of-order execution and sends the resulting
micro-ops to the instruction pool.
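
As a rough illustration of how mapping logical registers onto a larger set of internal registers removes false dependencies, the Python sketch below renames triadic micro-ops against a pool of 40 physical registers. It is a simplified model under stated assumptions, not the processor's register alias table: the class name RegisterAliasTable, the first-free allocation policy, and the absence of any register reclamation at retirement are all choices made for this example.

```python
class RegisterAliasTable:
    """Toy logical-to-physical register renamer (illustrative assumptions only)."""

    def __init__(self, logical_names, physical_count=40):
        if len(logical_names) > physical_count:
            raise ValueError("need at least one physical register per logical register")
        self.free = list(range(physical_count))
        # Give every logical register an initial physical home.
        self.mapping = {name: self.free.pop(0) for name in logical_names}

    def rename(self, dest, src1, src2):
        """Rename one triadic micro-op; return (phys_dest, phys_src1, phys_src2)."""
        # Sources read the current mapping before the destination is remapped,
        # so a micro-op such as EAX <- EAX op EBX reads the old EAX.
        p1 = self.mapping[src1]
        p2 = self.mapping[src2]
        if not self.free:
            # This sketch never frees registers; real hardware would stall
            # until retirement releases one.
            raise RuntimeError("no free physical register")
        pd = self.free.pop(0)
        self.mapping[dest] = pd
        return pd, p1, p2


if __name__ == "__main__":
    rat = RegisterAliasTable(["EAX", "EBX", "ECX", "EDX"])
    print(rat.rename("EAX", "EBX", "ECX"))  # first write to EAX
    print(rat.rename("EDX", "EAX", "EAX"))  # reads the new EAX (true dependency)
    print(rat.rename("EAX", "EBX", "EBX"))  # second write to EAX gets a fresh register
```

Because the two writes to EAX receive different physical registers, the second write does not have to wait for the micro-op that reads the first value, which is the kind of stall the internal registers are meant to avoid.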

2.5.3. Instruction Pool (Reorder Buffer)

Prior to entering the instruction pool (known formally as the reorder buffer), the micro-op
instruction stream is in the same order as the IA instruction stream that was sent to the
instruction decoder. No reordering of instructions has taken place.

The reorder buffer is an array of content-addressable memory, arranged into 40 micro-op
registers. It contains micro-ops that are waiting to be executed, as well as those that have already been
