
INTRODUCTION TO THE INTEL ARCHITECTURE

by Intel in 1993), the Pentium Pro processor, with its advanced superscalar microarchitecture, sets an impressive performance standard. In designing the P6 Family processors, one of the primary goals of the Intel chip architects was to exceed the performance of the Pentium processor significantly while still using the same 0.6-micrometer, four-layer, metal BiCMOS manufacturing process. Using the same manufacturing process as the Pentium processor meant that performance gains could only be achieved through substantial advances in the microarchitecture.

The resulting P6 Family processor microarchitecture is a three-way superscalar, pipelined architecture. The term three-way superscalar means that, using parallel processing techniques, the processor is able on average to decode, dispatch, and complete execution of (retire) three instructions per clock cycle. To handle this level of instruction throughput, the P6 Family processors use a decoupled, 12-stage superpipeline that supports out-of-order instruction execution. Figure 2-1 shows a conceptual view of this pipeline, with the pipeline divided into four processing units (the fetch/decode unit, the dispatch/execute unit, the retire unit, and the instruction pool). Instructions and data are supplied to these units through the bus interface unit.
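The decoupling of the units around the instruction pool can be illustrated with a small simulation. The following Python sketch is a toy model only, not the actual P6 pipeline: it does not model the 12 pipeline stages, register renaming, or the caches. The `Instr` class, the `simulate` function, the instruction names, and the latency values are all hypothetical and chosen for illustration. It assumes each unit handles at most three instructions per clock, that instructions may begin execution as soon as their inputs are available, and that they retire strictly in program order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple

WIDTH = 3  # per-clock limit for each unit, mirroring "three-way superscalar"


@dataclass
class Instr:
    name: str
    latency: int                      # execution clocks (hypothetical, must be >= 1)
    deps: Tuple[str, ...] = ()        # names of instructions whose results are needed
    remaining: Optional[int] = None   # None: waiting in the pool; 0: executed


def simulate(program):
    """Run the toy model; return (clocks, completion_order, retire_order)."""
    to_fetch = deque(program)   # instructions not yet fetched, in program order
    pool = []                   # instruction pool, kept in program order
    done = set()                # names of instructions whose results are available
    completion_order, retire_order = [], []
    clock = 0
    while to_fetch or pool:
        clock += 1
        # Retire unit: commit up to WIDTH executed instructions, oldest first.
        retired = 0
        while pool and retired < WIDTH and pool[0].remaining == 0:
            retire_order.append(pool.pop(0).name)
            retired += 1
        # Dispatch: start up to WIDTH instructions whose inputs are ready,
        # regardless of their position in program order.
        started = 0
        for ins in pool:
            if started == WIDTH:
                break
            if ins.remaining is None and all(d in done for d in ins.deps):
                ins.remaining = ins.latency
                started += 1
        # Execute: advance every in-flight instruction by one clock.
        for ins in pool:
            if ins.remaining:          # skips None (waiting) and 0 (executed)
                ins.remaining -= 1
                if ins.remaining == 0:
                    completion_order.append(ins.name)
                    done.add(ins.name)
        # Fetch/decode unit: place up to WIDTH new instructions in the pool.
        for _ in range(min(WIDTH, len(to_fetch))):
            pool.append(to_fetch.popleft())
    return clock, completion_order, retire_order


# A slow instruction (A), one that depends on it (B), and two independent ones.
program = [Instr("A", 3), Instr("B", 1, ("A",)), Instr("C", 1), Instr("D", 1)]
clocks, completed, retired = simulate(program)
print(completed)  # ['C', 'D', 'A', 'B']  (execution finishes out of order)
print(retired)    # ['A', 'B', 'C', 'D']  (retirement follows program order)
```

In this sketch, the independent instructions C and D finish executing while A is still in flight, yet nothing retires ahead of A. The pool is what lets the three units proceed at their own pace.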

To ensure a steady supply of instructions and data to the instruction execution pipeline, the P6 Family processor microarchitecture incorporates two cache levels. The L1 cache provides an 8-

Figure 2-1. The Processing Units in the P6 Family Processor Microarchitecture and Their Interface with the Memory Subsystem

[Block diagram. Labeled elements: System Bus, L2 Cache, Cache Bus, Bus Interface Unit, L1 Instruction Cache, L1 Data Cache, Fetch/Decode Unit, Dispatch/Execute Unit, Retire Unit, Instruction Pool, Intel Architecture Registers. Labeled paths: Fetch, Load, Store.]
