CIS-77 Home http://www.c-jump.com/CIS77/CIS77syllabus.htm

Encoding Real x86 Instructions


  1. Encoding Real x86 Instructions
  2. x86 Instructions Overview
  3. x86 Instruction Format Reference
  4. x86 Opcode Sizes
  5. x86 ADD Instruction Opcode
  6. Encoding x86 Instruction Operands, MOD-REG-R/M Byte
  7. General-Purpose Registers
  8. REG Field of the MOD-REG-R/M Byte
  9. MOD R/M Byte and Addressing Modes
  10. SIB (Scaled Index Byte) Layout
  11. Scaled Indexed Addressing Mode
  12. Encoding ADD Instruction Example
  13. Encoding ADD CL, AL Instruction
  14. Encoding ADD ECX, EAX Instruction
  15. Encoding ADD EDX, DISPLACEMENT Instruction
  16. Encoding ADD EDI, [EBX] Instruction
  17. Encoding ADD EAX, [ ESI + disp8 ] Instruction
  18. Encoding ADD EBX, [ EBP + disp32 ] Instruction
  19. Encoding ADD EBP, [ disp32 + EAX*1 ] Instruction
  20. Encoding ADD ECX, [ EBX + EDI*4 ] Instruction
  21. Encoding ADD Immediate Instruction
  22. Encoding Eight, Sixteen, and Thirty-Two Bit Operands
  23. Encoding Sixteen Bit Operands
  24. x86 Instruction Prefix Bytes
  25. Alternate Encodings for Instructions
  26. x86 Opcode Summary
  27. MOD-REG-R/M Byte Summary
  28. ISA Design Considerations
  29. ISA Design Challenges
  30. Intel Architecture Software Developer's Manual
  31. Intel Instruction Set Reference (Volume2)
  32. Chapter 3 of Intel Instruction Set Reference
  33. Intel Reference Opcode Bytes
  34. Intel Reference Opcode Bytes, Cont.
  35. Intel Reference Opcode Bytes, Cont.
  36. Intel Reference Opcode Bytes, Cont.
  37. Intel Reference Opcode Bytes, Cont.
  38. Intel Reference Opcode Bytes, Cont.
  39. Intel Reference Instruction Column

1. Encoding Real x86 Instructions



2. x86 Instructions Overview


  • Although the diagram seems to imply that instructions can be up to 16 bytes long, the x86 does not allow instructions longer than 15 bytes.

  • The prefix bytes are not the opcode expansion prefix discussed earlier; they are special bytes that modify the behavior of existing instructions.

  • x86 Instruction Encoding:

      x86 Instruction Encoding


3. x86 Instruction Format Reference


  • Another view of the x86 instruction format:

     

      The x86 instruction format


4. x86 Opcode Sizes


  • The x86 CPU supports two basic opcode sizes:

    1. standard one-byte opcode

    2. two-byte opcode consisting of a 0Fh opcode expansion prefix byte followed by a second byte that specifies the actual instruction.

  • x86 instruction format:

      The x86 instruction format


5. x86 ADD Instruction Opcode


  • Bit number zero marked s specifies the size of the operands the ADD instruction operates upon:

    • If s = 0 then the operands are 8-bit registers and memory locations.

    • If s = 1 then the operands are either 16 bits or 32 bits:

      • Under 32-bit operating systems the default is 32-bit operands if s = 1.

      • To specify a 16-bit operand (under Windows or Linux) you must insert a special operand-size prefix byte in front of the instruction (an example appears later).

  • Note: you'll soon see that the direction bit d, described below, creates a problem: one instruction can have two different possible opcodes.

  • x86 ADD instruction opcode :

      x86 ADD Opcode

  • Bit number one, marked d, specifies the direction of the data transfer:

    • If d = 0 then the destination operand is a memory location, e.g.

              add [ebx], al
      
    • If d = 1 then the destination operand is a register, e.g.

              add al, [ebx]
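
  • The d and s bits can be combined mechanically. The following is a minimal Python sketch, not part of the course material; the function name add_opcode is hypothetical. It assumes the register/memory form of ADD, whose upper six opcode bits are all zero, so the opcode byte is simply 000000ds.

```python
def add_opcode(d: int, s: int) -> int:
    """Return the one-byte opcode 000000ds for the register/memory form of ADD."""
    if d not in (0, 1) or s not in (0, 1):
        raise ValueError("d and s must each be 0 or 1")
    # Bit 1 is the direction bit d, bit 0 is the size bit s.
    return (d << 1) | s


# add [ebx], al  : d=0 (memory destination), s=0 (8-bit operands)
print(f"{add_opcode(0, 0):02X}")  # 00
# add al, [ebx]  : d=1 (register destination), s=0 (8-bit operands)
print(f"{add_opcode(1, 0):02X}")  # 02
# add eax, [ebx] : d=1, s=1 (32-bit operands by default in 32-bit mode)
print(f"{add_opcode(1, 1):02X}")  # 03
```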
      

6. Encoding x86 Instruction Operands, MOD-REG-R/M Byte


  •   MOD-REG-R/M Byte

  • The MOD field specifies x86 addressing mode:

      MOD Meaning

  • The REG field specifies source or destination register:

      x86 register encoding

  • The R/M field, combined with MOD, specifies either

    1. the second operand in a two-operand instruction, or

    2. the only operand in a single-operand instruction like NOT or NEG.

  • The d bit in the opcode determines which operand is the source, and which is the destination:

    • d=0: MOD R/M <- REG, REG is the source

    • d=1: REG <- MOD R/M, REG is the destination

  • Note: technically, registers do not have an address, but we apply the term addressing mode to register operands (MOD = 11) nonetheless.
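
  • To make the field layout concrete, here is a short Python sketch that packs the three fields into one byte, with MOD in bits 7-6, REG in bits 5-3, and R/M in bits 2-0. The helper name modrm and the register constants are illustrative assumptions; the register numbers (AL/EAX = 000, CL/ECX = 001) follow the standard x86 register encoding.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack MOD (2 bits), REG (3 bits), and R/M (3 bits) into one byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm


AL = EAX = 0b000
CL = ECX = 0b001

# add cl, al with d=0: REG holds the source (AL), R/M the destination (CL).
print(bytes([0x00, modrm(0b11, AL, CL)]).hex())    # 00c1
# add ecx, eax with d=0 and s=1 (32-bit operands).
print(bytes([0x01, modrm(0b11, EAX, ECX)]).hex())  # 01c1
# add cl, al again, this time with d=1: REG holds the destination (CL).
print(bytes([0x02, modrm(0b11, CL, AL)]).hex())    # 02c8
```

    The last two lines of output for ADD CL, AL differ only in which field names which register, which is why one instruction can have two valid encodings.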


7. General-Purpose Registers


  • The EAX, EDX, ECX, EBX, EBP, EDI, and ESI registers are 32-bit general-purpose registers, used for temporary data storage and memory access.

  • The AX, DX, CX, BX, BP, DI, and SI registers are the 16-bit equivalents of the above; they represent the low-order 16 bits of the 32-bit registers.

  • The AH, DH, CH, and BH registers represent the high-order 8 bits of the corresponding 16-bit registers (AX, DX, CX, and BX); AL, DL, CL, and BL represent the low-order 8 bits.

  •   16-bit general-purpose registers

  • Since the processor accesses registers more quickly than it accesses memory, you can make your programs run faster by keeping the most-frequently used data in registers.


8. REG Field of the MOD-REG-R/M Byte


  •   MOD-REG-R/M Byte

  • The REG field specifies an x86 register:

      x86 register encoding

  • Depending on the instruction, this can be either the source or the destination operand.

  • Many instructions have the d (direction) field in their opcode to choose REG operand role:

    1. If d=0, REG is the source,
      MOD R/M <- REG.

    2. If d=1, REG is the destination,
      REG <- MOD R/M.
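
  • The same 3-bit code names a different register depending on the operand size. The lookup below is a small Python sketch of that mapping; the names REGISTERS, reg_code, and reg_name are hypothetical, and the register order is the standard x86 numbering.

```python
from typing import Tuple

# Register names in REG-field order (000 through 111), grouped by operand size.
REGISTERS = {
    8:  ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
}


def reg_code(name: str) -> Tuple[int, int]:
    """Return (3-bit REG code, operand size in bits) for a register name."""
    name = name.lower()
    for size, names in REGISTERS.items():
        if name in names:
            return names.index(name), size
    raise ValueError(f"unknown register: {name}")


def reg_name(code: int, size: int) -> str:
    """Return the register selected by a REG code at a given operand size."""
    return REGISTERS[size][code]


print(reg_code("ebx"))   # (3, 32)
print(reg_name(5, 16))   # bp
```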


9. MOD R/M Byte and Addressing Modes

  • MOD R/M Addressing Mode
    === === ================================
     00 000 [ eax ]
     01 000 [ eax + disp8 ]               (1)
     10 000 [ eax + disp32 ]
     11 000 register  ( al / ax / eax )   (2)
     00 001 [ ecx ]
     01 001 [ ecx + disp8 ]
     10 001 [ ecx + disp32 ]
     11 001 register  ( cl / cx / ecx )
     00 010 [ edx ]
     01 010 [ edx + disp8 ]
     10 010 [ edx + disp32 ]
     11 010 register  ( dl / dx / edx )
     00 011 [ ebx ]
     01 011 [ ebx + disp8 ]
     10 011 [ ebx + disp32 ]
     11 011 register  ( bl / bx / ebx )
     00 100 SIB  Mode                     (3)
     01 100 SIB  +  disp8  Mode
     10 100 SIB  +  disp32  Mode
     11 100 register  ( ah / sp / esp )
     00 101 32-bit Displacement-Only Mode (4)
     01 101 [ ebp + disp8 ]
     10 101 [ ebp + disp32 ]
     11 101 register  ( ch / bp / ebp )
     00 110 [ esi ]
     01 110 [ esi + disp8 ]
     10 110 [ esi + disp32 ]
     11 110 register  ( dh / si / esi )
     00 111 [ edi ]
     01 111 [ edi + disp8 ]
     10 111 [ edi + disp32 ]
     11 111 register  ( bh / di / edi )
    
  • Notes:

  1. Addressing modes with an 8-bit displacement cover offsets in the range -128..+127 and need only a single displacement byte after the MOD-REG-R/M byte (a shorter, faster encoding).

  2. The size bit in the opcode selects an 8-bit or 32-bit register. Selecting a 16-bit register requires a prefix byte.

  3. The so-called scaled indexed addressing modes; SIB stands for scaled index byte.

  4. Note that there is no [ ebp ] addressing mode. Its slot is occupied by the 32-bit displacement-only addressing mode. Intel decided that programmers can use the [ ebp + disp8 ] addressing mode instead, with the 8-bit displacement set to zero (the instruction is one byte longer, though).
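
  • The table above can be read back mechanically. The Python sketch below decodes the MOD and R/M fields of a byte into the addressing mode it selects, assuming 32-bit operands and 32-bit addressing; the function name describe_modrm and the exact output strings are illustrative choices, not part of the course material.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def describe_modrm(byte: int) -> str:
    """Describe the addressing mode selected by the MOD and R/M fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte")
    mod, rm = byte >> 6, byte & 0b111
    if mod == 0b11:
        return f"register {REG32[rm]}"
    disp = {0b00: "", 0b01: " + disp8", 0b10: " + disp32"}[mod]
    if rm == 0b100:                      # R/M = 100 hands over to the SIB byte
        return "SIB" + disp
    if mod == 0b00 and rm == 0b101:      # the slot [ ebp ] would have occupied
        return "[ disp32 ]"
    return f"[ {REG32[rm]}{disp} ]"


print(describe_modrm(0x3B))  # [ ebx ]            (add edi, [ebx] is 03 3B)
print(describe_modrm(0x46))  # [ esi + disp8 ]
print(describe_modrm(0x9D))  # [ ebp + disp32 ]
print(describe_modrm(0x15))  # [ disp32 ]
print(describe_modrm(0x45))  # [ ebp + disp8 ]    (used with disp8 = 0 for [ ebp ])
```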


10. SIB (Scaled Index Byte) Layout


  • Scaled indexed addressing mode uses a second byte, the SIB byte, which follows the MOD-REG-R/M byte in the instruction format.

  • The MOD field still specifies the displacement size of zero, one, or four bytes.

    • The MOD-REG-R/M and SIB bytes are complex because Intel reused the 16-bit addressing circuitry in 32-bit mode rather than abandoning the 16-bit format.

    • There are good hardware reasons for this, but the end result is a complex scheme for specifying addressing modes in the opcodes.

  • Scaled index byte layout:

      SIB, Scaled index byte layout

      SIB scaled index values SIB index register encoding

      SIB base register encoding
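
  • As a rough illustration of the layout, the Python sketch below splits a SIB byte into its three fields: scale in bits 7-6, index in bits 5-3, and base in bits 2-0. The function name split_sib is hypothetical. The two-bit scale field encodes the multiplier as a power of two.

```python
from typing import Tuple


def split_sib(byte: int) -> Tuple[int, int, int]:
    """Split a SIB byte into (scale factor, index code, base code)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte")
    scale = 1 << (byte >> 6)        # 00 -> 1, 01 -> 2, 10 -> 4, 11 -> 8
    index = (byte >> 3) & 0b111     # register code of the index register
    base = byte & 0b111             # register code of the base register
    return scale, index, base


# BBh = 10 111 011: scale 4, index 111 (edi), base 011 (ebx)
print(split_sib(0xBB))  # (4, 7, 3)
# 05h = 00 000 101: scale 1, index 000 (eax), base 101
print(split_sib(0x05))  # (1, 0, 5)
```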


11. Scaled Indexed Addressing Mode

  • [ reg32 + eax*n ] MOD = 00
    [ reg32 + ebx*n ] 
    [ reg32 + ecx*n ]
    [ reg32 + edx*n ]
    [ reg32 + ebp*n ]
    [ reg32 + esi*n ]
    [ reg32 + edi*n ]
    
    [ disp8 + reg32 + eax*n ] MOD = 01
    [ disp8 + reg32 + ebx*n ]
    [ disp8 + reg32 + ecx*n ]
    [ disp8 + reg32 + edx*n ]
    [ disp8 + reg32 + ebp*n ]
    [ disp8 + reg32 + esi*n ]
    [ disp8 + reg32 + edi*n ]
    
    [ disp32 + reg32 + eax*n ] MOD = 10
    [ disp32 + reg32 + ebx*n ]
    [ disp32 + reg32 + ecx*n ]
    [ disp32 + reg32 + edx*n ]
    [ disp32 + reg32 + ebp*n ]
    [ disp32 + reg32 + esi*n ]
    [ disp32 + reg32 + edi*n ]
    
    [ disp32 + eax*n ] MOD = 00, and
    [ disp32 + ebx*n ] BASE field = 101
    [ disp32 + ecx*n ]
    [ disp32 + edx*n ]
    [ disp32 + ebp*n ]
    [ disp32 + esi*n ]
    [ disp32 + edi*n ]
    
  • Note: n = 1, 2, 4, or 8.

  • In each scaled indexed addressing mode the MOD field in MOD-REG-R/M byte specifies the size of the displacement. It can be zero, one, or four bytes:

        MOD R/M  Addressing Mode
        --- ---  --------------------------- 
         00 100  SIB
         01 100  SIB + disp8
         10 100  SIB + disp32
    
  • The Base and Index fields of the SIB byte select the base and index registers, respectively.

  • Note that this addressing mode does not allow the use of the ESP register as an index register. The index encoding 100, which would otherwise select ESP, instead means that no index register is used.
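
  • Putting the MOD-REG-R/M and SIB bytes together, the Python sketch below encodes two of the MOD = 00 forms listed above for a 32-bit ADD with a register destination (opcode 03h). The function name encode_add_sib and its parameters are hypothetical, and the sketch deliberately covers only the no-displacement and displacement-only cases.

```python
from typing import Optional

REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_add_sib(dest: str, index: str, scale: int,
                   base: Optional[str] = None, disp32: int = 0) -> bytes:
    """Encode ADD dest, [base + index*scale] or ADD dest, [disp32 + index*scale]."""
    if index == "esp":
        raise ValueError("ESP cannot be used as an index register")
    if scale not in SCALE_BITS:
        raise ValueError("scale must be 1, 2, 4, or 8")
    opcode = 0x03                                   # ADD, d=1, s=1
    modrm = (0b00 << 6) | (REG32[dest] << 3) | 0b100  # R/M = 100 selects SIB
    if base is None:
        # MOD = 00 with BASE = 101: no base register, a disp32 follows the SIB.
        sib = (SCALE_BITS[scale] << 6) | (REG32[index] << 3) | 0b101
        return bytes([opcode, modrm, sib]) + (disp32 & 0xFFFFFFFF).to_bytes(4, "little")
    if base == "ebp":
        raise ValueError("with MOD = 00, BASE = 101 means disp32, not EBP")
    sib = (SCALE_BITS[scale] << 6) | (REG32[index] << 3) | REG32[base]
    return bytes([opcode, modrm, sib])


# add ecx, [ebx + edi*4]
print(encode_add_sib("ecx", "edi", 4, base="ebx").hex())     # 030cbb
# add ebp, [00001000h + eax*1]
print(encode_add_sib("ebp", "eax", 1, disp32=0x1000).hex())  # 032c0500100000
```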


12. Encoding ADD Instruction Example



13. Encoding ADD CL, AL Instruction



14. Encoding ADD ECX, EAX Instruction



15. Encoding ADD EDX, DISPLACEMENT Instruction



16. Encoding ADD EDI, [EBX] Instruction



17. Encoding ADD EAX, [ ESI + disp8 ] Instruction



18. Encoding ADD EBX, [ EBP + disp32 ] Instruction



19. Encoding ADD EBP, [ disp32 + EAX*1 ] Instruction



20. Encoding ADD ECX, [ EBX + EDI*4 ] Instruction



21. Encoding ADD Immediate Instruction


  • MOD-REG-R/M and SIB bytes have no bit combinations to specify an immediate operand.

  • Instead, x86 uses an entirely different instruction format to specify an instruction with an immediate operand.

  • There are three rules that apply:

  • Encoding x86 immediate operands:

      Encoding Immediate Operands

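  1. If the high-order bit of the opcode is set to 1, the instruction has an immediate constant.

  2. There is no direction bit in the opcode: a constant cannot be a destination, so the destination is always the operand selected by the MOD and R/M fields.

  3. The third difference between the ADD-immediate and the standard ADD instruction is the meaning of the REG field in the MOD-REG-R/M byte: the source is the constant, so REG does not name an operand and instead serves as an opcode extension (000 for ADD).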
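
  • The three rules can be sketched in Python for the common case of adding a constant to a 32-bit register. This is an illustrative sketch under stated assumptions: the function name encode_add_imm32 is hypothetical, and it relies on the standard x86 encodings 81h (immediate the same size as the operand) and 83h (8-bit immediate, sign-extended), both with REG = 000 selecting ADD.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_add_imm32(dest: str, value: int) -> bytes:
    """Encode ADD reg32, constant, using the shortest immediate that fits."""
    if not -2**31 <= value < 2**32:
        raise ValueError("constant does not fit in 32 bits")
    # MOD = 11 (register operand), REG = 000 (opcode extension for ADD).
    modrm = (0b11 << 6) | (0b000 << 3) | REG32[dest]
    if -128 <= value <= 127:
        # 1000 0011: 32-bit operand, 8-bit immediate that the CPU sign-extends.
        return bytes([0x83, modrm]) + value.to_bytes(1, "little", signed=True)
    # 1000 0001: 32-bit operand, 32-bit immediate stored little-endian.
    return bytes([0x81, modrm]) + (value & 0xFFFFFFFF).to_bytes(4, "little")


print(encode_add_imm32("ecx", 5).hex())           # 83c105
print(encode_add_imm32("ecx", -1).hex())          # 83c1ff
print(encode_add_imm32("ebx", 0x12345678).hex())  # 81c378563412
```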


22. Encoding Eight, Sixteen, and Thirty-Two Bit Operands


  • When Intel designed the 8086, one bit in the opcode, s, selected between 8 and 16 bit integer operand sizes.

  • Later, when Intel added 32-bit integers to the architecture with the 80386, there was a problem:

    • three encodings were needed to support 8, 16, and 32 bit sizes.

  • The solution was an operand-size prefix byte.

  • x86 ADD Opcode:

      x86 ADD Opcode


23. Encoding Sixteen Bit Operands


  • 32-bit programs don't use 16-bit operands that often, but they do need them now and then.

  • To allow for 16-bit operands, Intel lets you prefix a 32-bit mode instruction with the operand-size prefix byte, whose value is 66h.

  • This prefix byte tells the CPU to operate on 16-bit data rather than 32-bit data.

  • x86 instruction format:

      instruction format
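
  • The effect of the prefix can be seen by encoding the same register-to-register ADD at all three operand sizes. The Python sketch below assumes 32-bit mode and uses the d = 0 form, in which REG holds the source; the function name encode_add_reg_reg and its arguments are hypothetical.

```python
OPERAND_SIZE_PREFIX = 0x66


def encode_add_reg_reg(dest_code: int, src_code: int, size: int) -> bytes:
    """Encode ADD dest, src for two registers of 8, 16, or 32 bits (32-bit mode)."""
    if size not in (8, 16, 32):
        raise ValueError("size must be 8, 16, or 32")
    if not (0 <= dest_code <= 7 and 0 <= src_code <= 7):
        raise ValueError("register codes are 3 bits")
    s = 0 if size == 8 else 1                 # s=1 covers both 16 and 32 bits
    opcode = 0b00000000 | s                   # d=0: REG is the source
    modrm = (0b11 << 6) | (src_code << 3) | dest_code
    # Only the 16-bit form needs the prefix; 32 bits is the default when s=1.
    prefix = bytes([OPERAND_SIZE_PREFIX]) if size == 16 else b""
    return prefix + bytes([opcode, modrm])


print(encode_add_reg_reg(1, 0, 8).hex())   # 00c1    add cl, al
print(encode_add_reg_reg(1, 0, 32).hex())  # 01c1    add ecx, eax
print(encode_add_reg_reg(1, 0, 16).hex())  # 6601c1  add cx, ax
```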


24. x86 Instruction Prefix Bytes



25. Alternate Encodings for Instructions



26. x86 Opcode Summary



27. MOD-REG-R/M Byte Summary


  • The MOD-REG-R/M byte follows the one or two opcode bytes of the instruction.

  • It provides addressing mode information for one or two operands.

  • MOD-REG-R/M Byte:

      MOD-REG-R/M Byte


28. ISA Design Considerations



29. ISA Design Challenges



30. Intel Architecture Software Developer's Manual



31. Intel Instruction Set Reference (Volume2)



32. Chapter 3 of Intel Instruction Set Reference



33. Intel Reference Opcode Bytes



34. Intel Reference Opcode Bytes, Cont.



35. Intel Reference Opcode Bytes, Cont.



36. Intel Reference Opcode Bytes, Cont.



37. Intel Reference Opcode Bytes, Cont.



38. Intel Reference Opcode Bytes, Cont.



39. Intel Reference Instruction Column