Language Components of MASM and General-Purpose Registers

CIS-261 Home http://www.c-jump.com/bcc/

General-Purpose Registers
Typical Uses of General-Purpose Registers
ESP Stack Pointer Register
EIP Instruction Pointer Register
EFLAGS Register
EFLAGS Bit Labels
EFLAGS Individual Bit Flags
Write Fast Code
Language Components of MASM
Data-Related Operators and Directives
Identifiers
Integer Constants
EQU Directive and Symbolic Integer Constants
TYPE, LENGTHOF and SIZEOF Operators
masm_operators.ASM
Operand Addressing Mode Types
Register Operands
Immediate Operands
The OFFSET Operator and LEA Instruction
More about LEA Instruction
Ambiguous moves: PTR directive
OFFSET and PTR Example
ADDR and OFFSET
Direct Memory Operands
Plus, Minus, and Index
Directives BYTE PTR, WORD PTR, DWORD PTR
Pointers
Pointer Types
NEAR Pointers
The TYPEDEF Operator

1. General-Purpose Registers

The EAX, EDX, ECX, EBX, EBP, EDI, and ESI registers are 32-bit general-purpose registers, used for temporary data storage and memory access.
The AX, DX, CX, BX, BP, DI, and SI registers are their 16-bit equivalents; each represents the low-order 16 bits of the corresponding 32-bit register.
The AH, DH, CH, and BH registers represent the high-order 8 bits of AX, DX, CX, and BX (bits 8-15 of the corresponding 32-bit registers).
Similarly, AL, DL, CL, and BL represent the low-order 8 bits of those registers.

Since the processor accesses registers more quickly than it accesses memory, you can make your programs run faster by keeping the most frequently used data in registers.
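
The overlap between the 32-bit, 16-bit, and 8-bit register names can be sketched as follows. This is a minimal illustration that assumes the same 32-bit flat-model skeleton as the course's masm_operators.asm listing; the constants are arbitrary.

```asm
.386
.MODEL FLAT

.CODE
_main PROC
    mov   eax, 12345678h   ; EAX = 12345678h, so AX = 5678h, AH = 56h, AL = 78h
    mov   al, 0FFh         ; only bits 0-7 change:  EAX = 123456FFh
    mov   ah, 0            ; only bits 8-15 change: EAX = 123400FFh
    mov   ax, 0ABCDh       ; only bits 0-15 change: EAX = 1234ABCDh
    ret
_main ENDP
END _main
```

Writing to AL, AH, or AX never disturbs the upper 16 bits of EAX; the same holds for the EBX, ECX, and EDX families.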

2. Typical Uses of General-Purpose Registers

Register	Size	Special Uses
EAX	32-bit	Accumulator for operands and results
EBX	32-bit	Base pointer to data in the data segment
ECX	32-bit	Counter for loop operations
EDX	32-bit	I/O pointer
EBP	32-bit	Frame Pointer - useful for stack frames
ESP	32-bit	Stack Pointer - hardcoded into PUSH and POP operations
ESI	32-bit	Source Index - required for some array operations
EDI	32-bit	Destination Index - required for some array operations
EIP	32-bit	Instruction Pointer
EFLAGS	32-bit	Result Flags - hardcoded into conditional operations

3. ESP Stack Pointer Register

The ESP register points to the current location within the stack segment.
Pushing a value onto the stack decreases the value of ESP.
Popping from the stack increases the value of ESP.
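
A short sketch of how PUSH and POP move ESP, assuming 32-bit code where each doubleword push adjusts ESP by 4 bytes. The register choice and values are illustrative only.

```asm
.386
.MODEL FLAT

.CODE
_main PROC
    mov   eax, 11111111h
    mov   edx, 22222222h
    push  eax              ; ESP decreases by 4; [ESP] = 11111111h
    push  edx              ; ESP decreases by 4 again; [ESP] = 22222222h
    pop   eax              ; EAX = 22222222h; ESP increases by 4
    pop   edx              ; EDX = 11111111h; ESP is back to its starting value
    ret
_main ENDP
END _main
```

Because the last value pushed is the first one popped, the two registers end up swapped, and the pushes and pops balance so that RET finds the return address where it expects it.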

4. EIP Instruction Pointer Register

The EIP register always contains the address of the next instruction to be executed.
You cannot directly access or change the instruction pointer.
However, instructions that control program flow, such as calls, jumps, loops, and interrupts, automatically change the instruction pointer.

5. EFLAGS Register

See also: the Wikipedia article on the EFLAGS register.

6. EFLAGS Bit Labels

    0       CF      Carry flag
    2       PF      Parity flag
    4       AF      Auxiliary carry flag
    6       ZF      Zero flag
    7       SF      Sign flag
    8       TF      Trap flag
    9       IF      Interrupt enable flag
    10      DF      Direction flag
    11      OF      Overflow flag
    12-13   IOPL    I/O Privilege level
    14      NT      Nested task flag
    16      RF      Resume flag
    17      VM      Virtual 8086 mode flag
    18      AC      Alignment check flag (486+)
    19      VIF     Virtual interrupt flag
    20      VIP     Virtual interrupt pending flag
    21      ID      ID flag

7. EFLAGS Individual Bit Flags

Bit	Label	EFLAGS Flag Description
0	CF	Carry Flag: Set when an arithmetic operation generates a carry out of, or a borrow into, the most significant bit of the destination operand.
2	PF	Parity Flag: Set by most instructions if the least significant byte (the low-order 8 bits) of the result contains an even number of 1 bits.
4	AF	Auxiliary Carry Flag: Set when an instruction generates a carry out of, or a borrow into, bit 3 (the top of the low-order 4 bits) of the result. This flag is used for binary coded decimal (BCD) arithmetic.
6	ZF	Zero Flag: Set by most instructions if the result of an operation is zero.
7	SF	Sign Flag: Most operations set this bit to the most significant (high-order) bit of the result: 0 means non-negative, 1 means negative.
8	TF	Trap Flag (sometimes called the Trace Flag): Permits single-stepping of programs. When set, the processor generates a single-step interrupt (internal exception 1) after each instruction. A debugging program can use this feature to execute a program one instruction at a time.
9	IF	Interrupt Enable Flag: When set, the processor recognizes external interrupts on the INTR pin and acts on them as they are received. The bit can be cleared to turn off interrupt processing temporarily.
10	DF	Direction Flag: Set and cleared using the STD and CLD instructions. It is used in string processing. When set to 1, string operations process down from high addresses to low addresses. If cleared, string operations process up from low addresses to high addresses.
11	OF	Overflow Flag: Set by most arithmetic instructions when the signed result is too large or too small to fit in the destination operand.
12-13	IOPL	Input/Output privilege level flags: Used in protected mode to generate four levels of security.
14	NT	Nested Task Flag: Used in protected mode. When set, it indicates that one system task has invoked another via a CALL Instruction, rather than a JMP.
16	RF	Resume Flag: Used by the debug registers DR6 and DR7. It enables you to turn off certain exceptions while debugging code.
17	VM	Virtual 8086 Mode flag: Permits 80386 to behave like a high speed 8086.

Although all flags serve a purpose, most programs require only the carry, zero, sign, and direction flags.
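
The following sketch shows how two ordinary additions drive the carry, zero, sign, and overflow flags, and how a conditional jump reads one of them. It assumes the 32-bit flat-model skeleton used elsewhere in these notes; the label name is made up for the example.

```asm
.386
.MODEL FLAT

.CODE
_main PROC
    mov   eax, 0FFFFFFFFh
    add   eax, 1           ; result 0 with a carry out: ZF = 1, CF = 1, SF = 0, OF = 0
    mov   eax, 7FFFFFFFh
    add   eax, 1           ; result 80000000h: OF = 1, SF = 1, ZF = 0, CF = 0
    jo    overflowed       ; taken, because the signed result did not fit
    xor   eax, eax         ; skipped in this run
overflowed:
    cld                    ; DF = 0: string instructions step upward through memory
    ret
_main ENDP
END _main
```

The first addition overflows as an unsigned operation (carry) but not as a signed one; the second does the opposite, which is why CF and OF are tested by different conditional jumps.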

8. Write Fast Code

Whenever possible, use registers rather than constant values, and constant values rather than memory.
Minimize changes in program flow.
Smaller is often better. For example, the instructions
```
   dec  bx
   sub  bx, 1
```
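both subtract 1 from BX and have the same timings on 80386/486 processors, but DEC BX has a shorter encoding than SUB BX, 1 (one byte versus three or four bytes in 16-bit code, depending on the immediate form the assembler emits), and so may reach the processor faster. Note that DEC, unlike SUB, leaves the carry flag unchanged.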

9. Language Components of MASM

Reserved words in MASM include:
- Instructions, which correspond to operations the processor can execute.
- Directives, which give commands to the assembler.
- Attributes, which provide a value for a field, such as segment alignment.
- Operators, which are used in expressions.
- Predefined symbols, which return information to your program.
MASM reserved words are not case sensitive except for predefined symbols.
The assembler generates an error if you use a reserved word as a variable, code label, or other identifier.

10. Data-Related Operators and Directives

Operators and directives are not part of the Intel instruction set.
They are understood only by the assembler (in our case, Microsoft MASM).
Different assemblers use different syntaxes for operators and directives; there is no single assembler standard.
Useful MASM operators and directives:
- The OFFSET operator returns the distance of a variable from the beginning of its enclosing segment.
- The PTR operator lets you override a variable's default size.
- The TYPE operator returns the size (in bytes) of each element in an array.
- The LENGTHOF operator returns the number of elements in an array.
- The SIZEOF operator returns the number of bytes used by an array initializer.

11. Identifiers

An identifier is a name that you invent and attach to a definition.
Identifiers can be symbols representing variables, constants, procedure names, code labels, segment names, and user-defined data types such as structures, unions, records, and types defined with TYPEDEF.
Identifiers longer than 247 characters generate an error.
Certain restrictions limit the names you can use for identifiers. Follow these rules to define a name for an identifier:
- The first character of the identifier can be an alphabetic character (A–Z) or any of these four characters: @ _ $ ?
- The other characters in the identifier can be any of the characters listed above or a decimal digit (0–9).

12. Integer Constants

        mov     ax, 25
        mov     bx, 0B3h

A numeral may be followed by an optional radix specifier, a suffix that gives the number base:
- y for binary (or b if the default radix is not hexadecimal)
- o or q for octal
- t for decimal (or d if the default radix is not hexadecimal)
- h for hexadecimal.
Default radix is decimal, but you can change the default with the .RADIX directive.
Hexadecimal numbers must always start with a decimal digit (0–9). If necessary, add a leading zero.
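
As an illustration of the suffixes above, each of the following instructions loads the same value, 210 decimal, into AL. This is a sketch that assumes the default radix has not been changed with .RADIX.

```asm
.386
.MODEL FLAT

.CODE
_main PROC
    mov   al, 210          ; decimal, by default radix
    mov   al, 210t         ; decimal, explicit suffix
    mov   al, 0D2h         ; hexadecimal; the leading 0 is required because D is a letter
    mov   al, 322q         ; octal
    mov   al, 11010010y    ; binary
    ret
_main ENDP
END _main
```

Without the leading zero, D2h would be read as an identifier rather than a number.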

13. EQU Directive and Symbolic Integer Constants

Two MASM directives,

    symbol EQU expression
    symbol = expression

generate integer constants. For example,

       .CONST
    column  EQU    80                ; Constant    80 
    row     EQU    25                ; Constant    25
    screen  EQU    column * row      ; Constant  2000
    line    EQU    row               ; Constant    25

       .DATA

       .CODE
       mov     cx, column
       mov     bx, line

Using symbolic constants instead of undescriptive numbers makes your code more readable and easier to maintain.
The assembler does not allocate data storage when you use either EQU or = .
The assembler simply replaces each occurrence of the symbol with the value of the expression.
Integers defined with the = directive can be redefined with another value in your source code, but those defined with EQU cannot.
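
The difference between the two directives can be sketched as follows. The symbol names limit and count are invented for this example.

```asm
.386
.MODEL FLAT

limit   EQU  10              ; numeric constant; its value is fixed
count   =    0
count   =    count + 1       ; legal: a symbol defined with = may be redefined (now 1)
count   =    count + limit   ; now 11
; limit EQU  20              ; would be rejected: EQU symbol redefined with a new value

.CODE
_main PROC
    mov   eax, count         ; assembles as mov eax, 11; no memory is read
    ret
_main ENDP
END _main
```

Since neither symbol occupies storage, both lines that use them assemble to instructions with immediate operands.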

14. TYPE, LENGTHOF and SIZEOF Operators

The TYPE operator returns the size (in bytes) of each element in an array.
The LENGTHOF operator returns the number of elements in an array.
The SIZEOF operator returns the number of bytes used by an array initializer.
See listing file masm_operators.lst of the masm_operators.asm program for operator examples.


TITLE MASM Operators
; CIS-261
; masm_operators.asm
; Demonstration of TYPE, LENGTHOF, SIZEOF operators

.386                ; Tells MASM to use Intel 80386 instruction set.
.MODEL FLAT         ; Flat memory model
option casemap:none ; Treat labels as case-sensitive

.CONST          ; Constant data segment
    byte1    BYTE  10,20,30
    array1   WORD  30 DUP(?),0,0
    array2   WORD  5 DUP(3 DUP(?))
    array3   DWORD 1,2,3,4
    digitStr BYTE  '12345678',0
    myArray  BYTE  10,20,30,40,50,
                   60,70,80,90,100

    ;---------------------------------------------
    ; You can examine the following constant values
    ; by looking in the listing file masm_operators.lst
    ;---------------------------------------------
    X = LENGTHOF byte1     ; 3
    X = LENGTHOF array1    ; 30 + 2
    X = LENGTHOF array2    ; 5 * 3
    X = LENGTHOF array3    ; 4
    X = LENGTHOF digitStr  ; 9
    X = LENGTHOF myArray   ; 10

    X = SIZEOF byte1       ; 1 * 3
    X = SIZEOF array1      ; 2 * (30 + 2)
    X = SIZEOF array2      ; 2 * (5 * 3)
    X = SIZEOF array3      ; 4 * 4
    X = SIZEOF digitStr    ; 1 * 9

    X = TYPE byte1     ; 1
    X = TYPE array1    ; 2
    X = TYPE array2    ; 2
    X = TYPE array3    ; 4
    X = TYPE digitStr  ; 1

.DATA           ; Begin initialized data segment
    
.CODE           ; Begin code segment
_main PROC      ; Beginning of code

    ret
    
_main ENDP
END _main       ; Marks the end of the module and sets the program entry point label

15. Operand Addressing Mode Types

Instructions work on sources of data called operands.
Here are the Operand Types and corresponding Addressing Modes:
- Register - an 8-bit or 16-bit register on any 8086-family processor, or a 32-bit register on the 80386 and later.
- Immediate - a constant value contained in the instruction itself.
- Direct memory - a fixed location in memory.
- Indirect memory - a memory location determined at run time by using the address stored in one or two registers.
Instructions that take two or more operands always work right to left:
- mov destination, source
The right operand is the source operand. It specifies data that will be read, but not changed, by the instruction.
The left operand is the destination operand. It specifies the data that will be acted on and possibly changed by the instruction.
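
The four operand types can be seen side by side in a short sketch. The variable var1 is hypothetical, and the skeleton assumes 32-bit flat-model code.

```asm
.386
.MODEL FLAT

.DATA
    var1  DWORD  100           ; hypothetical variable for the example

.CODE
_main PROC
    mov   eax, 10              ; immediate source:       EAX = 10
    mov   edx, eax             ; register source:        EDX = 10
    mov   eax, var1            ; direct memory source:   EAX = 100
    mov   ecx, OFFSET var1     ; ECX = address of var1
    mov   eax, [ecx]           ; indirect memory source: EAX = 100
    add   var1, edx            ; memory destination:     var1 = 110
    ret
_main ENDP
END _main
```

In every line the right operand is only read, and the left operand receives the result.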

16. Register Operands

Register operands refer to data stored in registers. The following examples show typical register operands:

        mov     bx, 10          ; Load constant to BX
        add     ax, bx          ; Add BX to AX
        jmp     di              ; Jump to the address in DI

An offset stored in a base or index register often serves as a pointer into memory.

You can store an offset in one of the base or index registers, then use the register as an indirect memory operand. For example:

        mov     [bx], dl ; Store DL in indirect memory operand
        inc     bx       ; Increment register operand
        mov     [bx], dl ; Store DL in new indirect memory operand

This example stores the value in DL into two consecutive bytes of memory, starting at the location pointed to by BX.
It shows that changing the BX register makes it point to a different location in memory.

17. Immediate Operands

An immediate operand is a constant value or the result of a constant expression.
The assembler encodes immediate values into the instruction at assembly time.

Here are some typical examples showing immediate operands:

        mov     cx, 20          ; Load constant to register
        add     var, 1Fh        ; Add hex constant to variable
        sub     bx, 25 * 80     ; Subtract constant expression

Immediate data is never permitted in the destination operand.
If the source operand is immediate, the destination operand must be either a register or a memory location to provide a place to store the result of the operation.
Immediate expressions often involve the useful OFFSET operator.

18. The OFFSET Operator and LEA Instruction

An address constant is a special type of immediate operand that consists of an offset or segment value.
The OFFSET operator returns the offset of a memory location relative to the beginning of the segment to which the location belongs:
```
        mov     bx, OFFSET var  ; Load offset address
```
Since data in different modules may belong to a single segment, the assembler cannot know for each module the true offsets within a segment.
Thus, the offset for var, although an immediate value, is not determined until link time.

The instruction

        lea eax, [LocalVar]     ;  Load effective address of LocalVar into eax
        lea eax, LocalVar       ;  This does exactly the same thing

generally equates to

        mov eax, OFFSET LocalVar

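but LEA can be slower on some older processors, so MOV with OFFSET is generally preferred for statically allocated variables. For local (stack) variables, whose addresses are known only at run time, LEA is required.

MOV with OFFSET is also slightly shorter: in 32-bit code, MOV x, OFFSET y encodes in one byte less than the equivalent LEA, so replacing 100 such LEA instructions makes the program 100 bytes smaller.
The effective address is the offset produced by evaluating the operand's address expression (base, index, and displacement). It is an offset within a segment, not necessarily the physical address of the data.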

The OFFSET operator returns the offset of a data label. The offset represents the distance, in bytes, of the label from the beginning of the data segment. In Protected mode, an offset is always 32 bits long. In Real-address mode, offsets are only 16 bits.

19. More about LEA Instruction

The LEA instruction loads the offset of a memory operand into a register.
Suppose we want to load EBX with the offset of table1.
We could write
```
        mov    ebx, OFFSET table1
```
The OFFSET operator is resolved when the program is assembled and linked, so no address arithmetic happens at run time.
Another way of loading a memory offset is the LEA instruction, which computes the offset at run time:

        lea    ebx, [table1]     ;  load effective address of table1 into ebx

The format of the LEA instruction is
- LEA register, source
We must use LEA when the needed offset is available only at run time.
Consider an array index passed to a procedure in register ESI.
To load EBX with the address of the element of table1 whose index is in ESI (assuming one-byte elements; larger elements need the index scaled by the element size), write any of the following:

        lea    ebx, [ table1 + esi ]
        lea    ebx, [ table1 ] + esi   ;  does the same thing
        lea    ebx, table1[ esi ]      ;  does the same thing

NOTE: A single MOV with OFFSET cannot do this, because the value in ESI is not known until run time.
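
When the array elements are larger than one byte, the index has to be scaled by the element size. The sketch below assumes table1 is an array of WORDs (the definition here is hypothetical) and keeps the index in ECX rather than ESI.

```asm
.386
.MODEL FLAT

.DATA
    table1  WORD  20 DUP (0)                   ; assumed definition for this sketch

.CODE
_main PROC
    mov   ecx, 3                               ; element index
    lea   edx, [table1 + ecx*2]                ; EDX = address of element 3 (byte offset 6)
    lea   edx, table1[ecx * (TYPE table1)]     ; same address; TYPE table1 is 2
    mov   WORD PTR [edx], 1234h                ; store into that element
    ret
_main ENDP
END _main
```

Using TYPE instead of a literal 2 keeps the scaling correct if the element type of the array is later changed.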

20. Ambiguous moves: PTR directive

Consider this fragment of data definitions and code:

        .DATA
        table1 WORD 20 DUP (0)
        status BYTE 7 DUP (1)
        ..
        mov    EBX, OFFSET table1
        mov    ESI, OFFSET status
        mov    [EBX], 100
        mov    [ESI], 100

The last two MOV instructions are ambiguous: neither operand gives a size, so the assembler cannot tell whether to store 100 as a byte, a word, or a doubleword.

A type specifier such as WORD PTR or BYTE PTR must be used to resolve the ambiguity:

        mov    WORD PTR [EBX], 100
        mov    BYTE PTR [ESI], 100

21. OFFSET and PTR Example

 
 ; offset_ptr.asm
 ; OFFSET and PTR demo program
 .586P
 ; Flat memory model
 .MODEL FLAT, STDCALL
 ;---------------------------------------
 ; Data segment
 _DATA SEGMENT
         num  DWORD   0
 _DATA ENDS
 ; Code segment
 _TEXT SEGMENT
 START:
         lea     ESI, num           ; Load effective address
         mov     ESI, OFFSET num
         mov     bx, WORD PTR num   ; WORD PTR needed because num declared DWORD
         mov     [ESI], bx          ; Copy a word-size value (BX is 16-bit)
         mov     BYTE PTR [ESI], 5  ; Store 8-bit value
         mov     WORD PTR [ESI], 5  ; Store 16-bit value
         mov     DWORD PTR [ESI], 5 ; Store 32-bit value
         ret                        ; Exit
 _TEXT ENDS
 END START

22. ADDR and OFFSET

The OFFSET operator returns the address of a variable. It is used to specify the location rather than the content of the variable:

        .DATA
            MyVar  DB   77h       ;  byte-sized variable called MyVar initialized to 77h
        .CODE
            .
            mov al, MyVar         ;  copy 77h into al (MyVar is a byte, so the register must be 8-bit)
            mov ebx, offset MyVar ;  copy memory address where value 77h stored into ebx

OFFSET can also pass the address of a variable to a procedure in an INVOKE statement.
However, OFFSET will only work for global variables declared in the .DATA or .DATA? segments.
OFFSET will fail with local variables, which are declared upon entry into procedure using the LOCAL statement.
Local variables inside a procedure do not have offset, because they are created on the stack at runtime.
The ADDR operator solves this problem. It is used exclusively with INVOKE to pass the address of a variable to a procedure.
For global variables, the ADDR operator translates to a simple PUSH of the variable's address, just as if OFFSET had been used:
```
        push OFFSET GlobalVar
```
However, for local variables ADDR translates to:

        lea  eax, LocalVar      ; load effective address of LocalVar into eax
        push eax

The effective address is the offset computed at run time from the operand's address expression; for a local variable it is the variable's position on the stack.
It is important to remember that using ADDR with a local variable overwrites the EAX register, so any value the calling procedure was keeping in EAX is lost.
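
A minimal sketch of INVOKE with OFFSET and ADDR follows. The procedure ZeroIt and both variables are hypothetical, and the STDCALL flat model is assumed so that INVOKE can generate the pushes and the call.

```asm
.386
.MODEL FLAT, STDCALL
option casemap:none

ZeroIt PROTO :DWORD                      ; hypothetical procedure taking one address

.DATA
    GlobalVar  DWORD  5

.CODE
ZeroIt PROC pValue:DWORD
    mov   eax, pValue                    ; EAX = address passed by the caller
    mov   DWORD PTR [eax], 0             ; store 0 at that address
    ret
ZeroIt ENDP

main PROC
    LOCAL LocalVar:DWORD                 ; created on the stack at run time
    INVOKE ZeroIt, OFFSET GlobalVar      ; push OFFSET GlobalVar, then call
    INVOKE ZeroIt, ADDR GlobalVar        ; same code as above for a global
    INVOKE ZeroIt, ADDR LocalVar         ; lea eax, LocalVar / push eax, then call
    ret
main ENDP
END main
```

Replacing ADDR LocalVar with OFFSET LocalVar would not assemble, because a stack variable has no fixed offset.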

23. Direct Memory Operands

A direct memory operand specifies the data at a given address.
The instruction acts on the contents of the address, not the address itself.
Except when size is implied by another operand, you must specify the size of a direct memory operand so the instruction accesses the correct amount of memory.

The following example shows how to explicitly specify data size with the BYTE directive:

            .DATA?                  ; Segment for uninitialized data
    var     BYTE   ?                ; Reserve one byte, labeled "var"
            .CODE
            .
            .
            .
            mov    var, al          ; Copy AL to byte at var

Any location in memory can be a direct memory operand as long as a size is specified (or implied) and the location is fixed.
The data at the address can change, but the address cannot.
By default, instructions that use direct memory addressing access memory relative to the DS segment register.

24. Plus, Minus, and Index

The plus and index operators perform in exactly the same way when applied to direct memory operands.
For example, both the following statements move the second word value from a memory array into the AX register:
```
        mov     ax, array[ 2 ]
        mov     ax, array + 2
```
The index operator can contain any direct memory operand. The following statements are equivalent:
```
        mov     ax, var
        mov     ax, [var]
```
Many programmers prefer to enclose the operand in brackets to show that the contents, not the address, are used.

Memory operands in brackets make it easier to port code over to other assemblers.
The minus operator behaves as you would expect. Both the following instructions retrieve the value located at the word preceding array:
```
        mov     ax, array[-2]
        mov     ax, array-2
```
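
Note that the value inside the index operator is a byte offset, not an element number. The sketch below uses a hypothetical three-element WORD array to show the difference, with TYPE supplying the element size.

```asm
.386
.MODEL FLAT

.DATA
    array  WORD  10h, 20h, 30h           ; hypothetical data for the example

.CODE
_main PROC
    mov   ax, array                      ; AX = 10h (first word, byte offset 0)
    mov   ax, array[2]                   ; AX = 20h (second word, byte offset 2)
    mov   ax, array + 1 * (TYPE array)   ; AX = 20h, offset computed from the element size
    mov   ax, array[2 * (TYPE array)]    ; AX = 30h (third word, byte offset 4)
    ret
_main ENDP
END _main
```

Writing array[1] here would read a word that straddles the first two elements, which is rarely what is intended.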

25. Directives BYTE PTR, WORD PTR, DWORD PTR

There are times when we need to assist the assembler in translating references to data in memory.
For example, the instruction
```
        mov     [ESI], al  ; Store a byte-size value in memory location pointed by ESI
```
suggests that an 8-bit quantity should be moved because AL is an 8-bit register.

When an instruction gives no indication of the operand size, the assembler reports an error:

        mov     [ESI], 5   ; Error: operand must have the size specified

To get around this, we must use the PTR operator with a type, such as

        mov     BYTE PTR [ESI], 5  ; Store 8-bit value
        mov     WORD PTR [ESI], 5  ; Store 16-bit value
        mov     DWORD PTR [ESI], 5 ; Store 32-bit value

MOV requires both operands to be the same size, so the type given with PTR also determines the size used for the immediate value 5.

In general, the PTR operator forces an expression to be treated as having the specified type:

        .DATA
        num  DWORD   0

        .CODE
        mov     ax, WORD PTR [num] ; Load a word-size value from a DWORD
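
Because x86 stores multi-byte values with the low-order byte at the lowest address, PTR can pick out individual parts of a larger variable. The following sketch assumes a DWORD initialized to 12345678h; the value is chosen only to make the parts easy to recognize.

```asm
.386
.MODEL FLAT

.DATA
    num  DWORD  12345678h

.CODE
_main PROC
    mov   al, BYTE PTR [num]       ; AL = 78h   (byte at the lowest address)
    mov   ax, WORD PTR [num]       ; AX = 5678h (low-order word)
    mov   ax, WORD PTR [num + 2]   ; AX = 1234h (high-order word)
    mov   eax, num                 ; EAX = 12345678h; no PTR needed, sizes match
    ret
_main ENDP
END _main
```

PTR changes only how the assembler interprets the operand; the data in memory is not converted or moved.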

26. Pointers

A variable that contains the address of another variable is called a pointer variable, or, simply, a pointer.
Pointers are essential when manipulating arrays and other data structures in memory.
High-level languages such as C/C++ and Java purposely hide the implementation of pointers,
because such details are not portable across different machine architectures.
However, assembly language programmers deal with pointers at the physical level.

27. Pointer Types

Intel-based programs use two basic types of pointers, NEAR and FAR.
Pointer sizes are affected by the processor's current mode:
- 16-bit Real Mode, or
- 32-bit Protected Mode.
  
  (Our discussion about x86 memory modes comes later in this course.)

28. NEAR Pointers

Protected-mode programs use NEAR pointers.

In 32-bit Protected Mode, NEAR pointers are stored as doubleword variables. For example,

    .DATA               ; Begin data segment
        b_array     BYTE    0, 1, 2, 4, 8
        w_array     WORD    1000h, 2000h, 3000h
        ptr_b_arr   DWORD   OFFSET b_array
        ptr_w_arr   DWORD   OFFSET w_array

    .CODE               ; Begin code segment

        mov esi, ptr_b_arr
        inc esi
        inc esi
        mov al, [esi]   ; AL <- 2
        
        mov esi, ptr_w_arr
        add esi, TYPE w_array
        mov ax, [esi]   ; AX <- 2000h

29. The TYPEDEF Operator

The TYPEDEF operator creates a user-defined type.
A user-defined type has the status of a built-in type when defining variables.
TYPEDEF is ideal for creating pointer variables.

For example, PBYTE and PWORD are pointers to bytes and words, respectively:

    PBYTE TYPEDEF PTR BYTE
    PWORD TYPEDEF PTR WORD

    .DATA               ; Begin data segment
        b_array     BYTE    0, 1, 2, 4, 8
        w_array     WORD    1000h, 2000h, 3000h
        ptr_b_arr   PBYTE   OFFSET b_array
        ptr_w_arr   PWORD   OFFSET w_array

    .CODE               ; Begin code segment

        mov esi, ptr_b_arr
        inc esi
        inc esi
        mov al, [esi]   ; AL <- 2
        
        mov esi, ptr_w_arr
        add esi, TYPE w_array
        mov ax, [esi]   ; AX <- 2000h