Similarly, AL, DL, CL, and BL represent the low-order 8 bits of EAX, EDX, ECX, and EBX, respectively.
Register | Size | Special Uses |
---|---|---|
EAX | 32-bit | Accumulator for operands and results |
EBX | 32-bit | Base pointer to data in the data segment |
ECX | 32-bit | Counter for loop operations |
EDX | 32-bit | I/O pointer |
EBP | 32-bit | Frame Pointer - useful for stack frames |
ESP | 32-bit | Stack Pointer - hardcoded into PUSH and POP operations |
ESI | 32-bit | Source Index - required for some array operations |
EDI | 32-bit | Destination Index - required for some array operations |
EIP | 32-bit | Instruction Pointer |
EFLAGS | 32-bit | Result Flags - hardcoded into conditional operations |
The ESP register points to the current location within the stack segment.
Pushing a value onto the stack decreases the value of ESP.
Popping from the stack increases the value of ESP.
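As a rough illustration of the two statements above, the fragment below traces ESP across a pair of PUSH and POP instructions. It assumes 32-bit code, where pushing or popping a 32-bit register moves ESP by 4 bytes; the starting ESP value in the comments is a made-up number used only for the trace.

```asm
; Assumed for illustration: ESP = 0012FF00h on entry (hypothetical value)
        mov  eax, 12345678h
        push eax            ; ESP = 0012FEFCh (decreased by 4); dword at [ESP] = 12345678h
        push eax            ; ESP = 0012FEF8h
        pop  ebx            ; EBX = 12345678h; ESP = 0012FEFCh (increased by 4)
        pop  ecx            ; ECX = 12345678h; ESP = 0012FF00h (back to the starting value)
```

Balancing every PUSH with a POP, as here, returns ESP to the value it had on entry.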
The EIP register always contains the address of the next instruction to be executed.
You cannot directly access or change the instruction pointer.
However, instructions that control program flow, such as calls, jumps, loops, and interrupts, automatically change the instruction pointer.
Bit positions of the EFLAGS register:

Bit | Label | Flag |
---|---|---|
0 | CF | Carry flag |
2 | PF | Parity flag |
4 | AF | Auxiliary carry flag |
6 | ZF | Zero flag |
7 | SF | Sign flag |
8 | TF | Trap flag |
9 | IF | Interrupt enable flag |
10 | DF | Direction flag |
11 | OF | Overflow flag |
12-13 | IOPL | I/O privilege level |
14 | NT | Nested task flag |
16 | RF | Resume flag |
17 | VM | Virtual 8086 mode flag |
18 | AC | Alignment check flag (486+) |
19 | VIF | Virtual interrupt flag |
20 | VIP | Virtual interrupt pending flag |
21 | ID | ID flag |

Bit | Label | EFLAGS Flag Description |
---|---|---|
0 | CF | Carry Flag: Set when an arithmetic operation generates a carry out of, or a borrow into, the most significant bit of the destination operand. |
2 | PF | Parity Flag: Set by most instructions if the least significant byte (the low-order 8 bits) of the result contains an even number of 1 bits. |
4 | AF | Auxiliary Carry Flag: Set when an instruction generates a carry out of, or a borrow into, the low-order 4 bits of an operand (that is, across bit 3). This flag is used for binary coded decimal (BCD) arithmetic. |
6 | ZF | Zero Flag: Set by most instructions if the result of an operation is zero. |
7 | SF | Sign Flag: Most operations set this bit to the most significant (high-order) bit of the result. For signed values, 0 means non-negative and 1 means negative. |
8 | TF | Trap Flag (sometimes called the Trace Flag): Permits single-stepping of programs. When it is set, the processor generates a single-step interrupt (internal exception 1) after each instruction. A debugging program can use this feature to execute a program one instruction at a time. |
9 | IF | Interrupt Enable Flag: When set, the processor recognizes external interrupts on the INTR pin and acts on them as they are received. The bit can be cleared to turn off interrupt processing temporarily. |
10 | DF | Direction Flag: Set and cleared using the STD and CLD instructions. It is used in string processing. When set to 1, string operations process down from high addresses to low addresses. If cleared, string operations process up from low addresses to high addresses. |
11 | OF | Overflow Flag: Set by most arithmetic instructions when the signed result is too large or too small to fit in the destination operand. |
12-13 | IOPL | Input/Output privilege level flags: Used in protected mode to generate four levels of security. |
14 | NT | Nested Task Flag: Used in protected mode. When set, it indicates that one system task has invoked another via a CALL Instruction, rather than a JMP. |
16 | RF | Resume Flag: Used by the debug registers DR6 and DR7. It enables you to turn off certain exceptions while debugging code. |
17 | VM | Virtual 8086 Mode flag: Permits 80386 to behave like a high speed 8086. |
Although all flags serve a purpose, most programs require only the carry, zero, sign, and direction flags.
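To make the four commonly used flags concrete, here is a small sketch using 8-bit additions. The register choices and values are picked only for illustration; the flag values in the comments follow from the definitions in the table above.

```asm
        mov al, 0FFh
        add al, 1       ; AL = 00h -> CF = 1 (unsigned carry out), ZF = 1, SF = 0, OF = 0
        mov al, 7Fh
        add al, 1       ; AL = 80h -> CF = 0, ZF = 0, SF = 1, OF = 1 (signed overflow: +127 + 1)
        std             ; DF = 1: string instructions step toward lower addresses
        cld             ; DF = 0: string instructions step toward higher addresses
```

The first addition overflows only when the operands are read as unsigned values, the second only when they are read as signed values, which is why CF and OF are tracked separately.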
Whenever possible, use registers rather than constant values, and constant values rather than memory.
Minimize changes in program flow.
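Smaller is often better. For example, the instructions

    dec bx
    sub bx, 1

accomplish the same thing (except that DEC leaves the carry flag unchanged) and have the same timings on 80386/486 processors, but DEC BX has the shorter encoding (in 16-bit code, 1 byte versus 3 or 4 bytes for SUB BX, 1, depending on the immediate form the assembler emits), and so may reach the processor faster.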
Reserved words in MASM include:
Instructions, which correspond to operations the processor can execute.
Directives, which give commands to the assembler.
Attributes, which provide a value for a field, such as segment alignment.
Operators, which are used in expressions.
Predefined symbols, which return information to your program.
MASM reserved words are not case sensitive except for predefined symbols.
The assembler generates an error if you use a reserved word as a variable, code label, or other identifier.
Operators and directives are not part of the Intel instruction set.
They are understood only by the assembler (in our case, Microsoft MASM).
Different assemblers use different syntaxes for operators and directives; there is no single assembler standard.
Useful MASM operators and directives:
The OFFSET operator returns the distance of a variable from the beginning of its enclosing segment.
The PTR operator lets you override a variable's default size.
The TYPE operator returns the size (in bytes) of each element in an array.
The LENGTHOF operator returns the number of elements in an array.
The SIZEOF operator returns the number of bytes used by an array initializer.
An identifier is a name that you invent and attach to a definition.
Identifiers can be symbols representing variables, constants, procedure names, code labels, segment names, and user-defined data types such as structures, unions, records, and types defined with TYPEDEF.
Identifiers longer than 247 characters generate an error.
Certain restrictions limit the names you can use for identifiers. Follow these rules to define a name for an identifier:
The first character of the identifier can be an alphabetic character (A-Z) or any of these four characters: @ _ $ ?
The other characters in the identifier can be any of the characters listed above or a decimal digit (0-9).
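The naming rules above can be sketched with a few data definitions. All of the names below are hypothetical and chosen only to exercise each rule. Names starting with @ are permitted by the rule as well, but they are left out here on the assumption that they are best avoided, since MASM's own predefined symbols use that prefix.

```asm
        .DATA
count   DWORD 0         ; starts with a letter
_total  DWORD 0         ; starts with an underscore
$price  DWORD 0         ; starts with a dollar sign
?flag   BYTE  0         ; starts with a question mark
item2   WORD  0         ; digits are allowed after the first character
; 2ndItem WORD 0        ; not allowed: an identifier cannot start with a digit
```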
    mov ax, 25
    mov bx, 0B3h
Numerals are followed by an optional radix specifier.
Radix is the number base suffix:
y for binary (or b if the default radix is not hexadecimal)
o or q for octal
t for decimal (or d if the default radix is not hexadecimal)
h for hexadecimal.
Default radix is decimal, but you can change the default with the .RADIX directive.
Hexadecimal numbers must always start with a decimal digit (0-9). If necessary, add a leading zero.
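As a quick sketch of the radix suffixes listed above, each of the first five lines below loads the same value, 26 decimal, into AL. The fragment assumes the default radix has not been changed with .RADIX, in which case 26d and 00011010b would work as well.

```asm
        mov al, 26          ; no suffix: default radix (decimal)
        mov al, 26t         ; t = decimal
        mov al, 1Ah         ; h = hexadecimal
        mov al, 32q         ; q (or o) = octal
        mov al, 00011010y   ; y = binary
        mov bl, 0B3h        ; leading zero required: B3h alone would be read as an identifier
```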
Two MASM directives,

    symbol EQU expression
    symbol = expression

generate integer constants. For example,

            .CONST
    column  EQU     80              ; Constant 80
    row     EQU     25              ; Constant 25
    screen  EQU     column * row    ; Constant 2000
    line    EQU     row             ; Constant 25
            .DATA
            .CODE
            mov     cx, column
            mov     bx, line
Using symbolic constants instead of undescriptive numbers makes your code more readable and easier to maintain.
The assembler does not allocate data storage when you use either EQU or =.
The assembler simply replaces each occurrence of the symbol with the value of the expression.
Integers defined with the = directive can be redefined with another value in your source code, but those defined with EQU cannot.
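The difference between the two directives can be sketched as follows. The symbol names limit and count are hypothetical, and the comments show the value the assembler substitutes at each use.

```asm
limit   EQU 10              ; fixed: a later "limit EQU 20" is rejected by the assembler
count   = 1                 ; redefinable
        mov eax, count      ; assembles as mov eax, 1
count   = count + 1         ; allowed: count is now 2
        mov ebx, count      ; assembles as mov ebx, 2
        mov ecx, limit      ; assembles as mov ecx, 10
```

Because the substitution happens at assembly time, redefining count affects only the instructions that appear after the redefinition in the source file.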
The TYPE operator returns the size (in bytes) of each element in an array.
The LENGTHOF operator returns the number of elements in an array.
The SIZEOF operator returns the number of bytes used by an array initializer.
See listing file masm_operators.lst of the masm_operators.asm program for operator examples.
    TITLE MASM Operators
    ; CIS-77
    ; masm_operators.asm
    ; Demonstration of TYPE, LENGTHOF, SIZEOF operators

    .386                    ; Tells MASM to use Intel 80386 instruction set.
    .MODEL FLAT             ; Flat memory model
    option casemap:none     ; Treat labels as case-sensitive

    .CONST                  ; Constant data segment
    byte1    BYTE  10,20,30
    array1   WORD  30 DUP(?),0,0
    array2   WORD  5 DUP(3 DUP(?))
    array3   DWORD 1,2,3,4
    digitStr BYTE  '12345678',0
    myArray  BYTE  10,20,30,40,50,
                   60,70,80,90,100

    ;---------------------------------------------
    ; You can examine the following constant values
    ; by looking in the listing file masm_operators.lst
    ;---------------------------------------------
    X = LENGTHOF byte1      ; 3
    X = LENGTHOF array1     ; 30 + 2
    X = LENGTHOF array2     ; 5 * 3
    X = LENGTHOF array3     ; 4
    X = LENGTHOF digitStr   ; 9
    X = LENGTHOF myArray    ; 10

    X = SIZEOF byte1        ; 1 * 3
    X = SIZEOF array1       ; 2 * (30 + 2)
    X = SIZEOF array2       ; 2 * (5 * 3)
    X = SIZEOF array3       ; 4 * 4
    X = SIZEOF digitStr     ; 1 * 9

    X = TYPE byte1          ; 1
    X = TYPE array1         ; 2
    X = TYPE array2         ; 2
    X = TYPE array3         ; 4
    X = TYPE digitStr       ; 1

    .DATA                   ; Begin initialized data segment

    .CODE                   ; Begin code segment
    _main PROC              ; Beginning of code
        ret
    _main ENDP
    END _main               ; Marks the end of the module and sets the program entry point label
Instructions work on sources of data called operands.
Here are the Operand Types and corresponding Addressing Modes:
Register - an 8-bit or 16-bit register on the 8086-80486, 32-bit on the 80386/486/Pentium.
Immediate - a constant value contained in the instruction itself.
Direct memory - a fixed location in memory.
Indirect memory - a memory location determined at run time by using the address stored in one or two registers.
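The four operand types above might look like this as source operands of MOV. This is an illustrative fragment rather than a complete program: the variable total is hypothetical, and the last line assumes EBX already holds a valid base address.

```asm
        .DATA
total   DWORD 0                     ; hypothetical variable for the example
        .CODE
        mov eax, ebx                ; register operand (EBX)
        mov eax, 100                ; immediate operand (constant stored in the instruction)
        mov eax, total              ; direct memory operand (fixed location "total")
        mov esi, OFFSET total
        mov eax, [esi]              ; indirect memory operand (address taken from ESI)
        mov eax, [ebx + esi]        ; indirect memory operand using two registers
```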
Instructions that take two or more operands always work right to left:
mov destination, source
The right operand is the source operand.
The source specifies data that will be read, but not changed, by the instruction.
The left operand is the destination operand.
It specifies the data that will be acted on and possibly changed by the instruction.
An immediate operand is a constant value or the result of a constant expression.
The assembler encodes immediate values into the instruction at assembly time.
Here are some typical examples showing immediate operands:
    mov cx, 20          ; Load constant to register
    add var, 1Fh        ; Add hex constant to variable
    sub bx, 25 * 80     ; Subtract constant expression
Immediate data is never permitted in the destination operand.
If the source operand is immediate, the destination operand must be either a register or direct memory to provide a place to store the result of the operation.
Immediate expressions often involve the useful OFFSET operator.
An address constant is a special type of immediate operand that consists of an offset or segment value.
The OFFSET operator returns the offset of a memory location relative to the beginning of the segment to which the location belongs:
mov bx, OFFSET var ; Load offset address
Since data in different modules may belong to a single segment, the assembler cannot know for each module the true offsets within a segment.
Thus, the offset for var, although an immediate value, is not determined until link time.
The instruction

    lea eax, [LocalVar]     ; Load effective address of LocalVar into eax
    lea eax, LocalVar       ; This does exactly the same thing

generally equates to

    mov eax, OFFSET LocalVar

but its encoding is one byte longer in 32-bit code (6 bytes versus 5) and it may be slower on some processors, so MOV with OFFSET is preferred in cases other than local variables.
For example, if a program must be really small and uses LEA this way 100 times, replacing each use with MOV x, OFFSET y makes the program 100 bytes smaller.
The effective address is the offset that the processor computes from the operand's addressing expression (base, index, scale, and displacement). It is an offset within a segment, not necessarily the physical address of the data.
The OFFSET operator returns the offset of a data label. The offset represents the distance, in bytes, of the label from the beginning of the data segment. In 32-bit Protected mode, an offset is 32 bits long. In Real-address mode, offsets are only 16 bits.
The LEA instruction loads a memory offset into a register.
Suppose we want to load EBX with the offset value of table1.
We could write
mov ebx, OFFSET table1
The OFFSET operator resolves the offset before the program runs, at assembly or link time.
Another way of loading a memory offset is the LEA instruction, which computes the offset at run time:
lea ebx, [table1] ; load effective address of table1 into ebx
The format of the LEA instruction is

    LEA register, source

We must use LEA when the needed offset is available only at run time.
For example, suppose a procedure receives the byte offset of an array element in register ESI.
To load EBX with the address of that element of table1, write

    lea ebx, [ table1 + esi ]
    lea ebx, [ table1 ] + esi       ; does the same thing
    lea ebx, table1[ esi ]          ; does the same thing

If ESI holds an element index rather than a byte offset, scale it by the element size, for example [ table1 + esi*2 ] for WORD elements.
NOTE: A single MOV instruction cannot do this, because the address depends on a register value that is known only at run time.
Consider fragment of data definitions and code:
    .DATA
    table1  WORD 20 DUP (0)
    status  BYTE 7 DUP (1)
    ..
    mov EBX, OFFSET table1
    mov ESI, OFFSET status
    mov [EBX], 100
    mov [ESI], 100

The last two MOV instructions are ambiguous, since it is not clear whether the assembler should use byte or word equivalent of 100.
WORD PTR and BYTE PTR type specifiers must be used for clarification:

    mov WORD PTR [EBX], 100
    mov BYTE PTR [ESI], 100
    ; offset_ptr.asm
    ; OFFSET and PTR demo program
    .586P
    ; Flat memory model
    .MODEL FLAT, STDCALL
    ;---------------------------------------
    ; Data segment
    _DATA SEGMENT
        num DWORD 0
    _DATA ENDS
    ; Code segment
    _TEXT SEGMENT
    START:
        lea ESI, num                ; Load effective address
        mov ESI, OFFSET num
        mov bx, WORD PTR num        ; WORD PTR needed because num declared DWORD
        mov [ESI], bx               ; Copy a word-size value (BX is 16-bit)
        mov BYTE PTR [ESI], 5       ; Store 8-bit value
        mov WORD PTR [ESI], 5       ; Store 16-bit value
        mov DWORD PTR [ESI], 5      ; Store 32-bit value
        ret                         ; Exit
    _TEXT ENDS
    END START
The OFFSET operator returns the address of a variable. It is used to specify the location rather than the content of the variable:
    .DATA
    MyVar   DB 77h                  ; byte-sized variable called MyVar initialized to 77h
    .CODE
    .
    movzx   eax, MyVar              ; copy 77h into eax (zero-extended, because MyVar is a byte)
    mov     ebx, offset MyVar       ; copy memory address where value 77h stored into ebx
OFFSET can also pass the address of a variable to a procedure in an INVOKE statement.
However, OFFSET will only work for global variables declared in the .DATA or .DATA? segments.
OFFSET will fail with local variables, which are declared upon entry into procedure using the LOCAL statement.
Local variables inside a procedure do not have offset, because they are created on the stack at runtime.
The ADDR operator solves this problem. It is used exclusively with INVOKE to pass the address of a variable to a procedure.
For global variables the ADDR operator translates to a simple PUSH instruction, just as if OFFSET had been used:

    push OFFSET GlobalVar

However, for local variables ADDR translates to:

    lea eax, LocalVar       ; load effective address of LocalVar into eax
    push eax

Here the effective address is the run-time address of LocalVar within the stack frame (EBP plus a displacement).
It is important to remember that using ADDR with local variables overwrites the EAX register, so EAX cannot hold a value that the calling procedure still needs across the INVOKE.
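A minimal sketch of ADDR used with both kinds of variable follows. The procedure names ClearDword and Demo and the variable names are hypothetical, and the comments describe the code INVOKE is expected to generate according to the text above; check the listing file to confirm what your assembler version emits.

```asm
        .386
        .MODEL FLAT, STDCALL
        option casemap:none

ClearDword PROTO pValue:PTR DWORD       ; hypothetical procedure: stores 0 at *pValue

        .DATA
GlobalVar DWORD 1

        .CODE
ClearDword PROC pValue:PTR DWORD
        mov edx, pValue                 ; EDX = address passed by the caller
        mov DWORD PTR [edx], 0          ; clear the doubleword it points to
        ret
ClearDword ENDP

Demo PROC
        LOCAL LocalVar:DWORD            ; created on the stack at run time
        INVOKE ClearDword, ADDR GlobalVar   ; global: address is a constant, pushed directly
        INVOKE ClearDword, ADDR LocalVar    ; local: lea eax, LocalVar / push eax (EAX overwritten)
        ret
Demo ENDP
        END
```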
A direct memory operand specifies the data at a given address.
The instruction acts on the contents of the address, not the address itself.
Except when size is implied by another operand, you must specify the size of a direct memory operand so the instruction accesses the correct amount of memory.
The following example shows how to explicitly specify data size with the BYTE directive:
    .DATA?                  ; Segment for uninitialized data
    var     BYTE ?          ; Reserve one byte, labeled "var"
    .CODE
    . . .
    mov     var, al         ; Copy AL to byte at var
Any location in memory can be a direct memory operand as long as a size is specified (or implied) and the location is fixed.
The data at the address can change, but the address cannot.
By default, instructions that use direct memory addressing use the DS register.
The plus and index operators perform in exactly the same way when applied to direct memory operands.
For example, both the following statements move the second word value from a memory array into the AX register:
    mov ax, array[ 2 ]
    mov ax, array + 2
The index operator can contain any direct memory operand. The following statements are equivalent:
    mov ax, var
    mov ax, [var]
Many programmers prefer to enclose the operand in brackets to show that the contents, not the address, are used.
Memory operands in brackets make it easier to port code over to other assemblers.
The minus operator behaves as you would expect. Both the following instructions retrieve the value located at the word preceding array:
    mov ax, array[-2]
    mov ax, array-2
There are times when we need to assist assembler in translating references to data in memory.
For example, instruction
mov [ESI], al ; Store a byte-size value in memory location pointed by ESI
suggests that an 8-bit quantity should be moved because AL is an 8-bit register.
When an instruction has no operand that implies a size, the assembler reports an error:

    mov [ESI], 5            ; Error: operand must have the size specified

To get around this, we must use the PTR operator with a size, such as

    mov BYTE PTR [ESI], 5   ; Store 8-bit value
    mov WORD PTR [ESI], 5   ; Store 16-bit value
    mov DWORD PTR [ESI], 5  ; Store 32-bit value
These instructions require operands to be the same size.
In general, the PTR operator forces an expression to be treated as having the specified type:

    .DATA
    num DWORD 0
    .CODE
    mov ax, WORD PTR [num]  ; Load a word-size value from a DWORD
A variable that contains the address of another variable is called a pointer variable, or, simply, a pointer.
Pointers are essential when manipulating arrays and other data structures in memory.
High-level languages such as C/C++ and Java purposely hide the implementation of pointers,
because such details are not portable across different machine architectures.
However, assembly language programmers deal with pointers at the physical level.
Intel-based programs use two basic types of pointers, NEAR and FAR.
Pointer sizes are affected by the processor's current mode:
16-bit Real Mode
or 32-bit Protected Mode.
(Our discussion about x86 memory modes comes later in this course.)
Protected-mode programs use NEAR pointers.
NEAR pointers are stored as double word variables. For example,
    .DATA                           ; Begin data segment
    b_array   BYTE  0, 1, 2, 4, 8
    w_array   WORD  1000h, 2000h, 3000h
    ptr_b_arr DWORD OFFSET b_array
    ptr_w_arr DWORD OFFSET w_array

    .CODE                           ; Begin code segment
    mov esi, ptr_b_arr
    inc esi
    inc esi
    mov al, [esi]                   ; AL <- 2
    mov esi, ptr_w_arr
    add esi, TYPE w_array
    mov ax, [esi]                   ; AX <- 2000h
The TYPEDEF directive creates a user-defined type.
A user-defined type has the status of a built-in type when defining variables.
TYPEDEF is ideal for creating pointer variables.
For example, PBYTE and PWORD are pointers to bytes and words, respectively:
    PBYTE TYPEDEF PTR BYTE
    PWORD TYPEDEF PTR WORD

    .DATA                           ; Begin data segment
    b_array   BYTE  0, 1, 2, 4, 8
    w_array   WORD  1000h, 2000h, 3000h
    ptr_b_arr PBYTE OFFSET b_array
    ptr_w_arr PWORD OFFSET w_array

    .CODE                           ; Begin code segment
    mov esi, ptr_b_arr
    inc esi
    inc esi
    mov al, [esi]                   ; AL <- 2
    mov esi, ptr_w_arr
    add esi, TYPE w_array
    mov ax, [esi]                   ; AX <- 2000h