CIS-77 Home http://www.c-jump.com/CIS77/CIS77syllabus.htm
High-level language programs are portable.
(Although some programs could still have a few machine-dependent details, they can be used with little or no modifications on other types of machines.)
High-level instructions:
Program development is faster
Fewer lines of code
Program maintenance is easier
Compiler translates to the target machine language.
There are some disadvantages...
Assembly language programs are not portable!
Learning assembly is more difficult than learning Java!
Programming in assembly language is a tedious and error-prone process.
High-level languages should be the natural preference for common applications.
I just don't consider a utility program that's 4 megabytes big, and contains all sorts of files that the author didn't create, to be really great software.
Do you?
Steve Gibson, Gibson Research Corporation.
Assembly language programs contain only the code that is necessary to perform the given task.
Assembly gives direct and complete control over system hardware:
Writing device drivers.
Operating system design.
Embedded systems programming, e.g. aviation industry.
Writing in-line assembly (mixed-mode) in high-level languages such as C/C++, or hybrid programming in assembly and C/C++.
There are areas where speed is everything, for example, internet data encryption, aircraft navigational systems, medical hardware control...
There are also areas where space-efficiency is everything: spacecraft control software...
Understanding disassembly view of an executable program is also useful:
for investigating the cause of serious bugs or crashes that require understanding of memory dumps and disassembled code.
for optimizing your code.
for practical and educational purposes.
The "granddaddy" of all assemblers for the Intel platform, product of Microsoft.
Available since the beginning of the IBM-compatible PCs.
Works in MS-DOS and Windows environments.
It's free: Microsoft no longer sells MASM as a standalone product.
Bundled with the Microsoft Visual Studio product.
Numerous tutorials, books, and samples floating around, many are free or low-cost.
Steve Hutchesson's MASM32 development environment incorporates the MASM assembler and Win32 API tools.
Logic gates are used at the hardware level.
What is machine language?
How are high-level language concepts, such as if-else statements, realized at the machine level?
What about interactions with the operating system functions?
How is assembly language translated into machine language?
These fundamental questions apply to most computer architectures.
By using assembly, we gain an understanding of how a particular model of computer works.
Such secrets have been revealed to me that all I have written now appears of little value.
St. Thomas Aquinas, December 6, 1273.
Useful links:
MASM Reference Guide can be downloaded there, too.
More here:
Intel and Microsoft MASM 6.1
A web page with a variety of
Intel 80x86 Conditional and Unconditional Branching
Intel 80x86 Boolean and Arithmetic Instruction
You can get Microsoft's Macro Assembler free: download
Take a look at Sivarama P. Dandamudi textbook info,
Last, but not least,
CPU registers
Memory addressing
Representation of data:
numeric formats
character strings
Instructions to operate on 2's complement integers
Instructions to operate on individual bits
Instructions to handle strings of characters
Instructions for branching and looping
Coding of procedures:
transfer of control
parameter passing
local variables
The tools we will use include:
Visual Studio development environment...
...edit, assemble, link, manage projects, debug and disassemble programs.
Command-line MASM, Microsoft Macro Assembler...
...produces code for 32-bit flat memory model appropriate to modern Windows.
Test-drive fullscreen 32-bit debuggers: OllyDbg, Visual Studio, WinDbg.
DUMPBIN: command-line utility that examines binary files and disassembles programs.
Program runs on the processor.
Program uses operating system functions and services.
Program uses one of the memory models:
Real mode flat model, 65,536 bytes of addressable memory (ancient MS-DOS .COM files)
Real mode segmented model, 1 megabyte (prime-time MS-DOS)
Protected mode flat model, modern Windows and Linux:
Addressable Memory: 80486 and Pentium - 4 Gigabytes
As far as 32-bit Vista is concerned, the world ends at 4,096 megabytes.
A 32-bit program can address up to 4 gigabytes of memory.
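The sizes quoted for the three memory models follow directly from the width of an address. A minimal C sketch of that arithmetic, assuming 16-bit, 20-bit, and 32-bit addresses respectively (the helper name `addressable_bytes` is illustrative, not part of any toolchain):

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct byte addresses reachable with an address
   that is `bits` wide (hypothetical helper for illustration). */
uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);              /* keep the shift well-defined */
    return (uint64_t)1 << bits;
}
```

With this helper, 16 bits give 65,536 bytes, 20 bits give 1 megabyte, and 32 bits give 4,096 megabytes, matching the figures above.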
    ; CIS-77
    ; your_program_name.asm
    ; Brief description of what the program does

    .386                ; Tells MASM to use Intel 80386 instruction set.
    .MODEL FLAT         ; Flat memory model
    option casemap:none ; Treat labels as case-sensitive

    .CONST              ; Constant data segment

    .STACK 100h         ; (default is 1-kilobyte stack)

    .DATA               ; Begin initialized data segment

    .CODE               ; Begin code segment
    _main PROC          ; Beginning of code

        ret

    _main ENDP
    END _main           ; Marks the end of the module and sets the program entry point label
Some simple high-level language instructions can be expressed by a single assembly instruction:
    Assembly Code        C Language Code
    ----------------     ---------------------------------
    inc  result          ++result;       // Increment value
    mov  size, 1024      size = 1024;    // Assign value
    and  var, 128        var &= 128;     // Apply AND bitmask
    add  value, 10       value += 10;    // Addition
Most high-level language instructions need more than one assembly instruction:
    Assembly Code        C Language Code
    ----------------     ---------------------------------
    mov  AX, value       size = value;        // Assign variable
    mov  size, AX

    mov  AX, sum         sum += x + y + z;    // Arithmetic computation
    add  AX, x
    add  AX, y
    add  AX, z
    mov  sum, AX
Assembly Language uses mnemonics, digital numbers, comments, etc.
Machine language instructions are just sequences of 1s and 0s.
Assembly language instructions are much more readable than machine language instructions:
    Assembly Language    Machine Language (in Hex)
    -----------------    --------------------------
    inc  result          FF060A00
    mov  size, 45        C7060C002D00
    and  var, 128        80260E0080
    add  value, 10       83060F000A
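To see why the hex form is harder to read, it helps to pull one encoding apart. The following C sketch decodes the six bytes shown for `mov size, 45`, assuming 16-bit code in which `C7` is the opcode for moving a 16-bit immediate to memory and the ModR/M byte `06` selects a direct 16-bit address. The struct and function names are hypothetical, chosen only for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the 6-byte encoding "C7 06 0C 00 2D 00". */
struct mov_mem_imm16 {
    uint16_t address;    /* 16-bit address of the variable `size` */
    uint16_t immediate;  /* 16-bit constant to store              */
};

/* Read a 16-bit little-endian value (low byte first). */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

struct mov_mem_imm16 decode_mov_mem_imm16(const uint8_t code[6])
{
    struct mov_mem_imm16 out;
    assert(code[0] == 0xC7);   /* opcode: MOV r/m16, imm16      */
    assert(code[1] == 0x06);   /* ModR/M: direct 16-bit address */
    out.address   = read_le16(code + 2);
    out.immediate = read_le16(code + 4);
    return out;
}
```

Decoding the bytes from the table yields address 000Ch and the immediate 2Dh, which is decimal 45. Both fields are stored low byte first.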
Just as in high-level language, you want to control program flow.
The JMP instruction transfers control unconditionally to another instruction.
JMP corresponds to goto statements in high-level languages:
    ; Handle one case
    label1:
        .
        .
        .
        jmp done

    ; Handle second case
    label2:
        .
        .
        .
        jmp done
        .
        .
    done:
Conditional jump is taken only if the condition is met.
Condition testing is separated from branching.
Flag register is used to convey the condition test result.
For example:
        cmp ax, bx
        je  done
        .
        .
    done:
Similarly, AL, DL, CL, and BL represent the low-order 8 bits of the registers.
Register | Size | Typical Uses |
---|---|---|
EAX | 32-bit | Accumulator for operands and results |
EBX | 32-bit | Base pointer to data in the data segment |
ECX | 32-bit | Counter for loop operations |
EDX | 32-bit | Data pointer and I/O pointer |
EBP | 32-bit | Frame Pointer - useful for stack frames |
ESP | 32-bit | Stack Pointer - hardcoded into PUSH and POP operations |
ESI | 32-bit | Source Index - required for some array operations |
EDI | 32-bit | Destination Index - required for some array operations |
EIP | 32-bit | Instruction Pointer |
EFLAGS | 32-bit | Result Flags - hardcoded into conditional operations |
The four general-purpose data registers can be accessed as:
Four 32-bit registers: EAX, EBX, ECX, EDX.
Four 16-bit registers: AX, BX, CX, DX.
Eight 8-bit registers: AH, AL, BH, BL, CH, CL, DH, DL.
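The 16-bit and 8-bit names are not separate storage: they are views of parts of the same 32-bit register. A small C sketch of that overlap, using shifts and masks on a plain integer standing in for EAX (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Views of a 32-bit register value, computed with shifts and masks
   so the result does not depend on the host's byte order. */
uint16_t view_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  view_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
uint8_t  view_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing AL replaces only the lowest 8 bits of EAX. */
uint32_t write_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}

/* Small self-check that can be called from main(). */
void check_register_views(void)
{
    uint32_t eax = 0x12345678u;
    assert(view_ax(eax) == 0x5678u);
    assert(view_ah(eax) == 0x56u);
    assert(view_al(eax) == 0x78u);
    assert(write_al(eax, 0xFFu) == 0x123456FFu);
}
```

The same relationship holds for EBX, ECX, and EDX and their BX/BH/BL, CX/CH/CL, and DX/DH/DL parts.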
Some registers have special use...
...ECX for count in LOOP and REPeatable instructions
EIP Program counter (Instruction Pointer)
EFLAGS is a set of bit flags:
Status flags record status information about the result of the last arithmetic/logical instruction.
Direction flag stores forward/backward direction for data copying.
System flags store
IF interrupt-enable mode
TF Trap flag used in single-step debugging.
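Because EFLAGS is just a 32-bit value, each flag can be examined with a bit mask. The sketch below uses the bit positions documented in Intel's architecture manuals; the `FLAG_*` names and `flag_is_set` are illustrative, not an existing API:

```c
#include <assert.h>
#include <stdint.h>

/* Bit masks for selected EFLAGS bits. */
enum {
    FLAG_CF = 1 << 0,    /* carry flag     (status) */
    FLAG_ZF = 1 << 6,    /* zero flag      (status) */
    FLAG_SF = 1 << 7,    /* sign flag      (status) */
    FLAG_TF = 1 << 8,    /* trap flag      (system) */
    FLAG_IF = 1 << 9,    /* interrupt flag (system) */
    FLAG_DF = 1 << 10,   /* direction flag          */
    FLAG_OF = 1 << 11    /* overflow flag  (status) */
};

/* Returns 1 if the given flag bit is set in an EFLAGS image, else 0. */
int flag_is_set(uint32_t eflags, uint32_t mask)
{
    assert(mask != 0 && (mask & (mask - 1)) == 0);  /* exactly one bit */
    return (eflags & mask) != 0;
}
```

For example, an EFLAGS image of 00000246h has ZF and IF set and CF and TF clear.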
The MOV instruction copies the source operand to the destination operand without affecting the source.
Five types of operand combinations are allowed with MOV:
    Instruction type             Example
    --------------------------   ------------------
    mov register, register       mov DX, CX
    mov register, immediate      mov BL, 100
    mov register, memory         mov EBX, [count]
    mov memory, register         mov [count], ESI
    mov memory, immediate        mov [count], 23
Note: the above operand combinations are valid for all instructions that require two operands.
For the following data definitions
    .DATA
    table1  DW  20 DUP (?)
    status  DB  7 DUP (0)

    .CODE
        mov EBX, table1   ; "instruction operands must be the same size"
        mov ESI, status   ; "instruction operands must be the same size"
        mov [EBX], 100    ; "invalid instruction operands"
        mov [ESI], 100    ; "invalid instruction operands"
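The first two MOV instructions mix operand sizes: table1 and status are 16-bit and 8-bit variables, not 32-bit addresses.
The last two are ambiguous: it is not clear whether the assembler should use the byte or word equivalent of 100.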
Better:
    mov EBX, OFFSET table1
    mov ESI, OFFSET status
    mov WORD PTR [EBX], 100
    mov BYTE PTR [ESI], 100
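The size override matters because it decides how many bytes of memory are written. A C sketch of the two stores, with `store_word` standing in for the WORD PTR form and `store_byte` for the BYTE PTR form (both names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value (like WORD PTR): writes two bytes. */
void store_word(uint8_t *dest, uint16_t value)
{
    assert(dest != NULL);
    memcpy(dest, &value, sizeof value);
}

/* Store an 8-bit value (like BYTE PTR): writes one byte. */
void store_byte(uint8_t *dest, uint8_t value)
{
    assert(dest != NULL);
    *dest = value;
}
```

Storing 100 through `store_word` overwrites two adjacent bytes, while `store_byte` leaves the second byte untouched. This is exactly the choice the assembler cannot make on its own from `mov [EBX], 100`.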
Format:
    inc destination
    dec destination
Semantics:
destination = destination +/- 1
The destination can be an 8-bit, 16-bit, or 32-bit operand, in memory or in a register.
No immediate operand is allowed.
Examples:
    inc BX        ; BX = BX + 1
    dec [value]   ; value = value - 1
Format:
add destination, source
Semantics:
destination = (destination) + (source)
Examples:
    add ebx, eax
    add [value], 10h
Note that
inc eax
is better than
add eax, 1
INC takes less space.
Both INC and ADD execute at about the same speed.
Format:
sub destination, source
Semantics:
destination = (destination) - (source)
Examples:
    sub ebx, eax
    sub [value], 10h
Note that
dec eax
is better than
sub eax, 1
DEC takes less space.
Both execute at about the same speed.
Format:
cmp destination, source
Semantics:
(destination) - (source)
The destination and source are not altered.
Useful for testing relationships such as <, >, or = between the two operands.
Used in conjunction with conditional jump instructions for decision making purposes.
Examples:
        cmp ebx, eax
        je  done      ; jump if equal
        ..
    done:
        ..
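CMP communicates its result only through the flags, which the following conditional jump then inspects. The C sketch below models this for 32-bit operands. The `struct flags` type and the function names are hypothetical; the conditions used are ZF = 1 for `je` and SF different from OF for `jl`:

```c
#include <assert.h>
#include <stdint.h>

struct flags { int zf, sf, cf, of; };

/* Flags produced by "cmp dest, src" on 32-bit operands:
   the subtraction is performed, but only the flags are kept. */
struct flags cmp32(uint32_t dest, uint32_t src)
{
    uint32_t diff = dest - src;            /* wraps modulo 2^32      */
    struct flags f;
    f.zf = (diff == 0);                    /* operands are equal     */
    f.sf = (int)((diff >> 31) & 1);        /* sign bit of the result */
    f.cf = (dest < src);                   /* unsigned borrow        */
    /* Signed overflow: the operands differ in sign and the
       result's sign differs from dest's sign. */
    f.of = (int)((((dest ^ src) & (dest ^ diff)) >> 31) & 1);
    return f;
}

/* Conditions tested by je and jl after a cmp. */
int je_taken(struct flags f) { return f.zf; }
int jl_taken(struct flags f) { return f.sf != f.of; }

/* Small self-check that can be called from main(). */
void check_cmp32(void)
{
    assert(je_taken(cmp32(5, 5)));             /* equal operands        */
    assert(jl_taken(cmp32(3, 5)));             /* 3 < 5                 */
    assert(jl_taken(cmp32((uint32_t)-1, 1)));  /* -1 < 1 when signed    */
    assert(cmp32((uint32_t)-1, 1).cf == 0);    /* but not when unsigned */
}
```

Neither operand is modified by `cmp32`, mirroring the fact that CMP leaves both destination and source unaltered.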
Format:
jmp label
Semantics:
Execution is transferred to the instruction identified by the label.
Infinite loop example:
        mov eax, 1
    inc_again:
        inc eax
        jmp inc_again
        mov ebx, eax   ; this will never execute...
Format:
jcondition label
Semantics:
Execution is transferred to the instruction identified by label only if condition is met.
Testing for carriage return example:
        ; Assume that AL contains input character.
        cmp al, 0dh    ; 0dh = ASCII carriage return
        je  CR_received
        inc cl
        ..
    CR_received:
Some conditional jump instructions treat operands of the CMP instruction as signed numbers:
    je    jump if equal
    jg    jump if greater
    jl    jump if less
    jge   jump if greater or equal
    jle   jump if less or equal
    jne   jump if not equal
Some conditional jump instructions can also test values of the individual CPU flags:
    jz    jump if zero       (ZF = 1)
    jnz   jump if not zero   (ZF = 0)
    jc    jump if carry      (CF = 1)
    jnc   jump if not carry  (CF = 0)

    jz is a synonym for je
    jnz is a synonym for jne
Format:
loop target
Semantics:
Decrements ECX and jumps to target if ECX is not zero after the decrement.
ECX should be loaded with a loop count value before loop begins.
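The LOOP semantics correspond to a C do-while loop: the body runs first, then the counter is decremented and tested. A minimal sketch, where `sum_down_from` is a hypothetical routine and the comments show the assembly each line stands for:

```c
#include <assert.h>
#include <stdint.h>

/* Sum the integers count, count-1, ..., 1 the way a LOOP-based
   routine would. */
uint32_t sum_down_from(uint32_t count)
{
    uint32_t ecx = count;   /* mov ecx, count */
    uint32_t eax = 0;       /* mov eax, 0     */
    assert(ecx != 0);       /* ECX = 0 would wrap around and loop 2^32 times */
    do {
        eax += ecx;         /* add eax, ecx   */
    } while (--ecx != 0);   /* loop target    */
    return eax;
}
```

The assertion reflects why ECX must be loaded before the loop begins: starting with ECX = 0 does not skip the loop, because the decrement happens before the test.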
Format:
    and destination, source
    or  destination, source
    xor destination, source
    not destination
Semantics:
Perform the standard bitwise logical operations.
Result goes to the destination.
TEST is a non-destructive AND instruction:
test destination, source
TEST performs a logical AND, but the result is not stored in destination (similar to the CMP instruction).
Example of testing the value in AL for odd/even number:
        test al, 01h      ; test the least significant bit
        je   even_number
    odd_number:
        ; process odd number
        ..
        jmp  next
    even_number:
        ; process even number
        ..
    next:
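The same odd/even check can be written in C, where the AND is computed only to be compared with zero and the tested value stays unchanged. The function names here are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "test al, 01h" followed by "je even_number":
   AND the value with 1 and branch on a zero result. */
int is_even(uint8_t al)
{
    return (al & 0x01u) == 0;   /* ZF = 1 when bit 0 is clear */
}

/* Small self-check that can be called from main(). */
void check_is_even(void)
{
    assert(is_even(0));
    assert(is_even(0x10));
    assert(!is_even(0x0D));
}
```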
Shift left format:
    shl destination, count
    shl destination, cl
Shift right format:
    shr destination, count
    shr destination, cl
where count is an immediate value.
Semantics:
Performs a left/right bit-shift of destination by the value in count or in the CL register.
The contents of the CL register are not altered.
The last bit shifted out goes into the carry flag CF.
A zero bit is shifted in at the other end.
Count is an immediate value:
shl eax, 5
A count greater than 31 is not meaningful for a 32-bit operand.
If the count is greater, only its least significant 5 bits are actually used.
CL version of shift is useful if shift count is known at run time,
e.g. when the shift count is a parameter in a procedure call.
Only CL register can be used.
Shift count value should be loaded into CL:
    mov cl, 5
    shl ax, cl
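The shift rules above (count reduced to 5 bits, last bit out goes to CF, zeros shifted in) can be modeled in a few lines of C. This is a sketch for a 32-bit destination only; `shl32` and `struct shift_result` are hypothetical names, and `cf_in` stands for the carry flag's value before the instruction:

```c
#include <assert.h>
#include <stdint.h>

struct shift_result {
    uint32_t value;  /* destination after the shift      */
    int      cf;     /* carry flag: last bit shifted out */
};

/* Emulates "shl dest, count" on a 32-bit destination. */
struct shift_result shl32(uint32_t dest, uint8_t count, int cf_in)
{
    struct shift_result r;
    assert(cf_in == 0 || cf_in == 1);
    count &= 31;                        /* only the low 5 bits are used   */
    if (count == 0) {                   /* nothing shifted: CF unchanged  */
        r.value = dest;
        r.cf = cf_in;
        return r;
    }
    r.cf = (int)((dest >> (32 - count)) & 1);  /* last bit to leave on the left */
    r.value = dest << count;                   /* zeros enter on the right      */
    return r;
}
```

Under this model a count of 33 behaves like a count of 1, and a count of 32 leaves the destination unchanged.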
Two types of rotate instructions:
Rotate without carry:
ROL (ROtate Left)
ROR (ROtate Right)
Rotate with carry:
RCL (Rotate through Carry Left)
RCR (Rotate through Carry Right)
Rotate instructions take the same operands as shift instructions and support two versions:
Immediate count value
Count value is in CL register
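The difference between the two rotate families is whether CF takes part in the rotation. A C sketch for 32-bit values, with hypothetical function names; `rcl32_once` models a rotate through carry by a single position:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ROL: bits leaving the left end re-enter on the right. */
uint32_t rol32(uint32_t value, unsigned count)
{
    count &= 31;
    if (count == 0)
        return value;                  /* avoid an undefined 32-bit shift */
    return (value << count) | (value >> (32 - count));
}

/* ROR: bits leaving the right end re-enter on the left. */
uint32_t ror32(uint32_t value, unsigned count)
{
    count &= 31;
    if (count == 0)
        return value;
    return (value >> count) | (value << (32 - count));
}

/* RCL by one position: CF joins the rotation as a 33rd bit. */
uint32_t rcl32_once(uint32_t value, int *cf)
{
    int carry_out;
    assert(cf != NULL && (*cf == 0 || *cf == 1));
    carry_out = (int)(value >> 31);        /* old top bit goes to CF */
    value = (value << 1) | (uint32_t)*cf;  /* old CF enters at bit 0 */
    *cf = carry_out;
    return value;
}
```

Unlike a shift, a rotate loses no bits: rotating left and then right by the same count restores the original value.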
EQU directive eliminates hardcoding:
NUM_OF_STUDENTS EQU 90
..
mov ecx, NUM_OF_STUDENTS
No reassignment is allowed.
Only numeric constants are allowed.
Defining constants has two main advantages:
Improves program readability
Helps in software maintenance.
mov ecx, 90 ; HARDCODING is less readable and harder to maintain
Multiple occurrences can be changed from a single place
The convention is to use all UPPER-CASE LETTERS for names of constants.
name EQU expression
Assigns the result of expression to name.
The expression is evaluated at assembly time.
More examples:
    NUM_OF_ROWS  EQU  50
    NUM_OF_COLS  EQU  10
    ARRAY_SIZE   EQU  NUM_OF_ROWS * NUM_OF_COLS
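For comparison, the closest C counterpart of these EQU definitions is a set of compile-time constants. In this sketch the array `table` and the function `table_element_count` are hypothetical, added only to show the constant in use:

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time constants, the C counterpart of EQU. */
enum {
    NUM_OF_ROWS = 50,
    NUM_OF_COLS = 10,
    ARRAY_SIZE  = NUM_OF_ROWS * NUM_OF_COLS   /* evaluated by the compiler */
};

/* Hypothetical table whose size is set from a single place. */
static short table[ARRAY_SIZE];

size_t table_element_count(void)
{
    size_t n = sizeof table / sizeof table[0];
    assert(n == (size_t)ARRAY_SIZE);
    return n;
}
```

As with EQU, the product is computed before the program runs, so changing the number of rows or columns in one place updates every use.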