The previous section's overview of the "Hello, World!" program and the creation of a VC++ project showed how to compile our first program from the command line as well as from the Visual Studio IDE (integrated development environment). In both cases, the creation of the executable file on disk can be illustrated by the following diagram:
Since preprocessing, compiling, and linking are commonly viewed as steps of a single attempt to create an executable file, they are collectively referred to as the "build" process. Usually, if one of the steps fails, the build stops, and error descriptions are printed either to the command window or to the output window of the IDE. The following sequence diagram shows the typical events of a successful build, passed between the build components in the order in which they occur:
A file is the traditional unit of storage in the file system. Similarly, a file is also the traditional unit of program compilation.
Having a complete program in one file is usually impossible. The code of the C++ standard libraries, or of operating-system-specific libraries, is typically supplied by
A compiler is a translator program that converts a program written in a high-level language into machine language. The CPU (central processing unit) can directly understand only its own machine language. The typical outline goes like this:
The linker is the program that binds together all separately compiled parts. It links the object code with the code for the missing functions to produce an executable image, and creates the executable file on disk. The C++ and operating system object libraries supply the missing functions. The program can now be loaded into memory and executed.
Once the program is ready, it can be run from the command window. The loader component of the operating system takes the executable image from disk and transfers it to memory for execution:
Our program is loaded into memory alongside other user programs as follows:
The operating system occupies the layer between user programs and the central processing unit, or CPU for short.
Other layers may exist between the user programs, the operating system and the CPU. Typical layers are user-mode and kernel-mode debuggers:
A user-mode debugger is a process that can instruct the CPU to execute another program's instructions in single-step mode. The debugger loads the user program in a separate process. In this setting, the debugger is the parent process and the user program is the child process. The debugger can execute the user program one statement at a time, helping programmers detect and investigate "bugs". Debuggers are valuable run-time tools for locating and removing logic errors. Microsoft VC++ has its own debugger with a graphical user interface.

Kernel-mode debuggers (KDs) are similar to user-mode debuggers, but when a KD stops execution, the entire operating system stops. Kernel debuggers often let the programmer control them remotely from another computer, connected, for example, via a null-modem cable. They allow developers to detect bugs in components of the operating system, such as device drivers.
Although a CPU core executes the instructions of only one program at any given moment, the operating system has a mechanism to interrupt a running user program and give the next execution time slice to another program. This type of task scheduling creates the impression that all programs are running in parallel.
Computer memory is composed of storage cells, sometimes referred to as words. Each cell has a unique numeric address, which identifies its location in memory. The number of memory cells in a computer varies, but it is usually measured in millions or even billions of cells. Each memory cell contains a binary number made up of a series of binary digits, or bits, usually 8, 16, 32, or 64. A binary digit has two possible values, 0 or 1, so a binary number is a sequence of 0s and 1s. Inside the hardware, each digit is represented by a voltage level: a high level represents 1 and a low level represents 0.
A group of 8 bits is called a byte. The size of memory is usually expressed in kilobytes, abbreviated KB. One kilobyte is 1024 ($2^{10}$) bytes. Storage capacity can also be expressed in megabytes (MB), approximately a million ($2^{20}$ = 1048576) bytes, or in gigabytes (GB), approximately a (US) billion ($2^{30}$ = 1073741824) bytes.

Binary numbers are usually represented in hexadecimal notation, that is, as numbers to the base 16. This is in contrast to base 2, used by the binary number system, and base 10, used by the everyday decimal system. The hexadecimal system comprises 16 distinct digits. Hexadecimal digits 0 to 9 are the same as those of decimal numbers. To represent the 6 extra digits, the letters A through F are used; these are equivalent to the decimal numbers 10 to 15:
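The powers of two and the extra hexadecimal digits described above can be checked with a few lines of C++. This is a minimal sketch. The names KB, MB, GB, and hex_digit are hypothetical and chosen only for illustration:

```cpp
#include <cstdint>

// Binary size units expressed as powers of two (shifting 1 left by n gives 2^n).
constexpr std::uint64_t KB = std::uint64_t{1} << 10; // 1024 bytes
constexpr std::uint64_t MB = std::uint64_t{1} << 20; // 1048576 bytes
constexpr std::uint64_t GB = std::uint64_t{1} << 30; // 1073741824 bytes

static_assert(KB == 1024, "a kilobyte is 2^10 bytes");
static_assert(MB == 1048576, "a megabyte is 2^20 bytes");
static_assert(GB == 1073741824, "a gigabyte is 2^30 bytes");

// Hexadecimal literals use the 0x prefix; A..F stand for 10..15.
static_assert(0xA == 10 && 0xF == 15, "A is ten, F is fifteen");

// Returns the hexadecimal digit character for a value in the range 0..15.
char hex_digit(unsigned value) {
    static const char digits[] = "0123456789ABCDEF";
    return digits[value & 0xF]; // the mask keeps the index within 0..15
}
```

The static_assert lines are evaluated by the compiler. A wrong constant would therefore stop the build instead of failing at run time.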
A binary number can be converted to a hexadecimal number by partitioning it into groups of four bits, starting from the rightmost bit, and evaluating each group as a hexadecimal digit. For example, the binary number 0100110111010000 is split into the groups 0100 1101 1101 0000. These groups correspond to the hexadecimal digits 4 D D 0, which give the number 4DD0 in hexadecimal form. TIP: programmers often use a desktop calculator to convert large numbers between bases:
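The grouping procedure can be sketched in C++ as follows. The function name binary_to_hex is hypothetical, and the sketch assumes the input is a string of '0' and '1' characters. It pads the string on the left with zeros so that every group of four bits is complete:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts a string of '0'/'1' characters to hexadecimal by splitting it
// into groups of four bits, counted from the rightmost (least significant) bit.
std::string binary_to_hex(const std::string& bits) {
    static const char digits[] = "0123456789ABCDEF";
    // Pad on the left with zeros so that the length is a multiple of four.
    std::string padded((4 - bits.size() % 4) % 4, '0');
    padded += bits;

    std::string hex;
    for (std::size_t i = 0; i < padded.size(); i += 4) {
        unsigned group = 0;
        for (std::size_t j = i; j < i + 4; ++j) {
            const char c = padded[j];
            if (c != '0' && c != '1') {
                throw std::invalid_argument("not a binary digit");
            }
            group = group * 2 + static_cast<unsigned>(c - '0'); // accumulate the 4-bit value
        }
        hex += digits[group]; // each group becomes exactly one hexadecimal digit
    }
    return hex;
}
```

For example, binary_to_hex("0100110111010000") returns "4DD0", which matches the conversion above.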
Note that each memory address is represented by a unique number; in practice, addresses are usually written in hexadecimal. The lower end of the memory address range is often referred to as low memory, and the other end as high memory. The operating system is traditionally located in low memory.
Once in memory, the program is configured to have access to separate logical segments of the memory as follows:
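To see the segments from inside a running program, the following sketch takes three addresses and prints them in hexadecimal: a function (code), a global variable (data), and a local variable (stack). All names are hypothetical. The actual addresses differ from run to run and between compilers. Converting a function pointer to an integer is only conditionally supported by the C++ standard, although mainstream compilers such as VC++, GCC, and Clang accept it:

```cpp
#include <cstdint>
#include <iostream>

int global_value = 0;           // static storage: data segment

void sample_function() {}       // machine code: code segment

// Addresses of one object from each logical segment, stored as integers.
struct SegmentAddresses {
    std::uintptr_t code;
    std::uintptr_t data;
    std::uintptr_t stack;
};

SegmentAddresses sample_addresses() {
    int local_value = 0;        // automatic variable: stack
    SegmentAddresses a{};
    a.code  = reinterpret_cast<std::uintptr_t>(&sample_function);
    a.data  = reinterpret_cast<std::uintptr_t>(&global_value);
    a.stack = reinterpret_cast<std::uintptr_t>(&local_value);
    return a;                   // only the numbers leave the function, never a pointer to the local
}

void print_addresses() {
    const SegmentAddresses a = sample_addresses();
    const std::ios_base::fmtflags saved = std::cout.flags(); // remember current formatting
    std::cout << std::hex << std::showbase
              << "code:  " << a.code  << '\n'
              << "data:  " << a.data  << '\n'
              << "stack: " << a.stack << '\n';
    std::cout.flags(saved);     // restore the previous (decimal) formatting
}
```

Comparing the printed values is an informal way to observe that code, data, and stack occupy different address ranges. It is not a way to measure the segment boundaries.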
The last diagram zooms in on the Intel CPU and the program in memory. The CPU has its own built-in memory: the registers. The instruction pointer always points to the next instruction that the processor is about to execute. The instruction is loaded from memory into the instruction register. General-purpose registers hold instruction operands and the results of calculations. They can also point to temporary data storage in memory, that is, contain addresses of the memory reserved for the program stack. The program stack provides memory for temporary data, such as temporary variables. General-purpose registers have a layout that provides access to individual bytes. For example, the lower 16 bits of the 32-bit EAX register are accessible as AX, and the high and low bytes of AX are accessible as AH and AL.
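The byte-level layout of EAX can be mimicked with ordinary bit operations. The following is a portable sketch, not real register access. The helper names ax_of, ah_of, al_of, and with_al are hypothetical:

```cpp
#include <cstdint>

// AX is the lower 16 bits of EAX.
std::uint16_t ax_of(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFF);
}

// AH is the high byte of AX (bits 8..15 of EAX).
std::uint8_t ah_of(std::uint32_t eax) {
    return static_cast<std::uint8_t>((eax >> 8) & 0xFF);
}

// AL is the low byte of AX (bits 0..7 of EAX).
std::uint8_t al_of(std::uint32_t eax) {
    return static_cast<std::uint8_t>(eax & 0xFF);
}

// Writing AL changes only the lowest byte of EAX; the other 24 bits are kept.
std::uint32_t with_al(std::uint32_t eax, std::uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}
```

For example, if EAX holds 0x12345678, then AX is 0x5678, AH is 0x56, and AL is 0x78.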
Processor instructions can have one of the following formats:

    Instruction
    Instruction Operand
    Instruction Operand, Operand
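For example, the following little program uses the Microsoft-specific __asm keyword to embed assembler instructions in C++ code:

    int main()
    {
        __asm nop          ; no operation
        __asm push EAX     ; save the contents of EAX on the stack
        __asm mov EAX, 5   ; copy the value 5 into EAX
        __asm pop EAX      ; restore the saved contents of EAX
        return 0;
    }

Here, nop takes no operands, and push and pop take one operand each. The mov instruction takes two operands: the destination EAX and the source 5. Note that VC++ supports __asm only when compiling for 32-bit x86 targets; it is not available for x64 or ARM.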
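The net effect of this program can be traced with a toy model of one register and a stack. The sketch below is a hypothetical simulation, not how the CPU actually works. For instance, the real x86 stack grows toward lower addresses through the ESP register, which the model ignores. The names ToyCpu and run_example are illustrative only:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy model of the EAX register and the program stack.
struct ToyCpu {
    std::uint32_t eax = 0;
    std::vector<std::uint32_t> stack;

    void nop() {}                                         // no operands: does nothing
    void push_eax() { stack.push_back(eax); }             // one operand: save EAX on the stack
    void mov_eax(std::uint32_t source) { eax = source; }  // two operands: destination EAX, immediate source
    void pop_eax() {                                      // one operand: restore EAX from the stack
        if (stack.empty()) {
            throw std::runtime_error("stack underflow");
        }
        eax = stack.back();
        stack.pop_back();
    }
};

// Runs the same sequence as the __asm example, starting from a given EAX value.
ToyCpu run_example(std::uint32_t initial_eax) {
    ToyCpu cpu;
    cpu.eax = initial_eax;
    cpu.nop();
    cpu.push_eax();
    cpu.mov_eax(5);   // EAX temporarily holds 5
    cpu.pop_eax();    // the saved value overwrites the 5
    return cpu;
}
```

Whatever value EAX held before, run_example returns with that same value in eax and an empty stack. This is because push and pop save and restore the register that mov overwrote.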
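The mov instruction is one of the most frequently used CPU instructions, because it is the way to move values from one place to another:

    mov destination, source

Despite its name, mov copies the value, and the source operand is left unchanged. The source operand of the mov instruction can be an immediate (constant) value, a register, or a memory reference.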
The destination operand can be a register or a memory reference. Intel CPUs do not allow both the source and the destination to be memory references:
A memory reference indicates that an instruction operand is located in memory. A memory reference requires a memory address, which is specified inside square brackets. For simplicity, we will discuss only memory references that use a general-purpose register (such as EAX, EBX, ECX, or EDX) to specify the address. For example, the instruction

    mov [EAX], EBX

moves the value from register EBX to the memory location whose address is stored in register EAX. The lea instruction, whose name stands for load effective address, loads the destination register with the address of the source operand, rather than with the value stored at that address. For example, lea
Consider the following example. Since there are no data variables, we added a label that gives a "named address" to a particular location in our code:

    int main()
    {
    start:
        __asm push EAX        ; save EAX data
        __asm lea EAX, start  ; load the address of the program label into EAX
        __asm mov [EAX], EBX  ; move garbage from EBX to memory location [EAX]
        __asm pop EAX         ; restore EAX data
        return 0;
    }

If we compile and run this program, it generates an error:

    First-chance exception at 0x00411a35 in main.exe: 0xC0000005: Access violation writing location 0x00411a2e.

Indeed, user-mode programs are not allowed to modify executable code in memory. The pages that hold the program code are mapped read-only (execute/read), so any attempt to write to them raises an access violation.
To further demonstrate how the CPU accesses memory, we can modify our program to have a global variable named x:

    #include <iostream>

    int x;  // global variable, zero-initialized

    int main()
    {
        __asm push EAX
        __asm push EBX
        __asm mov EBX, 5      ; store integer number 5 in register EBX
        __asm lea EAX, x      ; load address of variable x into register EAX
        __asm mov [EAX], EBX  ; move data from EBX to memory location [EAX]
        __asm pop EBX
        __asm pop EAX
        std::cout << "x is equal to " << x << '\n';
        return 0;
    }

This time, the program compiles and runs fine. If you run it in debug mode, you should be able to see x change from zero to five.
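For comparison, the same memory write can be expressed in portable C++ without inline assembly. In this sketch, the pointer plays the role of EAX, and the function names store_five and print_x are hypothetical. A compiler may or may not emit lea and mov instructions for it, depending on the target and the optimization settings:

```cpp
#include <iostream>

int x; // global variable: zero-initialized, stored in the data segment

// Portable counterpart of the __asm example.
void store_five() {
    int value = 5;      // similar to "mov EBX, 5": put 5 into a temporary
    int* address = &x;  // similar to "lea EAX, x": take the address of x
    *address = value;   // similar to "mov [EAX], EBX": write to the memory location
}

void print_x() {
    std::cout << "x is equal to " << x << '\n';
}
```

Calling store_five() and then print_x() prints "x is equal to 5". Unlike the __asm version, this sketch also compiles for x64 and non-Intel targets.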
Our examples demonstrated that computer memory is a large array of memory locations, each with a unique address. A program is loaded into memory in three logical segments (code, data, and stack), each occupying its own range of addresses. Memory that does not belong to our program is off limits to it: an attempt to use that memory results in an error known as an access violation exception. Our own program code is read-only to our program.

Each compiled program contains a sequence of processor instructions. An instruction is the smallest command that the processor can execute at one time. Instructions vary in size and can have no operands, one operand, or two operands. Registers are among the most important computer resources: they are the fastest type of memory, and they are referenced directly inside CPU instructions. Registers are used heavily when data is moved to and from memory locations.

Some C++ compilers, such as VC++ for 32-bit x86, allow assembly code to be inserted directly into programs, and the compiler translates these in-line assembler commands into CPU instructions. In many cases, when your program crashes, the real difference between solving the bug and screaming in frustration comes down to how well you can read a little assembly language!