Integers are stored precisely. However, if a value is too large for the space allowed by its type, it overflows; the compiler may not tell you, and you will simply get strange output.
If you are working with numbers that have a fractional component (or, on the rare occasion that an integer value doesn't fit even a long data type), we can use floating point numbers.
For floating point numbers, if the number of significant digits is larger than the number of bits available, the number is simply rounded to the closest value that can be represented.
This means the number may not be stored exactly. However, this loss of precision is often not a problem. We refer to this error as roundoff error.
Floating point numbers (i.e. numbers with a decimal point) are represented using what is called the IEEE 754 format. The C++ float data type stores a single precision number in 32 bits as follows:
The IEEE standard partitions the bits into a sign bit s, an 8-bit stored exponent e, and a 23-bit fraction m, and computes the floating point value using the formula
(-1)^sign * mantissa * 2^exponent
where exponent and mantissa are computed as
exponent = e - 127
mantissa = 1.m
What floating point number is stored as the following bit pattern?
0001 0001 0100 1000 0000 0000 0000 0000
Answer:
1.5777218E-28
There are a few floating point converters available online. For example, see the IEEE 754 Converter by Harald Schmidt.
C/C++ floating point types accommodate numbers in the following ranges:
float:  32 bits, range +/- 3.4 * 10^38
double: 64 bits, range +/- 1.8 * 10^308
Convert to IEEE single precision floating point format (stored in 32 bits):
(a) -.125 (b) 783 (c) .0390625
We need to fill in the sign, exponent, and mantissa in
(-1)^sign * mantissa * 2^exponent
where exponent and mantissa are computed as
exponent = e - 127   ( 8 bits )
mantissa = 1.m       ( 23 bits; the leading 1 is implicit and not stored )
The dot is called the radix point. From left to right, the binary digits of the mantissa represent combinations of the following base-2 fractions:
1/2   = 0.5
1/4   = 0.25
1/8   = 0.125
1/16  = 0.0625
1/32  = 0.03125
1/64  = 0.015625
1/128 = 0.0078125
...
For example,
3.75 = 3 + 0.5 + 0.25
     = 11.11 (binary)
     = 1.111 * (2^1)

sign     = 0
exponent = 10000000    ( 8 bits )
mantissa = 111000...00 ( 23 bits )
Note: fractions that can be fully assembled from the base-2 fractions listed above are called dyadic fractions.
To compute the mantissa (the binary equivalent of the fractional part after the radix point) of a non-dyadic fraction, such as 3.14, repeatedly multiply the decimal fractional part .14 by 2, up to 23 times. The integral part of each resulting number is the next digit of the binary mantissa, from left to right:
.14 * 2 = 0.28  ->  0
.28 * 2 = 0.56  ->  0
.56 * 2 = 1.12  ->  1
.12 * 2 = 0.24  ->  0
.24 * 2 = 0.48  ->  0
.48 * 2 = 0.96  ->  0
.96 * 2 = 1.92  ->  1
.92 * 2 = 1.84  ->  1
.84 * 2 = 1.68  ->  1
.68 * 2 = 1.36  ->  1
.36 * 2 = 0.72  ->  0
.72 * 2 = 1.44  ->  1
...

3.14 = 11.001000111101... (binary), where the digits after the radix point form the mantissa m (up to 23 bits).
A binary fraction such as 11.001000111101 needs to be normalized: in the normalized form, the integral part is always 1.
11.001000111101 = 1.1001000111101 * (2^1)
sign     = 0 (+)
exponent = 127 + 1 = 10000000 ( 8 bits )
mantissa = 1001000111101...   ( 23 bits )
When normalizing, we adjust the radix point to the left or to the right, using a positive or negative exponent, respectively:
 ...
2^-2 : e = 127 - 2 = 125
2^-1 : e = 127 - 1 = 126   <- negative power: radix point moved to the right
2^0  : e = 127 + 0 = 127
2^+1 : e = 127 + 1 = 128   <- positive power: radix point moved to the left
2^+2 : e = 127 + 2 = 129
 ...
(a) -.125    = 1 01111100 00000000000000000000000
(b) 783      = 0 10001000 10000111100000000000000
(c) .0390625 = 0 01111010 01000000000000000000000