C++ Intro

Random Access to Binary Files


Working with Binary Files


File streams include two member functions specifically designed to input and output binary data sequentially: write( ) and read( ). They are called as follows:


    myfile.write( memory_block, size );

    myfile.read( memory_block, size );

Here, memory_block is of type "pointer to char" (char*). It holds the address of an array of bytes where the data read from the file is stored, or from which the data to be written is taken. The size parameter is an integer value that specifies the number of characters to read into, or write from, the memory block.
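Because write( ) and read( ) only see a block of bytes, a whole array can be transferred with a single call. The following is a minimal sketch of that idea, not part of the original programs: the helper names save_doubles and load_doubles are hypothetical, and std::vector is assumed as the container. The file is only meaningful when read back on a platform with the same representation of double.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Hypothetical helper: write all values to a binary file with one write() call.
// Returns true if the stream is still in a good state afterwards.
bool save_doubles( char const* path, std::vector< double > const& values )
{
    std::ofstream out( path, std::ios::out | std::ios::binary | std::ios::trunc );
    out.write( reinterpret_cast< char const* >( values.data() ),
               static_cast< std::streamsize >( values.size() * sizeof( double ) ) );
    return out.good();
}

// Hypothetical helper: read back exactly `count` values with one read() call.
// Returns an empty vector if the file is missing or too short.
std::vector< double > load_doubles( char const* path, std::size_t count )
{
    std::vector< double > values( count );
    std::ifstream in( path, std::ios::in | std::ios::binary );
    in.read( reinterpret_cast< char* >( values.data() ),
             static_cast< std::streamsize >( count * sizeof( double ) ) );
    if ( !in ) {
        values.clear(); // short read or open failure
    }
    return values;
}
```

The size argument is a byte count, which is why the number of elements is multiplied by sizeof( double ) in both calls.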

The following program reads numbers from std::cin, stores them in a file in binary format, and then reads them back from that file:


// Creating a binary file
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    double value;
    char* pmemory = ( char* ) &value;

    // Open binary file to write:
    ofstream out_file( "example.txt", ios::out | ios::binary | ios::trunc );

    while ( cin >> value ) {
        out_file.write( pmemory, sizeof( double ) );
    }
    out_file.close();

    // Open binary file to read:
    ifstream in_file( "example.txt", ios::in | ios::binary );
    // read() leaves the stream in a failed state when fewer than
    // sizeof( double ) bytes remain, so the loop stops at end of file:
    while ( in_file.read( pmemory, sizeof( double ) ) ) {
        cout << value << endl;
    }
    in_file.close();

    return 0;
}


Internal Stream Pointers


File streams maintain internal stream pointers. A stream pointer marks the position in the stream where the next read or write operation will take place. The pointer used to read data from the file is named the get pointer. The pointer used to write data to the file is named the put pointer. Four member functions manipulate the stream pointers:

function  description

tellg     Get the current read position in the stream:
              ifstream::pos_type position = myfile.tellg();

tellp     Get the current write position in the stream:
              ofstream::pos_type position = myfile.tellp();

seekg     Move the current read position in the stream to an absolute position, counted from the beginning of the file:
              ifstream::pos_type position = myfile.tellg();
              myfile.seekg( position );

          In addition to absolute movements, another version of seekg( ) takes an offset and an additional parameter named direction:
              myfile.seekg( offset, direction );

          where direction determines the point from which offset is counted:
            • ios::beg -- offset counted from the beginning of the stream
            • ios::cur -- offset counted from the current position of the stream pointer
            • ios::end -- offset counted from the end of the stream

seekp     Move the current write position in the stream to an absolute position, counted from the beginning of the file:
              ofstream::pos_type position = myfile.tellp();
              myfile.seekp( position );

          In addition to absolute movements, another version of seekp( ) takes an offset and an additional parameter named direction (same as for seekg( ) calls):
              myfile.seekp( offset, direction );
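The three directions can be tried out with a small helper. This is an illustrative sketch only: the function name byte_at is hypothetical, and it accepts any std::istream so that it can be exercised with an in-memory std::istringstream as well as with an ifstream.

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: return the byte located `offset` bytes away from the
// reference point `dir`, or -1 if the seek or the read fails.
int byte_at( std::istream& in, std::istream::off_type offset, std::ios::seekdir dir )
{
    in.clear();              // drop a stale eof/fail state before seeking
    in.seekg( offset, dir ); // relative move of the get pointer
    char ch;
    if ( !in.get( ch ) ) {
        return -1;
    }
    return static_cast< unsigned char >( ch );
}
```

For a stream holding the text ABCDEFGH, byte_at( in, 2, std::ios::beg ) returns 'C' and leaves the get pointer just after it. A following byte_at( in, 1, std::ios::cur ) skips one byte and returns 'E', and byte_at( in, -1, std::ios::end ) returns the last byte, 'H'.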
                                

The following program uses the seekg( ) function to move the read pointer of a file input stream to the end of the file. It then calls tellg( ) to get the current offset of the pointer. The difference between that offset and the starting position equals the size of the entire file:


// How to obtain file size
#include <iostream>
#include <fstream>
using namespace std;

int main()
{
    // Open a file:
    ifstream myfile( "example.txt", ios::in | ios::binary );

    // get current read position in the stream:
    ifstream::pos_type begin = myfile.tellg();

    // move current read position in the stream:
    myfile.seekg( 0, ios::end );

    ifstream::pos_type end = myfile.tellg();
    myfile.close();

    cout << "size is: " << ( end - begin ) << " bytes.\n";
    return 0;
}
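The same stream pointer functions make random access possible: when every record in a file has the same size, the position of a record can be computed from its index instead of reading everything before it. The sketch below assumes a file that contains nothing but double values, like the one written earlier in this section. The function name read_record is hypothetical.

```cpp
#include <cstddef>
#include <fstream>

// Hypothetical helper: read the record with the given zero-based index from
// a binary file of doubles. Returns false if the file cannot be opened or
// the record does not exist; `value` is left untouched in that case.
bool read_record( char const* path, std::size_t index, double& value )
{
    std::ifstream in( path, std::ios::in | std::ios::binary );
    if ( !in ) {
        return false;
    }
    // Every record has the same size, so its offset is index * record size:
    in.seekg( static_cast< std::streamoff >( index * sizeof( double ) ), std::ios::beg );

    double record;
    if ( !in.read( reinterpret_cast< char* >( &record ), sizeof( double ) ) ) {
        return false; // index lies past the end of the file
    }
    value = record;
    return true;
}
```

Reading into a temporary and copying it out only on success keeps the caller's variable unchanged when the requested index is out of range.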



C-style File Operations


C++ includes support for the C library functions, including those traditionally used when working with files. The following functions deal with basic operations on files. They are declared in the <stdio.h> standard library header.

function  description

fopen     Opens the named file and returns a stream, or NULL if the attempt fails. Legal values for mode include:
            • "r"  -- open text file for reading
            • "w"  -- create text file for writing; discard previous contents if any
            • "a"  -- append; open or create text file for writing at end of file
            • "r+" -- open text file for update (i.e., reading and writing)
            • "w+" -- create text file for update; discard previous contents if any
            • "a+" -- append; open or create text file for update, writing at end

fputc     Writes one character, converted to an unsigned char, into a file. Returns the character written, or EOF on error.

fgetc     Returns the next character of stream as an unsigned char, converted to an int, or EOF if end of file or an error occurs.

feof      Returns non-zero if the end-of-file indicator for stream is set.

fclose    Flushes any unwritten data for stream, discards any unread buffered input, frees any automatically allocated buffer, then closes the stream. Returns EOF if any errors occurred, and zero otherwise.

perror    Prints the given string, followed by a description of the most recent error (the current value of errno), to stderr.

The following program implements file copy using standard C library functions. The type size_t is an unsigned integer type large enough to hold the size of any object; it is not necessarily the same type as unsigned int:


//file copy using standard C library functions
#include <stdio.h>
#include <iostream>
#include <string>
using namespace std;

int save_file( string const& file_name_, string const& str_ )
{
    // create file if didn't exist:
    FILE *fp = fopen( file_name_.c_str(), "w+" );
    if ( fp == NULL ) {
        // couldn't open file, abort
        perror( ("fopen: cannot open file " + file_name_).c_str() );
        return 0;
    }

    size_t cnt = 0;
    for ( ; cnt < str_.length(); ++cnt ) {
        if ( fputc( str_[ cnt ], fp ) == EOF ) {
            perror( ("fputc: cannot write into " + file_name_).c_str() );
            fclose( fp ); // do not leak the stream on error
            return 0;
        }
    }
    fclose( fp );
    return static_cast< int >( cnt );
}

int load_file( string const& file_name_, string& str_ )
{
    // open file for reading:
    FILE *fp = fopen( file_name_.c_str(), "r" );
    if ( fp == NULL ) {
        perror( ("fopen: cannot open file " + file_name_).c_str() );
        return 0;
    }

    int cnt = 0;
    // fgetc() returns EOF both at end of file and on a read error, so
    // testing its result ends the loop in either case:
    for ( int ch; ( ch = fgetc( fp ) ) != EOF; ++cnt ) {
        str_.append( 1, ( char )ch );
    }

    fclose( fp );
    return cnt;
}

int main( int argc, char* argv[] )
{
    if ( argc != 3 ) {
        cout << "Please specify files to copy.\n";
        cout << "Usage: ";
        cout << argv[ 0 ] << " sourcefile targetfile\n";
        return 1;
    }
    string input;
    load_file( argv[ 1 ], input );
    save_file( argv[ 2 ], input );
    return 0;
}
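The program above opens both files in text mode, which on some systems translates line endings and can therefore alter arbitrary binary data. Appending "b" to the mode string ("rb", "wb") requests binary mode instead. The following sketch copies a file byte by byte in binary mode without holding it in memory. The function name copy_file is hypothetical, and ferror( ), a standard <stdio.h> function not listed in the table above, is used to tell a read error from end of file.

```cpp
#include <stdio.h>

// Hypothetical helper: copy source to target byte by byte in binary mode.
// Returns the number of bytes copied, or -1 on any error.
long copy_file( char const* source, char const* target )
{
    FILE* in = fopen( source, "rb" );
    if ( in == NULL ) {
        return -1;
    }
    FILE* out = fopen( target, "wb" );
    if ( out == NULL ) {
        fclose( in );
        return -1;
    }

    long count = 0;
    bool ok = true;
    int ch;
    while ( ( ch = fgetc( in ) ) != EOF ) {
        if ( fputc( ch, out ) == EOF ) {
            ok = false; // write error
            break;
        }
        ++count;
    }
    if ( ferror( in ) ) {
        ok = false;     // the loop ended because of a read error
    }
    fclose( in );
    if ( fclose( out ) == EOF ) {
        ok = false;     // flushing the remaining output failed
    }
    return ok ? count : -1;
}
```

Keeping the result of fgetc( ) in an int matters here: a byte with the value 0xFF stored in a char could otherwise compare equal to EOF and end the copy early.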


Bitwise Operators


A bitwise operation operates on one or two bit patterns represented by binary numbers at the level of their individual bits:

A bitwise NOT, or complement, is a unary operation that performs logical negation on each bit, forming the ones' complement of the given binary value. Digits which were 0 become 1, and vice versa. For example:

    ~0101
   = 1010
                    

    int x = 5;  // ...0101
    int y = ~x; // ...1010 (dec -6, since every higher bit of the int is flipped as well)

A bitwise OR takes two bit patterns of equal length, and produces another one of the same length by matching up corresponding bits (the first of each; the second of each; and so on) and performing the logical OR operation on each pair of corresponding bits. In each pair, the result is 1 if the first bit is 1 OR the second bit is 1 (or both), and otherwise the result is 0. For example:

    0101
    |
    0011
  = 0111

    int x = 5;     // 0101
    int y = 3;     // 0011 
    int z = x | y; // 0111 (dec 7)

A bitwise AND takes two binary representations of equal length and performs the logical AND operation on each pair of corresponding bits. In each pair, the result is 1 if the first bit is 1 AND the second bit is 1. Otherwise, the result is 0. For example:

    0101
    &
    0011
  = 0001                    
  

    int x = 5;     // 0101
    int y = 3;     // 0011 
    int z = x & y; // 0001 (dec 1)

A bitwise exclusive OR takes two bit patterns of equal length and performs the logical XOR operation on each pair of corresponding bits. The result in each position is 1 if the two bits are different, and 0 if they are the same. For example:

    0101
    ^
    0011
  = 0110                    
  

    int x = 5;     // 0101
    int y = 3;     // 0011 
    int z = x ^ y; // 0110 (dec 6)
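The four-bit patterns shown above can be reproduced by printing only the lowest four bits of a result. The helper below is a small sketch for experimenting with the operators. Its name low4 is hypothetical, and it relies on std::bitset from the standard <bitset> header.

```cpp
#include <bitset>
#include <string>

// Hypothetical helper: format the four lowest bits of a value as text,
// for example 5 becomes "0101". Higher bits are ignored.
std::string low4( unsigned int value )
{
    return std::bitset< 4 >( value ).to_string();
}

// With x = 5 (0101) and y = 3 (0011):
//   low4( ~5u )     yields "1010"
//   low4( 5u | 3u ) yields "0111"
//   low4( 5u & 3u ) yields "0001"
//   low4( 5u ^ 3u ) yields "0110"
```

Unsigned operands are used so that the complement does not produce a negative number; only the displayed low four bits match the four-digit examples, because ~ also flips all the higher bits of the value.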


                

Data Encryption Examples


Take a look at the following program. It takes a stream of text (user input), encrypts it by applying the bitwise XOR operator ^ to every character using a given key, and prints the result on the screen. To decrypt the output, simply run it through the program again with the same key: applying XOR with the same key a second time restores the original text.


// Utility to encrypt input stream.
#include <string.h> // strlen, size_t
#include <iostream>

int main(int argc, char *argv[])
{
    if ( argc == 1 ) {
        std::cout
            << "Note: This program tends to produce non-printable output.\n"
            << "      It is therefore recommended to redirect cin and cout\n"
            << "      from and to files. On many systems this can be done\n"
            << "      with a command like:\n"
            << argv[ 0 ] << " key < input_file > output_file\n\n"
            ;
    }

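    // The key to encrypt input with. Without a key argument, or with an
    // empty one, the single byte '\0' is used, which leaves the input
    // unchanged and avoids a division by zero below:
    char const* key = ( argc == 2 ) ? argv[ 1 ] : "";
    size_t key_length = ( key[ 0 ] != '\0' ) ? strlen( key ) : 1;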

    char ch;
    for (
        size_t key_pos = 0; // start at the beginning of key
        std::cin.get( ch ); // get next input byte
        key_pos = ( key_pos + 1 ) % key_length // adjust key position
        )
    {
        // Encrypt one byte and send result to the output:
        std::cout.put( ch ^ key[ key_pos ] );
    }
    return 0;
}

In cryptography, the XOR cipher is a simple encryption algorithm that operates according to the following principles:


            CLEAR_MESSAGE ^ KEY = ENCRYPTED_MESSAGE (CIPHER)

        ENCRYPTED_MESSAGE ^ KEY = CLEAR_MESSAGE

The XOR operator is extremely common as a component in more complex ciphers. Its primary merit is that it is simple to implement, and that the XOR operation is computationally inexpensive. By itself, using a constant repeating key, a simple XOR cipher can trivially be broken. However, if the key is as long as the message (so it is never repeated) and its bits are random, it is in effect a "one-time pad", which is unbreakable in theory.
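The two principles can be checked on an in-memory string. The function below is a minimal sketch of the repeating-key XOR cipher described here; the name xor_cipher is hypothetical, and treating an empty key as "no encryption" is an assumption made to avoid a division by zero.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: XOR every byte of text with the repeating key.
// An empty key leaves the text unchanged.
std::string xor_cipher( std::string text, std::string const& key )
{
    if ( key.empty() ) {
        return text;
    }
    for ( std::size_t i = 0; i < text.size(); ++i ) {
        // Byte i of the text is combined with byte (i mod key length) of the key:
        text[ i ] = static_cast< char >( text[ i ] ^ key[ i % key.size() ] );
    }
    return text;
}
```

Calling xor_cipher( xor_cipher( message, key ), key ) returns the original message, because each byte is XORed with the same key byte twice and the two operations cancel out.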

The following program applies the XOR cipher to a file in place. An interesting aspect of this utility is that it uses the same file stream to both read and write individual bytes of the file:


// Encrypt a file in place.
// The same program does encryption and decryption of the
// data stored in a file.
// The sample demonstrates how fstream is used to
// input and output data from/to the same file stream.

#include <iostream>
#include <fstream>
#include <cstring> // strlen
using namespace std;

int main( int argc, char* argv[] )
{
    if ( argc != 3 ) {
        cout
            << "Usage: "
            << argv[ 0 ]
            << " key inputfile"
            << endl
            ;
        return 1;
    }
    char ch; // 1 byte of the memory is required by the encryption

    // Open the file for both reading and writing.
    // The file name and path is specified as the second
    // command argument. Using binary mode since the
    // data is a stream of raw binary bytes:
    fstream fs( argv[2], ios::in | ios::out | ios::binary );
    if ( !fs ) {
        cout << "Cannot open file " << argv[ 2 ] << endl;
        return 1;
    }

    int offset = 0;     // keeps track of the bytes inside file stream.
    size_t key_pos = 0; // keeps track of the bytes passed as key

    // The key to encrypt input with:
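    char const* key = argv[ 1 ];
    size_t key_length = strlen( key );
    if ( key_length == 0 ) {
        // An empty key would cause a division by zero below.
        cout << "The key must not be empty." << endl;
        return 1;
    }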

    fs.read( &ch, 1 ); // read one byte from the file.
    while ( !fs.eof() ) {
        ++offset;          // record the fact that one byte processed

        // Set put pointer inside the file stream to
        // the position of the byte that we are about to write.
        fs.seekp( offset - 1, ios::beg );

        ch ^= key[ key_pos ]; // encrypt byte with the key
        fs.write( &ch, 1 ); // write one byte into the file.

        // A file stream keeps one file position shared by reading and
        // writing, and a seek is required when switching from writing
        // back to reading. Seek to the next unread byte:
        fs.seekp( offset, ios::beg );
        key_pos = ( key_pos + 1 ) % key_length; // adjust key position
        fs.read( &ch, 1 ); // read next byte from the file.
    }
    fs.close();
    return 0;
}