The Idea
We all know the basic idea of file compression. We see ZIP files regularly and know how to extract and create them. But although almost everyone knows how to use compressed files, and understands that compressing files makes them smaller, few people have any idea how it works technically.
Though it may seem like magic, it's a fairly straightforward concept. In this article, we are going to demystify file compression.
The Two Classes of Compression
Although people typically think of ZIP files and pictures when talking about compression, many don't realize that there are two different classes of compression in common use.

 Lossless Compression is a type of compression in which data is compressed and can later be decompressed without any loss of information. This is useful for things in which the minor details really matter. For example, every character in a research paper needs to be exactly the same after decompression, or the paper could be poorly graded because compression dropped the commas.
 Lossy Compression is the other type, where data can be slightly modified during the compression process. Pictures are commonly compressed this way: since we can't easily tell two very similar colors apart, this is a good place to use lossy compression and not be so exact about colors.
The Steps for Lossless Compression
We'll be using "Huffman Coding" to do this, as it's one of the simplest compression methods.
Compressing files in this way takes four major steps, listed below.

 Generate a statistical model of the data (the letters in a text file, for instance). This includes finding out which letters occur most often, and which series of letters are most common.
 Generate a "dictionary", which is essentially a tree with the most common items closer to the starting point than uncommon items. This is used as a reference for what compressed data translates to when not compressed.
 Rewrite the data (text in this case) in a compressed form according to the dictionary.
 Combine all the parts.
Although it may seem really complicated, it's not too hard to follow with an example. We're going to demonstrate how this works with the following string.
Human Readable: AABABCABCD
Binary Translation: 01000001 01000001 01000010 01000001 01000010 01000011 01000001 01000010 01000011 01000100
As a note, this binary is 80 bits long: 10 characters at 8 bits each.
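To see where those 80 bits come from, here's a quick Python sketch (purely illustrative) that converts the string to its 8-bit ASCII representation:

```python
text = "AABABCABCD"

# each character is stored as 8 bits (one byte) in plain ASCII
binary = " ".join(format(ord(ch), "08b") for ch in text)

print(binary)         # 01000001 01000001 01000010 ...
print(len(text) * 8)  # 80
```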
The Statistics
To begin the compression process, we start by statistically analyzing the text. Here are some notes on what I see:

 The letter A appears 4 times, and is 40% of the letters.
 The letter B appears 3 times, and is 30% of the letters.
 The letter C appears 2 times, and is 20% of the letters.
 The letter D appears 1 time, and is 10% of the letters.
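These counts are easy to compute programmatically. Here's a minimal sketch using Python's `collections.Counter`:

```python
from collections import Counter

text = "AABABCABCD"
counts = Counter(text)  # Counter({'A': 4, 'B': 3, 'C': 2, 'D': 1})

# print each letter with its count and its share of the text
for letter, count in counts.most_common():
    print(letter, count, f"{count / len(text):.0%}")
# A 4 40%
# B 3 30%
# C 2 20%
# D 1 10%
```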
The Dictionary (Tree)
Now that we have a statistical analysis of this text, we can build a dictionary (tree) by putting the most common letters closest to the starting point of the tree, and making directions to each item on the tree. In the tree I came up with, A sits closest to the starting point, then B, with C and D furthest away.
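The tree above was drawn by hand, but the standard Huffman construction builds it bottom-up by repeatedly merging the two least-frequent nodes. Here's a sketch using a heap; note that the 0/1 assignments for C and D can come out mirrored compared to the hand-drawn tree, while the code lengths stay the same:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    freq = Counter(text)
    # heap entries: (frequency, unique tiebreaker, {letter: code_so_far})
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # least frequent subtree
        f2, _, right = heapq.heappop(heap)  # second least frequent
        # letters in the merged node gain one more bit of "directions"
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

codes = huffman_codes("AABABCABCD")
print(codes)  # {'A': '0', 'B': '10', 'D': '110', 'C': '111'}
```

The frequent letters end up with short codes (A gets 1 bit) and the rare ones with long codes (C and D get 3 bits each), which is exactly what makes the compression work.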
We have to somehow represent our dictionary in binary, so that it can be referenced during decompression. This is done by writing the binary directions to the first spot on the tree, then writing the letter itself in binary, then doing the same for all the other items on the tree. Here's a table with the directions, the letter, the binary for the letter, and the combined directions and binary for the letter.
 Directions to Letter   Letter   Letter in Binary   Directions+Binary
 0                      A        01000001           001000001
 10                     B        01000010           1001000010
 110                    C        01000011           11001000011
 111                    D        01000100           11101000100
Since we have the directions and binary for each letter, we just put them all one after another, which leaves us with:
001000001 1001000010 11001000011 11101000100
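Writing the dictionary out this way is easy to script. This sketch, using the codes from the table above, reproduces that bit string (the spaces are only there for readability):

```python
# the codes from the tree above: letter -> directions
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}

# for each letter: directions, then the letter's 8-bit ASCII value
dictionary_bits = " ".join(
    code + format(ord(letter), "08b") for letter, code in codes.items()
)
print(dictionary_bits)
# 001000001 1001000010 11001000011 11101000100
```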
Compressing the Data
Now that we have a dictionary (tree), we rewrite the text, replacing each letter with the directions we follow through the tree to reach it.

 A is written as 0
 B is written as 10
 C is written as 110
 D is written as 111
Rewriting the text using this formula gives us 0 0 10 0 10 110 0 10 110 111
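In code, this substitution is a one-liner once you have the code table:

```python
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}
text = "AABABCABCD"

# replace every letter with its directions and join the bits
compressed = "".join(codes[ch] for ch in text)
print(compressed)       # 0010010110010110111
print(len(compressed))  # 19
```

Just 19 bits for the text itself, down from 80.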
Combining it All
Here's the stuff we have so far in binary:

 Dictionary = 001000001 1001000010 11001000011 11101000100
 Compressed data = 0 0 10 0 10 110 0 10 110 111
To finish writing the file, we just write it in the order [dictionary][compressed data].
This leaves us with this:
001000001 1001000010 11001000011 11101000100 0 0 10 0 10 110 0 10 110 111
which is 60 bits long, 20 bits shorter than the original binary text.
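Putting the pieces together and checking the length:

```python
# the four dictionary entries from the table above
# (adjacent string literals are joined automatically in Python)
dictionary_bits = "001000001" "1001000010" "11001000011" "11101000100"

# the compressed text from the previous step
compressed_data = "0" "0" "10" "0" "10" "110" "0" "10" "110" "111"

output = dictionary_bits + compressed_data
print(len(output))  # 60
```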
Decompression Though?
Decompression works the same way, just in reverse. This article won't cover decompression, since extracting the dictionary from the start of the binary is a more involved process. I may cover it later on; for now, just keep in mind that it is possible, it just takes more steps.
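To give a taste of the easier half, here's a sketch of decoding the compressed bits, assuming the code table has already been recovered from the dictionary. Because no code is a prefix of another, we can walk the bits and emit a letter whenever a complete code is matched:

```python
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}
# invert the table: directions -> letter
decode_table = {code: letter for letter, code in codes.items()}

compressed = "0010010110010110111"
result = []
buffer = ""
for bit in compressed:
    buffer += bit
    if buffer in decode_table:  # a complete code was matched
        result.append(decode_table[buffer])
        buffer = ""

print("".join(result))  # AABABCABCD
```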
Final Notes
Although this example may seem complicated, Huffman coding is one of the simplest compression methods there is. It also has many variations, some of which deal with probabilities over sections of the data, and so on.
If this (somehow) wasn't technical enough for you, take a look at David Huffman's paper, "A Method for the Construction of Minimum-Redundancy Codes."