Faculty of Education -
Computer Science
Huffman Coding
Introduction
There are many different reasons for and ways of encoding
data, and one of these ways is Huffman coding. This is used as a compression
method in digital imaging and video as well as in other areas. The idea behind
Huffman coding is simply to use shorter bit patterns for more common characters,
and longer bit patterns for less common characters.
An Example
Say you want to encode the letters A (0.12), E (0.42), I
(0.09), O (0.30), U (0.07), listed with their respective probabilities. Go
through the following steps:
- Consider each of the letters as a symbol with its respective probability.
- Find the two symbols with the smallest probability and combine them into a
new symbol with both letters by adding the probabilities.
(Note: There may
be a choice between two symbols with the same probability. If this is the
case, either symbol can be chosen. The final tree and codes will be different,
but the overall efficiency of the code will be the same)
(Note 2:
Frequency counts or other values may be used instead of probabilities)
- Repeat step 2 until there is only one symbol left with a probability of 1.
- To see the code, redraw all the symbols in the form of a tree, where each
symbol contains either a single letter or splits up into two smaller symbols.
Label all the left branches of the tree with a 0 and all the right branches
with a 1. The code for each of the letters is the sequence of 0's and 1's that
lead to it on the tree, starting from the symbol with a probability of 1.
You should get a tree like the following:
Thus the codes for each letter are:
- A - 100
- E - 0
- I - 1011
- O - 11
- U - 1010
Now using this code, any string of vowels can be written
uniquely.
- AI = 1001011
- EIEIO = 010110101111
- UEA = 10100100
- 10110 = IE
- 100101011 = AUO
- 0101111111010 = EIOOU
Notice that each string of 0's and 1's can
be uniquely decoded. Try encoding and decoding some strings of vowels on your
own.
Activities
- Create Huffman trees and codes for the following sets of letters with the
given probabilities:
- A (0.20), B (0.09), C (0.15), D (0.11), E (0.40), F (0.05)
- A (0.05), C (0.04), E (0.16), G (0.02), I (0.04), L (0.07), M (0.09), N
(0.08), O (0.12), R (0.08), S (0.09), T (0.10), U (0.04), Y (0.02)
- A (0.04), B (0.02), C (0.03), D (0.03), E (0.13), F (0.01), G (0.02), H
(0.02), I (0.03), J (0.01), K (0.02), L (0.06), M (0.07), N (0.06), O
(0.10), P (0.02), Q (0.01), R (0.06), S (0.07), T (0.08), U (0.03), V
(0.02), W (0.02), X (0.01), Y (0.02), Z (0.01)
- For the codes you created, try encoding some words and have a friend
decode them.
- The following character counts were obtained from a document: D (894), E
(3320), F (698), M (661), O (1749), R (1600). Create a Huffman tree and code
for these characters and encode the word 'freedom'.
- Write a computer program which will input characters and their
frequencies, create a Huffman code, and encode and decode strings.
References
© 1998 by Warren Hagey
E-mail: wchagey@undergrad.math.uwaterloo.ca