Lab this week
Your lab this week is to fill out a course evaluation.
You have two choices for this:
1. Go to lab at your usual time, and fill out the course evaluation there. Mike will keep track of who comes to lab to do the evaluation.
2. Fill it out on your own time, and email me that you have done so. In the subject line of the email, just put "filled out the evaluation". Be sure I can tell who the email is from.
If you are in any other CS course, you should also fill out a course evaluation for that course.
Review for Final
Thursday, December 16, 6-7:50
Fall 2004
You may have one sheet of notes
Chapters in book
Carrano/Savitch
17 & 18: Dictionaries
19: Hashing
24: Trees
25: Tree Implementations
26: A BST Implementation
27: A Heap Implementation
28: Balanced Search Trees
Neapolitan/Naimipour
Chap 1: Algorithms: Efficiency, Analysis, and Order
Sections 1.1 & 1.2
Chap 2: Divide-and-Conquer
Sections: all but Sect 2.4
Chap 3: Dynamic Programming
Section 3.1: The Binomial Coefficient
Section 3.3: Dynamic Programming & Optimization Problems
Section 3.4: Chained Matrix Multiplication
Section 3.5: Optimal Binary Search Trees
Chap 4: The Greedy Approach
Sect 4.1: Minimum Spanning Trees
Prim's & Kruskal's algorithms
Sect 4.2: Dijkstra's algorithm for single-source shortest path
Sect 4.4: Huffman codes
Sect 4.5: The knapsack problem
the 0-1 and the fractional problem
Topics
Java programming
Java file I/O
Interfaces & Abstract classes
Iterators
The Collection Framework
Dictionaries
Hashing
Heaps
Balanced Search Trees
AVL Trees
Multiway search trees
Red-Black trees
Divide and Conquer Algorithms
Recurrence relations
Merge Sort
Multiplying large integers
Matrix multiplication: Strassen's method
Dynamic Programming algorithms
Binomial Coefficients
Chained matrix multiplication
Optimal Binary Search Trees
Greedy Algorithms
Minimum Spanning Trees
Prim's & Kruskal's algorithms
Dijkstra's algorithm for single-source shortest path
Huffman codes
The knapsack problem
Both the 0-1 and the fractional problem
Dictionary ADT
A dictionary uses a key to retrieve information that is related to the key.
The dictionary can be implemented several ways:
Sorted array
Sorted linked list
Hash table
Binary search tree
We looked at two Dictionary classes from Carrano that implemented a dictionary interface.
One objective in doing that was to review different Java constructs.
The other was to become familiar with the Dictionary ADT.
Hashing Overview
General idea of hashing
Hash functions
Time complexity of perfect hashing
Collision resolution methods
Load factor
Hash functions
Criteria for good hash functions
Quick and easy to compute
Minimize the number of collisions
Achieve an even distribution of the records across the range of indices (a uniform hashing function)
Writing a hash function
First, convert the key to an integer.
There are many ways to do this; using a polynomial hash code for strings seems to work well.
Use Horner's method to evaluate the polynomial.
Then, compress the result to the size of the array.
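The two steps above can be sketched in Java (a minimal illustration; the class and method names, the base 31, and the table size are my choices, not from the text):

```java
// Sketch of a polynomial hash for strings using Horner's method,
// then compression to a table index.
public class PolyHash {
    // Evaluate c0*31^(n-1) + c1*31^(n-2) + ... + c(n-1) with Horner's rule:
    // one multiply and one add per character.
    static int hashCode(String key) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = 31 * h + key.charAt(i);
        }
        return h;
    }

    // Compress the hash code to a valid index for a table of the given size.
    static int compress(int hash, int tableSize) {
        return Math.abs(hash % tableSize);
    }

    public static void main(String[] args) {
        String key = "cat";
        System.out.println(key + " -> index " + compress(hashCode(key), 101));
    }
}
```

This is the same polynomial (base 31) that Java's own String.hashCode uses.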
Collision resolution methods
Open addressing
Advantages & disadvantages
Types:
Linear probing
Quadratic probing
Double hashing
Separate chaining
Advantages & disadvantages
Load factor
The optimal load factor depends on the type of collision resolution.
For open addressing it should be about 0.5.
With separate chaining it can go as high as 1 or 2 without much degradation.
Books seem to recommend 0.75.
Rehashing
If the load factor gets too high, the table should be rehashed.
This involves:
Doubling the table size
Sending all the keys through the new hash function
Tree traversal
Inorder
Recursively visit all nodes in the left subtree
Process the root node
Recursively visit all nodes in the right subtree
Preorder
Process the node, go left, then go right
Postorder
Go left, go right, then process the node
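The three traversals above can be sketched on a minimal node class (names are mine; each traversal appends to a list so the visit order is easy to check):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal binary-tree node plus the three traversals described above.
public class Traversals {
    static class Node {
        int data;
        Node left, right;
        Node(int data, Node left, Node right) { this.data = data; this.left = left; this.right = right; }
    }

    // Inorder: left subtree, root, right subtree.
    static void inorder(Node n, List<Integer> out) {
        if (n == null) return;
        inorder(n.left, out);
        out.add(n.data);
        inorder(n.right, out);
    }

    // Preorder: root, left subtree, right subtree.
    static void preorder(Node n, List<Integer> out) {
        if (n == null) return;
        out.add(n.data);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    // Postorder: left subtree, right subtree, root.
    static void postorder(Node n, List<Integer> out) {
        if (n == null) return;
        postorder(n.left, out);
        postorder(n.right, out);
        out.add(n.data);
    }

    public static void main(String[] args) {
        // A small BST with root 2, children 1 and 3.
        Node root = new Node(2, new Node(1, null, null), new Node(3, null, null));
        List<Integer> in = new ArrayList<>();
        inorder(root, in);
        System.out.println(in);   // inorder of a BST yields sorted order
    }
}
```

Note that the inorder traversal of a binary search tree visits the keys in sorted order.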
Binary Trees
Every node has at most two children
Types of binary trees
Expression trees
Binary Search Trees
Heaps
As long as the tree stays balanced, the height of the tree is the log of (the number of nodes + 1).
If you have a full binary tree with 127 nodes, what is its height?
log2 128 = 7; 2^7 = 128
Heaps
Heap order property
Heap structure property
How heaps are stored in the machine
Adding an element to a heap
Deleting the max or min element from a heap
Time complexity of the major operations of a heap
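The add and delete-max operations listed above can be sketched as an array-based max-heap (a sketch, not the book's class; the structure property lets a complete tree live in a list, with the children of index i at 2i+1 and 2i+2):

```java
import java.util.ArrayList;
import java.util.List;

// Array-based max-heap sketch.
public class MaxHeap {
    private final List<Integer> a = new ArrayList<>();

    // Add at the end, then bubble up to restore the order property: O(log n).
    void add(int x) {
        a.add(x);
        int i = a.size() - 1;
        while (i > 0 && a.get(i) > a.get((i - 1) / 2)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Remove the max (the root), move the last leaf to the root,
    // then sift it down: O(log n).
    int removeMax() {
        int max = a.get(0);
        int last = a.remove(a.size() - 1);
        if (!a.isEmpty()) {
            a.set(0, last);
            int i = 0;
            while (true) {
                int l = 2 * i + 1, r = 2 * i + 2, largest = i;
                if (l < a.size() && a.get(l) > a.get(largest)) largest = l;
                if (r < a.size() && a.get(r) > a.get(largest)) largest = r;
                if (largest == i) break;
                swap(i, largest);
                i = largest;
            }
        }
        return max;
    }

    private void swap(int i, int j) {
        int t = a.get(i); a.set(i, a.get(j)); a.set(j, t);
    }

    public static void main(String[] args) {
        MaxHeap h = new MaxHeap();
        h.add(3); h.add(10); h.add(7);
        System.out.println(h.removeMax()); // the max, 10
    }
}
```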
Balanced Search trees
AVL trees
An AVL tree is a binary search tree that is either empty or has the following two properties:
1) The heights of the left and right subtrees differ by at most 1.
2) The left and right subtrees are AVL trees.
Multiway search trees
Multiway search trees generalize binary search trees into m-ary search trees.
They allow more than one key to be stored in a node.
There will always be one more child pointer than keys.
Analysis of multiway search trees:
The advantage is their easy-to-maintain balance, not their shorter height.
The reduction in height is offset by the increased number of comparisons that the search may require at each node.
Red-Black trees
A red-black tree is a BST that is empty, or in which the root node is colored black, every other node is colored red or black, and the following properties are satisfied:
Red rule: If a node is red, its parent must be black.
Path rule: The number of black nodes must be the same in all paths from the root to a node with no children or with one child.
Divide and Conquer
Binary search
Mergesort and Quicksort
Multiplying large integers
The natural way to multiply integers too large to fit into a long is to break each recursively into a high and a low part, then multiply the high parts, the low parts, and the two cross products.
This requires four multiplications.
T(n) = 4T(n/2) + cn; this is an O(n^2) algorithm.
A more efficient way notices that the middle term can be done with one multiplication instead of two, so we reduce the number of multiplications to three.
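The three-multiplication idea (Karatsuba's trick) can be sketched with BigInteger. This is a sketch for nonnegative integers; the class name, the 32-bit threshold, and the splitting details are my choices:

```java
import java.math.BigInteger;

// Splitting x = xh*2^m + xl and y = yh*2^m + yl, the middle term
// (xh*yl + xl*yh) equals (xh+xl)*(yh+yl) - xh*yh - xl*yl,
// so only three recursive multiplications are needed.
public class Karatsuba {
    static BigInteger multiply(BigInteger x, BigInteger y) {
        int n = Math.max(x.bitLength(), y.bitLength());
        if (n <= 32) return x.multiply(y);   // threshold: use ordinary multiplication
        int m = n / 2;
        BigInteger xh = x.shiftRight(m), xl = x.subtract(xh.shiftLeft(m));
        BigInteger yh = y.shiftRight(m), yl = y.subtract(yh.shiftLeft(m));
        BigInteger p1 = multiply(xh, yh);                    // high parts
        BigInteger p2 = multiply(xl, yl);                    // low parts
        BigInteger p3 = multiply(xh.add(xl), yh.add(yl));    // combined parts
        BigInteger mid = p3.subtract(p1).subtract(p2);       // middle term, no extra multiply
        return p1.shiftLeft(2 * m).add(mid.shiftLeft(m)).add(p2);
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("123456789123456789");
        BigInteger b = new BigInteger("987654321987654321");
        System.out.println(multiply(a, b).equals(a.multiply(b))); // true
    }
}
```

With three multiplications the recurrence becomes T(n) = 3T(n/2) + cn, which is better than O(n^2).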
Strassen's method of multiplication
The standard method to multiply matrices takes 8 multiplications and 4 additions/subtractions for a 2 x 2 matrix.
Strassen's method takes 7 multiplications and 18 additions/subtractions.
On a 2 x 2 matrix this is not worthwhile, but it can be used on larger matrices that are divided into four submatrices.
This works only because commutativity of multiplication is not used.
When the matrices have been divided to a point where they reach a threshold, standard multiplication is used, since the overhead on a small matrix is not worthwhile.
Dynamic programming
Binomial Coefficients
They are the coefficients when expanding a binomial like (x + y)^n (n is the power to which the binomial is raised; k is the number of the term of the expansion).
The coefficients of a binomial expansion are also the entries in Pascal's triangle.
They are also the number of combinations possible with n total elements, taken k at a time.
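The Pascal's-triangle recurrence C(n, k) = C(n-1, k-1) + C(n-1, k) can be computed by dynamic programming; a sketch (class and method names are mine):

```java
// Pascal's-triangle computation of C(n, k), filling a table row by row.
public class Binomial {
    static long choose(int n, int k) {
        long[][] c = new long[n + 1][k + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= Math.min(i, k); j++) {
                if (j == 0 || j == i) c[i][j] = 1;           // edges of the triangle
                else c[i][j] = c[i - 1][j - 1] + c[i - 1][j]; // sum of the two above
            }
        }
        return c[n][k];
    }

    public static void main(String[] args) {
        System.out.println(choose(5, 2)); // 10
    }
}
```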
Chained matrix multiplication
The problem is to figure out the order in which to multiply a chain of matrices.
The solution is to find the minimum number of multiplications necessary to form chains of shorter length, save them in a table, and use them to find optimal chains of longer length.
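The table-building idea can be sketched as follows (a sketch; names and the dimension encoding are my choices: dims[i-1] x dims[i] is the size of matrix i):

```java
// DP for chained matrix multiplication: m[i][j] is the minimum number of
// scalar multiplications needed to compute the product of matrices i..j.
public class MatrixChain {
    static int minMultiplications(int[] dims) {
        int n = dims.length - 1;                 // number of matrices
        int[][] m = new int[n + 1][n + 1];       // m[i][i] = 0: one matrix, no work
        for (int len = 2; len <= n; len++) {     // chains of increasing length
            for (int i = 1; i + len - 1 <= n; i++) {
                int j = i + len - 1;
                m[i][j] = Integer.MAX_VALUE;
                for (int k = i; k < j; k++) {    // split: (A_i..A_k)(A_{k+1}..A_j)
                    int cost = m[i][k] + m[k + 1][j]
                             + dims[i - 1] * dims[k] * dims[j];
                    m[i][j] = Math.min(m[i][j], cost);
                }
            }
        }
        return m[1][n];
    }

    public static void main(String[] args) {
        // A1: 10x100, A2: 100x5, A3: 5x50; best order is (A1 A2) A3.
        System.out.println(minMultiplications(new int[]{10, 100, 5, 50})); // 7500
    }
}
```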
Optimal BSTs
To use an optimal BST, certain information must be available:
An ordered array with all the possible keys
The probability for each key that it will be found in the tree
Insertions and deletions after the tree is built are not possible without rebuilding the tree.
The main reason for presenting this is so you know that other options are available when doing certain searches.
Graphs
Graph definitions
A graph G = (V, E) consists of a set of vertices, V, and a set of edges, E.
Each edge is a pair (v, w) where v, w ∈ V.
If the pairs of edges are ordered, then the graph is directed, sometimes called a digraph; i.e., (v, w) may be different from (w, v).
A path is a sequence of edges connecting vertices.
A cycle is a path in which the first and last vertices are the same and there are no repeated edges.
More terminology
The edges in either a graph or a digraph can be weighted.
Two vertices are adjacent if an edge exists between them; the edge is incident with its vertices.
The degree of a vertex in an undirected graph is the number of edges incident with it.
Handshaking Theorem: the sum of the degrees of the vertices is twice the number of edges.
Theorem: An undirected graph has an even number of vertices of odd degree.
Vertices in directed graphs
When (u, v) is an edge of a directed graph:
The in-degree of a vertex v is the number of edges with v as their terminal vertex.
The out-degree of a vertex v is the number of edges with v as their initial vertex.
A loop at a vertex contributes both an in-degree and an out-degree.
Breadth-first & depth-first traversals
Given an origin vertex, a breadth-first traversal visits the origin and all the origin's adjacent vertices (called neighbors).
Then it considers each of these vertices' neighbors.
The vertices closest to the start are evaluated first; the most distant are evaluated last.
Given an origin vertex, a depth-first traversal visits one adjacent node, then keeps visiting nodes adjacent to that one.
When it can go no further, it backtracks until it comes to an adjacent node that has not been visited.
A depth-first search is also called backtracking.
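The breadth-first traversal described above can be sketched with a queue over an adjacency list (a sketch; names are mine):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Breadth-first traversal from an origin vertex: visit all neighbors
// before moving outward, so the closest vertices come first.
public class Bfs {
    static List<Integer> bfs(List<List<Integer>> adj, int origin) {
        List<Integer> order = new ArrayList<>();
        boolean[] visited = new boolean[adj.size()];
        Deque<Integer> queue = new ArrayDeque<>();
        visited[origin] = true;
        queue.add(origin);
        while (!queue.isEmpty()) {
            int v = queue.remove();        // closest unprocessed vertex
            order.add(v);
            for (int w : adj.get(v)) {     // enqueue unvisited neighbors
                if (!visited[w]) {
                    visited[w] = true;
                    queue.add(w);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Undirected edges: 0-1, 0-2, 1-3.
        List<List<Integer>> adj = List.of(
            List.of(1, 2), List.of(0, 3), List.of(0), List.of(1));
        System.out.println(bfs(adj, 0)); // [0, 1, 2, 3]
    }
}
```

Replacing the queue with a stack (or recursion) turns this into a depth-first traversal.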
Representation of graphs in the computer
Adjacency matrix
Adjacency list
Greedy algorithms
Greedy algorithms work in phases.
In each phase, a decision is made that appears to be good, without regard for future consequences.
Generally, this means that some local optimum is chosen.
When the algorithm terminates, we hope that the local optimum is equal to the global optimum.
This is not always the case; sometimes greedy algorithms produce suboptimal solutions.
It is important to know whether a certain greedy algorithm's solution is optimal.
Spanning trees (unweighted graphs)
Definition of a spanning tree
Let G be a simple graph. A spanning tree of G is a subgraph of G that is a tree containing every vertex of G.
A spanning tree can be made from a graph by removing edges that create circuits.
This is the easiest way for a person looking at a graph to make a spanning tree.
Minimum Spanning Trees (weighted graphs)
A minimum spanning tree contains all the vertices in the graph and minimizes the sum of the weights of the edges.
The greedy method of problem solving is used to find the MST of a graph G.
There are two common algorithms, both greedy, that differ in how a minimum edge is selected.
Kruskal's algorithm
Continually select the smallest edge, unless it creates a cycle with the other edges already selected. (Several clusters may get connected only at the end.)
Prim's algorithm
Pick a starting vertex, and add a minimum adjacent edge and vertex at each stage. (No cycles.)
Time complexity of Prim's and Kruskal's algorithms
For each edge put into the MST, we must find the shortest edge adjacent to any of the vertices that are marked.
This requires a nested loop, so the time complexity is O(n^2).
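The nested loop behind the O(n^2) bound is visible in this sketch of Prim's algorithm over an adjacency matrix (0 meaning no edge; names and the start vertex are my choices):

```java
import java.util.Arrays;

// Prim's algorithm: grow the tree one vertex at a time by the cheapest
// edge connecting the tree to an outside vertex.
public class Prim {
    static int mstWeight(int[][] g) {
        int n = g.length;
        boolean[] inTree = new boolean[n];
        int[] dist = new int[n];               // cheapest edge into the tree
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[0] = 0;                           // start from vertex 0
        int total = 0;
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++)        // inner loop: nearest vertex not in tree
                if (!inTree[v] && (u == -1 || dist[v] < dist[u])) u = v;
            inTree[u] = true;
            total += dist[u];
            for (int v = 0; v < n; v++)        // update cheapest edges out of u
                if (!inTree[v] && g[u][v] != 0 && g[u][v] < dist[v]) dist[v] = g[u][v];
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] g = {
            {0, 2, 3, 0},
            {2, 0, 1, 4},
            {3, 1, 0, 5},
            {0, 4, 5, 0}
        };
        System.out.println(mstWeight(g)); // edges 0-1 (2), 1-2 (1), 1-3 (4): total 7
    }
}
```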
Single source shortest path problems
We want to find the shortest path from a given vertex to all the others.
The input is a graph (stored either as an adjacency matrix or list).
The cost of a path is the sum of the costs of the edges in the path.
Two types:
Weighted shortest path
Given as input a weighted graph G = (V, E) and a distinguished vertex s, find the shortest path from s to every other vertex in G.
Unweighted shortest path
A special case of the problem above: consider all the weights to be one.
Draw a graph from an adjacency matrix
Initial configuration:
known = false for all vertices except the start
dist is infinity for all vertices except s; it is 0 for s
path is zero for all vertices
Let v3 be the start vertex.
Mark as known all vertices that are at path length 1 from v3; their path will be v3.
Now mark as known all the vertices at path length 2; these will be adjacent to the nodes whose dist = 1, and their path will be the vertex at dist = 1.
Mark as known the vertices whose path length from v3 is 3, etc.
Weighted graph shortest path
The general method to solve this problem is called Dijkstra's algorithm.
This is a classic example of a greedy algorithm.
Greedy algorithms usually solve a problem in stages by doing what is best at that stage.
The problem with greedy algorithms is that sometimes they do not give optimal solutions.
Dijkstra's algorithm works much like the unweighted shortest path algorithm; we keep track of the same things.
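A sketch of Dijkstra's algorithm over an adjacency matrix (0 meaning no edge; names are mine). As in the unweighted version we keep known and dist, but we always settle the unknown vertex with the smallest distance:

```java
import java.util.Arrays;

// Dijkstra's single-source shortest paths (nonnegative edge weights).
public class Dijkstra {
    static int[] shortestPaths(int[][] g, int s) {
        int n = g.length;
        boolean[] known = new boolean[n];
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[s] = 0;
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++)       // greedy choice: smallest unknown dist
                if (!known[v] && (u == -1 || dist[v] < dist[u])) u = v;
            if (dist[u] == Integer.MAX_VALUE) break;  // rest is unreachable
            known[u] = true;
            for (int v = 0; v < n; v++)       // relax edges out of u
                if (!known[v] && g[u][v] != 0 && dist[u] + g[u][v] < dist[v])
                    dist[v] = dist[u] + g[u][v];
        }
        return dist;
    }

    public static void main(String[] args) {
        int[][] g = {
            {0, 4, 1, 0},
            {0, 0, 0, 1},
            {0, 2, 0, 5},
            {0, 0, 0, 0}
        };
        System.out.println(Arrays.toString(shortestPaths(g, 0))); // [0, 3, 1, 4]
    }
}
```

Note the direct path 0->1 costs 4, but going through vertex 2 costs only 3; the greedy relaxation finds this.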
The problem of data compression
The problem is to find an efficient way to encode a data file.
We will look at text files.
To represent a file for the computer, we use a binary code.
In this type of code, each character is represented by a unique binary string, called the codeword.
Remember, a codeword is the bits used to encode a single character.
There are two types of codes used:
Fixed-length binary codes
Variable-length binary codes
Finding an optimal code to compress text
To be efficient, we will need to find the frequency of each character used in the file.
Once we have calculated the frequency of each character, we put the frequency-character pairs in a minimum priority queue.
Then we use an algorithm developed by Huffman to build an optimal binary tree.
The leaves will be the characters; the path to a leaf will be its codeword.
Huffman's strategy
Set up the priority queue with each character:frequency pair.
Go through this loop n - 1 times, where n is the number of different characters in the file:
Pop two nodes off the priority queue.
Create a new node to hold the sum of their frequencies.
The new node's left child is the node with the smaller frequency; its right child is the other node.
Enqueue the new node with the sum of the frequencies.
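The loop above can be sketched with Java's own PriorityQueue (a sketch; the Node class and names are mine, and ties in the queue may be broken differently than by hand):

```java
import java.util.PriorityQueue;

// Build a Huffman tree: leaves hold character:frequency pairs,
// internal nodes hold frequency sums.
public class Huffman {
    static class Node implements Comparable<Node> {
        char ch; int freq; Node left, right;
        Node(char ch, int freq) { this.ch = ch; this.freq = freq; }
        Node(Node left, Node right) {            // internal node: sum of children
            this.freq = left.freq + right.freq;
            this.left = left; this.right = right;
        }
        public int compareTo(Node o) { return Integer.compare(freq, o.freq); }
    }

    static Node buildTree(char[] chars, int[] freqs) {
        PriorityQueue<Node> pq = new PriorityQueue<>();     // min-heap on frequency
        for (int i = 0; i < chars.length; i++) pq.add(new Node(chars[i], freqs[i]));
        while (pq.size() > 1) {                             // n - 1 merges
            Node a = pq.remove();                           // two smallest frequencies
            Node b = pq.remove();
            pq.add(new Node(a, b));                         // left child = smaller
        }
        return pq.remove();                                 // root of the Huffman tree
    }

    public static void main(String[] args) {
        // The example below: (a:16)(b:5)(c:12)(d:17)(e:10)(f:25).
        Node root = buildTree(new char[]{'a', 'b', 'c', 'd', 'e', 'f'},
                              new int[]{16, 5, 12, 17, 10, 25});
        System.out.println(root.freq); // total of all frequencies: 85
    }
}
```

The codeword for a character is read off the path from the root to its leaf (e.g., 0 for left, 1 for right).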
Another example
Put these into a priority queue:
(a:16) (b:5) (c:12) (d:17) (e:10) (f:25)
Then use Huffman's algorithm to determine the codeword for each letter.
Analyzing the Huffman tree
It will take O(n) time to initialize the heap for the priority queue.
Every time an item is dequeued or enqueued, it takes O(log n) time.
Since we go through the loop n - 1 times, it takes n + 3(log n)(n - 1) time.
The Big O time complexity is then O(n log n) to build the tree to get the code to compress a text file.
Two forms of the knapsack problem
The 0-1 Knapsack problem
The Fractional Knapsack problem
We should look at at least two ways to solve these problems:
Dynamic programming approach
Greedy approach
Often a greedy solution will be simpler than a dynamic programming solution.
But a greedy solution may not be optimal.
The 0-1 problem
There are n items. Let
S = {item_1, item_2, ..., item_n} be the set of items
w_i = weight of item_i
p_i = profit of item_i
W = maximum weight the knapsack can hold
We want to find a subset A of S such that the sum of the profits of the items in A is maximized and the sum of the weights of the items in A is <= W.
Why the greedy approach won't work
1. Take the items with the greatest profit first.
Example: W = 30;
(item_1: w_1 = 25, p_1 = 10) (item_2: w_2 = 10, p_2 = 9) (item_3: w_3 = 10, p_3 = 9)
2. Take the items with the least weight first.
Will this always give an optimal solution?
Any better solutions?
3. Take the items with the highest profit/weight ratio first.
Example: W = 30;
(item_1: w_1 = 5, p_1 = 50) (item_2: w_2 = 10, p_2 = 60) (item_3: w_3 = 20, p_3 = 140)
p_1/w_1 = 10, p_2/w_2 = 6, p_3/w_3 = 7
Does this give an optimal solution?
The fact is that a greedy algorithm cannot be guaranteed to find the optimal solution.
A dynamic programming approach to the 0-1 problem
We will be able to find a solution this way if the principle of optimality applies.
The principle of optimality applies if an optimal solution to an instance of a problem always contains optimal solutions to all sub-problems.
Does it apply?
Let A be an optimal subset of the n items.
Either A contains item_n or it does not.
If A does not contain item_n, A is equal to an optimal subset of the first n-1 items.
If A does contain item_n, the total profit = p_n + the optimal profit obtainable from the first n-1 items without exceeding the remaining capacity W - w_n.
Solving the 0-1 problem
To store our data we use a 2-D array P.
n is the total number of items; W is the max weight.
The rows are numbered from 0 to n; the columns are numbered from 0 to W.
Initialize: P[0][w] = 0; P[i][0] = 0.
Let P[i][w] be the maximum total profit of a subset of the first i items having total weight <= w.
We compute row 1 from row 0, row 2 from row 1, etc.
P[n][W] is then the maximum total profit for the entire knapsack with weight <= W.
Use the following formula to fill in the table:
P[i][w] = max(P[i-1][w], p_i + P[i-1][w - w_i]) if w_i <= w
P[i][w] = P[i-1][w] if w_i > w
Example: W = 30;
(item_1: w_1 = 25, p_1 = 10) (item_2: w_2 = 10, p_2 = 9) (item_3: w_3 = 10, p_3 = 9)
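The table-filling recurrence can be sketched in Java (names are mine; the main method runs the example above, where the greedy-by-profit choice gets only 10 but the DP answer is 18):

```java
// DP table for the 0-1 knapsack:
// P[i][w] = max(P[i-1][w], p_i + P[i-1][w - w_i]) when w_i <= w.
public class Knapsack01 {
    static int maxProfit(int[] w, int[] p, int W) {
        int n = w.length;
        int[][] P = new int[n + 1][W + 1];     // row 0 and column 0 stay 0
        for (int i = 1; i <= n; i++) {
            for (int cap = 0; cap <= W; cap++) {
                P[i][cap] = P[i - 1][cap];                       // skip item i
                if (w[i - 1] <= cap)                             // or take it
                    P[i][cap] = Math.max(P[i][cap],
                                         p[i - 1] + P[i - 1][cap - w[i - 1]]);
            }
        }
        return P[n][W];
    }

    public static void main(String[] args) {
        // W = 30, weights 25/10/10, profits 10/9/9: best is items 2 and 3.
        System.out.println(maxProfit(new int[]{25, 10, 10}, new int[]{10, 9, 9}, 30)); // 18
    }
}
```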
Solving the 0-1 problem (cont)
Improving on this algorithm
Since what we really want is the total maximum profit P[n][W], we only do those calculations needed to find it.
In this example we need P[3][30].
To find this, we need P[2][30] and P[2][20].
To find P[2][30] we need P[1][30] and P[1][20].
To find P[2][20] we need P[1][20] and P[1][10].
So in the first row, we need three entries; in the second row, we need two entries; in the third row, we need only one entry.
Analyzing the solution
Solving the 0-1 knapsack problem using dynamic programming can be very expensive.
What do you think the running time depends on?
The number of items? The weight?
If n = 20 and W = 20!, the original algorithm will take thousands of years to run on a modern-day computer.
When W is extremely large compared to n, the original algorithm is worse than the brute-force algorithm that considers all the subsets.
With the improvement we made by only calculating the needed values, it is never worse than the brute-force method, and often is much better.
But the worst-case time complexity is still very bad: O(min(2^n, nW)).
Back to the fractional knapsack problem
Look at one of the problems we looked at for the 0-1 problem.
Example: W = 30;
(item_1: w_1 = 5, p_1 = 50) (item_2: w_2 = 10, p_2 = 60) (item_3: w_3 = 20, p_3 = 140)
p_1/w_1 = 10, p_2/w_2 = 6, p_3/w_3 = 7
Can this be solved by the greedy approach if the problem is fractional instead of 0-1?
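For the fractional problem, the greedy choice by profit/weight ratio is optimal, because we can always top off the knapsack with a fraction of the next-best item. A sketch (class and method names are mine; it runs the example above):

```java
import java.util.Arrays;
import java.util.Comparator;

// Greedy fractional knapsack: take items in decreasing profit/weight
// order, splitting the last item if it does not fit.
public class FractionalKnapsack {
    static double maxProfit(int[] w, int[] p, int W) {
        Integer[] idx = new Integer[w.length];
        for (int i = 0; i < w.length; i++) idx[i] = i;
        // Sort item indices by profit/weight ratio, highest first.
        Arrays.sort(idx, Comparator.comparingDouble(i -> -(double) p[i] / w[i]));
        double profit = 0;
        int remaining = W;
        for (int i : idx) {
            if (remaining == 0) break;
            if (w[i] <= remaining) {            // take the whole item
                profit += p[i];
                remaining -= w[i];
            } else {                            // take a fraction of it
                profit += p[i] * ((double) remaining / w[i]);
                remaining = 0;
            }
        }
        return profit;
    }

    public static void main(String[] args) {
        // W = 30, items (5, 50), (10, 60), (20, 140): ratios 10, 6, 7.
        // Take item 1 whole, item 3 whole, then half of item 2.
        System.out.println(maxProfit(new int[]{5, 10, 20}, new int[]{50, 60, 140}, 30)); // 220.0
    }
}
```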