Dynamic Programming—Chained Matrix Multiplication

Multiplying unequal matrices

• Suppose we want to multiply two matrices that do not have the same number of rows and columns

• We can multiply two matrices A₁ and A₂ only if the number of columns of A₁ is equal to the number of rows of A₂

• Example: We want to multiply a 2 X 3 matrix by a 3 X 4 matrix

– The resultant matrix is a 2 X 4 matrix

• This will have 4 terms in the top row and 4 in the bottom

• Each term is the result of 3 multiplications

• So the total number of multiplications is **2 × 3 × 4 = 24**

• Generalizing: if we want to multiply an N X M matrix by an M X P matrix it will take **N × M × P** multiplications
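
The N × M × P count above can be checked by instrumenting the schoolbook algorithm. The sketch below is illustrative only: the function name `multiply_counting` and the sample matrices are assumptions, not part of the original slides.

```python
def multiply_counting(a, b):
    """Multiply matrices a (n x m) and b (m x p) the schoolbook way.

    Returns (product, count), where count is the number of scalar
    multiplications performed.
    """
    n, m, p = len(a), len(b), len(b[0])
    assert all(len(row) == m for row in a), "columns of a must equal rows of b"
    product = [[0] * p for _ in range(n)]
    count = 0
    for i in range(n):
        for j in range(p):
            for k in range(m):
                product[i][j] += a[i][k] * b[k][j]
                count += 1
    return product, count


a = [[1, 2, 3],
     [4, 5, 6]]                 # 2 X 3
b = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]              # 3 X 4
product, count = multiply_counting(a, b)
print(len(product), "X", len(product[0]), "result,", count, "multiplications")
# prints: 2 X 4 result, 24 multiplications
```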

Chained Matrix Multiplication

• We are given a sequence (chain) A₁, A₂, …, Aₙ of n matrices, and we wish to find the product A₁A₂⋯Aₙ

• The way we parenthesize a chain of matrices can have a dramatic impact on the cost of evaluating the product.

• This problem is to determine the best way to parenthesize the matrices to minimize the number of multiplications

Example:

• A₁ 5 X 3

• A₂ 3 X 4

• A₃ 4 X 6

• A₄ 6 X 5

• The problem: what is the best order to multiply them?

• If we multiply

– (A₁((A₂A₃)A₄)) takes 237 multiplications

– (A₁(A₂(A₃A₄))) takes 255 multiplications

– ((A₁A₂)(A₃A₄)) takes 280 multiplications

– (((A₁A₂)A₃)A₄) takes 330 multiplications

– ((A₁(A₂A₃))A₄) takes 312 multiplications
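
The five totals above can be reproduced mechanically. The following is a minimal sketch, assuming each ordering is written as nested pairs of matrix indices; the helper name `cost` and that encoding are illustrative choices, not from the slides.

```python
def cost(expr, dims):
    """Return (rows, cols, multiplications) for a parenthesized product.

    expr is either an int i (the matrix A_i) or a pair (left, right).
    dims maps i to the (rows, cols) of A_i.
    """
    if isinstance(expr, int):
        rows, cols = dims[expr]
        return rows, cols, 0
    left, right = expr
    r1, c1, m1 = cost(left, dims)
    r2, c2, m2 = cost(right, dims)
    assert c1 == r2, "inner dimensions must agree"
    # Multiplying an r1 X c1 result by a c1 X c2 result costs r1 * c1 * c2.
    return r1, c2, m1 + m2 + r1 * c1 * c2


dims = {1: (5, 3), 2: (3, 4), 3: (4, 6), 4: (6, 5)}
orders = [
    (1, ((2, 3), 4)),
    (1, (2, (3, 4))),
    ((1, 2), (3, 4)),
    (((1, 2), 3), 4),
    ((1, (2, 3)), 4),
]
costs = [cost(order, dims)[2] for order in orders]
print(costs)  # [237, 255, 280, 330, 312]
```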

How to parenthesize the matrices

• In the case of four matrices, there are only five ways to order the multiplications

• But with n matrices, the number of ways to parenthesize them grows exponentially (on the order of 4ⁿ/n^(3/2)), so we do not want to look at all the possibilities

• Dividing the problem into subproblems

– We use the principle of optimality, which is said to apply if an optimal solution to an instance of a problem always contains optimal solutions to all subinstances.

– If A₁((((A₂A₃)A₄)A₅)A₆) is the optimal order then we know that (A₂A₃)A₄ is the optimal order for A₂A₃A₄
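
To see how quickly the number of orderings grows, one can count them with a simple recurrence: the last multiplication splits the chain after some matrix, and each half can be parenthesized independently. This is only a sketch; the function name `count_orders` is an assumption made for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_orders(n):
    """Number of ways to fully parenthesize a chain of n matrices."""
    if n <= 1:
        return 1
    # Split the chain after its k-th matrix and combine the counts of both halves.
    return sum(count_orders(k) * count_orders(n - k) for k in range(1, n))


for n in (4, 10, 20):
    print(n, count_orders(n))
```

For four matrices this gives the five orderings listed earlier; by twenty matrices the count already exceeds a billion, which is why exhaustive search is not practical.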

The matrix-chain problem

• We need to find the best way to parenthesize the chain of matrices to minimize the number of scalar multiplications

• We do this by dividing the problem into subproblems, then finding the optimal solution to each subproblem.

• Suppose we have matrix-chain A₁ .. Aₙ

• We divide this into subproblems A₁ .. Aₖ and Aₖ₊₁ .. Aₙ

• The problem is that we do not know what the k should be

• We find k by looking at the optimal solutions of each of the subproblems.

– This means looking at all the values for k

The chain A₁..A₄

• Look at the possible chains of length two

– How many values of k are there for chains of length two?

• Then the possible chains of length three

– How many values of k?

• Then chains of length four; etc.

• We need some kind of organization here to help us find the optimal way to make chains of length three out of chains of length two

• We need some symbolism and tables to keep track of calculations and minimum number of scalar multiplications
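
The questions above can be answered by listing the subchains in order of length. A small sketch follows (the name `chains_by_length` is hypothetical); it shows that a chain of length L has L - 1 possible values of k.

```python
def chains_by_length(n):
    """Group subchains A_i..A_j of A_1..A_n by length, with their split points k."""
    table = {}
    for length in range(2, n + 1):
        table[length] = [
            (i, i + length - 1, list(range(i, i + length - 1)))
            for i in range(1, n - length + 2)
        ]
    return table


for length, chains in chains_by_length(4).items():
    for i, j, ks in chains:
        print(f"length {length}: A{i}..A{j}, k in {ks}")
```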

How we keep track

• We keep the dimensions of the matrices in a 1-D array.

index	0	1	2	3	4
d	5	3	4	6	2

• A₁ 5 X 3

• A₂ 3 X 4

• A₃ 4 X 6

• A₄ 6 X 2

• Aᵢ has dimensions d[i-1] X d[i]

• So our array starts at zero and goes to four

• We keep the number of multiplications for chains of length one, two, three, and four in a 2-D array
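
The 1-D array of dimensions described above might be used as follows. This is a sketch under the slide's convention that A_i is d[i-1] X d[i]; the helper name `shape` is an assumption.

```python
d = [5, 3, 4, 6, 2]          # indices 0..4, as in the dimension array above
n = len(d) - 1               # number of matrices in the chain


def shape(i):
    """Return (rows, columns) of A_i for 1 <= i <= n."""
    return d[i - 1], d[i]


for i in range(1, n + 1):
    rows, cols = shape(i)
    print(f"A{i}: {rows} X {cols}")

# Adjacent matrices are always compatible: columns of A_i == rows of A_(i+1).
assert all(shape(i)[1] == shape(i + 1)[0] for i in range(1, n))
```

Storing n + 1 numbers instead of n pairs works because each inner dimension is shared by two neighbouring matrices.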

The 2-D array called N

• N[1][1] will hold the number of multiplications to multiply from A₁ to A₁

– This of course is zero, since it is a chain of length one

– N[i][i] where i = 1 to n is always zero

• N[1][2] holds the minimum number of multiplications to multiply A₁ to A₂

– This is a chain of length two

– Add N[2][3] and N[3][4] to the table; these hold the other chains of length two

• N[1][3] holds the minimum number of multiplications for one chain of length three, N_{1,3}

– This could be either (A₁A₂)A₃ or A₁(A₂A₃)

– So we have two possibilities, and we must decide on the minimum

We use the term k to show where we decide to put the parentheses

– In the terminology we use, we must decide on k, which is how we divide the matrix chain

• **N_{i,j} = min { N_{i,k} + N_{k+1,j} + d_{i-1} · d_k · d_j }, where i ≤ k < j**

Explaining the formula

• **N_{i,j} = min { N_{i,k} + N_{k+1,j} + d_{i-1} · d_k · d_j }, where i ≤ k < j**

• Our chains are symbolized by N_{i,j}

• In our problem we have two chains of length three, denoted N_{1,3} and N_{2,4}

• Looking first at N_{1,3}

– There are two possibilities: (A₁A₂)A₃ or A₁(A₂A₃)

• We already have in our 2-D table how many multiplications it takes to multiply (A₁A₂) and (A₂A₃)

– So we have two possible values of k, and we must check them both, and take the minimum

– i = 1 and j = 3 and k can equal either 1 or 2

We plug them into the formula, and take the minimum.

Basically, what the formula is doing is taking the number of multiplications for a chain of length two from the table, and adding to it the number of multiplications needed to link in another matrix
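
As a worked instance of the formula, the sketch below evaluates N_{1,3} for the four-matrix example with d = 5, 3, 4, 6, 2. Storing N as a dictionary keyed by (i, j) is an illustrative choice, not something the slides prescribe.

```python
d = [5, 3, 4, 6, 2]                      # A1 5 X 3, A2 3 X 4, A3 4 X 6, A4 6 X 2

# Chains of length one cost nothing; chains of length two have a single split.
N = {(i, i): 0 for i in range(1, 5)}
for i in range(1, 4):
    N[i, i + 1] = d[i - 1] * d[i] * d[i + 1]

# N[1][3]: try k = 1, i.e. A1(A2A3), and k = 2, i.e. (A1A2)A3.
i, j = 1, 3
candidates = {k: N[i, k] + N[k + 1, j] + d[i - 1] * d[k] * d[j] for k in range(i, j)}
best_k = min(candidates, key=candidates.get)
N[i, j] = candidates[best_k]
print(candidates, "-> k =", best_k)      # {1: 162, 2: 180} -> k = 1
```

With k = 1 the cost is 0 + 72 + 5·3·6 = 162, and with k = 2 it is 60 + 0 + 5·4·6 = 180, so A₁(A₂A₃) is the better order for this subchain.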

Another Example

• A₁ is the first matrix in our chain, Aₙ the last

• A₁ 30 X 35

• A₂ 35 X 15

• A₃ 15 X 5

• A₄ 5 X 10

• A₅ 10 X 20

• A₆ 20 X 25

• We keep the dimensions in an array: d = 30, 35, 15, 5, 10, 20, 25

• Aᵢ has dimensions d[i-1] X d[i]

• So our array index starts at zero and goes to six

The variables and data structures

• The notation for the optimal number of scalar multiplications for a subproblem is N_{i,j}

– i is the subscript of the first matrix in the subproblem

– j is the subscript of the last

– k is the subscript at which we break N_{i,j} into subproblems

• i ≤ k < j

• Breaking N_{i,j} into subproblems gives N_{i,k} and N_{k+1,j}

– We are looking for the k that gives N_{i,j} (the optimum for this subproblem)

– It is given by the formula:

• **N_{i,j} = min { N_{i,k} + N_{k+1,j} + d_{i-1} · d_k · d_j }, where i ≤ k < j**

The 2-D array for multiplications

	1	2	3	4	5	6
1	0	15,750	7,875	9,375	11,875	15,125
2		0	2,625	4,375	7,125	10,500
3			0	750	2,500	5,375
4				0	1,000	3,500
5					0	5,000
6						0

The 2-D array to keep track of k

	1	2	3	4	5	6
1		1	1	3	3	3
2			2	3	3	3
3				3	3	3
4					4	5
5						5
6

Pseudocode for chained matrices

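int minMult(int n, int[] d, int[][] P)
{   // d[0..n] holds the dimensions; M plays the role of the table N
    int[][] M = new int[n + 1][n + 1];              // row and column 0 are unused
    for (int i = 1; i <= n; i++) M[i][i] = 0;       // initialize
    for (int dia = 1; dia <= n - 1; dia++)          // dia = j - i
        for (int i = 1; i <= n - dia; i++)
        {   int j = i + dia;
            M[i][j] = Integer.MAX_VALUE;
            for (int k = i; k <= j - 1; k++)        // try every split point
            {   int cost = M[i][k] + M[k + 1][j] + d[i - 1] * d[k] * d[j];
                if (cost < M[i][j])
                {   M[i][j] = cost;
                    P[i][j] = k;                    // the value of k that gave the min
                }
            }
        }
    return M[1][n];
}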
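
For readers who want to run the procedure, here is one possible Python rendering of the same bottom-up scheme. It is a sketch rather than the course's reference implementation: the name `min_mult` and the choice to return both tables are assumptions, and ties are broken in favour of the smallest k.

```python
def min_mult(d):
    """Bottom-up matrix-chain DP.

    d holds the dimensions: A_i is d[i-1] X d[i] for i = 1..n.
    Returns (M, P), both (n+1) x (n+1) with row and column 0 unused.
    """
    n = len(d) - 1
    M = [[0] * (n + 1) for _ in range(n + 1)]
    P = [[0] * (n + 1) for _ in range(n + 1)]
    for dia in range(1, n):                 # dia = j - i
        for i in range(1, n - dia + 1):
            j = i + dia
            # Evaluate every split point and keep the cheapest one.
            best_cost, best_k = min(
                (M[i][k] + M[k + 1][j] + d[i - 1] * d[k] * d[j], k)
                for k in range(i, j)
            )
            M[i][j], P[i][j] = best_cost, best_k
    return M, P


M, P = min_mult([30, 35, 15, 5, 10, 20, 25])
print(M[1][6], P[1][6])  # 15125 3
```

Run on the six-matrix example, this reproduces the entries of the two tables above, including N_{1,6} = 15,125 with the split at k = 3.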

Printing the optimal order

• In addition to finding the number of multiplications, we need to keep track of where to put the parentheses

• This is done in a separate 2-D array

• k is recorded in P[i][j] when the minimum for N_{i,j} is found
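
Once the split points are recorded, the optimal order can be printed recursively. The sketch below copies the k table for the six-matrix example into a dictionary; the function name `order` is hypothetical.

```python
# Split table for the six-matrix example, copied from the k table above.
P = {
    (1, 2): 1, (1, 3): 1, (1, 4): 3, (1, 5): 3, (1, 6): 3,
    (2, 3): 2, (2, 4): 3, (2, 5): 3, (2, 6): 3,
    (3, 4): 3, (3, 5): 3, (3, 6): 3,
    (4, 5): 4, (4, 6): 5,
    (5, 6): 5,
}


def order(i, j):
    """Return the optimal parenthesization of A_i..A_j as a string."""
    if i == j:
        return f"A{i}"
    k = P[i, j]                      # the last multiplication splits after A_k
    return f"({order(i, k)}{order(k + 1, j)})"


print(order(1, 6))  # ((A1(A2A3))((A4A5)A6))
```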

Analyzing the Matrix-Chain Algorithm

• We can compute N_{1,n} with an algorithm that consists primarily of three nested for-loops

• So the total running time is O(n³)

– In 1982 Yao developed an O(n²) algorithm
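
The cubic bound can be made concrete by counting how often the innermost statement of the three nested loops runs. The sketch below (function name `inner_iterations` is an assumption) compares that count with the closed form (n³ - n)/6.

```python
def inner_iterations(n):
    """Count how many (i, j, k) triples the three nested loops visit."""
    count = 0
    for dia in range(1, n):
        for i in range(1, n - dia + 1):
            j = i + dia
            for k in range(i, j):
                count += 1
    return count


for n in (4, 6, 100):
    print(n, inner_iterations(n), (n**3 - n) // 6)
```

For the six-matrix example this is 35 evaluations of the formula, compared with the much larger number of complete parenthesizations an exhaustive search would have to examine as n grows.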