CS 223 Out-lab 5
Week of
2/27

Due: to TA by 3/23

Sequence Alignment

The object of this out-lab is to implement the Smith-Waterman dynamic programming algorithm discussed in class (notes here).  As imput, your program should read a text file that contains a substitution matrix followed by k pairs of strings to align.  As output, it should printout the best alignment for each pair of strings found.  You may assume that the string alphabet is {A,C,T,G}.  You may also assume the insertion/deletion cost is -1.

Example input file:

2    0    1     0
0    3    0    -1
0    0    1    1
0    0    0    2
2
TCCCTGGA
ACTTGGA
AACCTGGT
TTCTTG

(Note about above substitution matrix: It is assumed to be symmetric, and given in the order A, C, T, G.  So A vs T scores +1 and T vs A also scores +1. You will need to make your substitution scores are symmetric in your code.)

In the above example the substitution matrix is given first followed a by '2' indicating that there are two pairs of strings to align.  This is followed by the two pairs of strings that should be aligned (in this case TCCCTGGA should be aligned with ACTTGGA and AACCTGGT should be aligned with TTCTTG).

The alignments should be printed in this form:
ACTAGC-
-CTA-CA
score: 6

Extra credit: Implement a gap cost model and full Smith Waterman alignment.  See this page for some more info.

What to Turn In

  • a printed copy of all source code.  Ensure the source code is documented.
  • a printout of several test runs including the above example.