Wednesday, December 28, 2011

Longest Common Subsequence

LCS Problem Statement: Given two sequences, find the length of longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguous. For example, “abc”, “abg”, “bdf”, “aeg”, ‘”acefg”, .. etc are subsequences of “abcdefg”. So a string of length n has 2^n different possible subsequences.
It is a classic computer science problem, the basis of diff (a file comparison program that outputs the differences between two files), and has applications in bioinformatics.
Examples:
LCS for input Sequences “ABCDGH” and “AEDFHR” is “ADH” of length 3.
LCS for input Sequences “AGGTAB” and “GXTXAYB” is “GTAB” of length 4.

http://algorithms.tutorialhorizon.com/dynamic-programming-longest-common-subsequence/
http://www.geeksforgeeks.org/dynamic-programming-set-4-longest-common-subsequence/

Important Variations:
1. Longest Palindromic Subsequence
http://algorithms.tutorialhorizon.com/longest-palindromic-subsequence/
http://www.geeksforgeeks.org/dynamic-programming-set-12-longest-palindromic-subsequence/
2.Longest Common Substring
http://algorithms.tutorialhorizon.com/dynamic-programming-longest-common-substring/
3.

Method 1:Recursion
Consider the input strings “AGGTAB” and “GXTXAYB”. Last characters match for the strings. So length of LCS can be written as:
L(“AGGTAB”, “GXTXAYB”) = 1 + L(“AGGTA”, “GXTXAY”)
2) Consider the input strings “ABCDGH” and “AEDFHR. Last characters do not match for the strings. So length of LCS can be written as:
L(“ABCDGH”, “AEDFHR”) = MAX ( L(“ABCDG”, “AEDFHR”), L(“ABCDGH”, “AEDFH”) )
So the LCS problem has optimal substructure property as the main problem can be solved using solutions to subproblems.

int lcs( char *X, char *Y, int m, int n )
{
   if (m == 0 || n == 0)
     return 0;
   if(X[m-1] == Y[n-1])
     return 1 + lcs(X, Y, m-1, n-1);
   else
     return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n));
}
Time complexity of the above naive recursive approach is O(2^n) in worst case and worst case happens when all characters of X and Y mismatch i.e., length of LCS is 0.
Method 2:Dyanamic Programming 

int lcs( char *X, char *Y, int m, int n )
{
   int L[m+1][n+1];
   int i, j;
   /* Following steps build L[m+1][n+1] in bottom up fashion. Note
      that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1] */
   for (i=0; i<=m; i++)
   {
     for (j=0; j<=n; j++)
     {
       if (i == 0 || j == 0)
         L[i][j] = 0;
       else if(X[i-1] == Y[j-1])
         L[i][j] = L[i-1][j-1] + 1;
       else
         L[i][j] = max(L[i-1][j], L[i][j-1]);
     }
   }
   return L[m][n];
}
Time Complexity of the above implementation is O(mn) which is much better than the worst case time complexity of Naive Recursive implementation.

1 comment:

  1. http://www.leetcode.com/2010/09/print-all-combinations-of-number-as-sum.html

    http://www.geeksforgeeks.org/archives/7103

    http://anandtechblog.blogspot.com/2011/06/arrays_28.html

    ReplyDelete