C Datastructures Networking OS

Tuesday, 5 March 2013

Longest Common Subsequence...

Given two Strings X of length m and Y of length n.We have to find longest common sub-sequence(order matters but not necessarily contiguous block) in both the strings
  for example,
                     X=ABCDEF
                     Y=DGHEF
longest sub-sequence  is DEF of length 3 

One naive method is to find all sub-sequences of both the strings individually and compare the subsequences which match .Subsequence with longest length is the required result.

But this method requires exponential time as each string has 2^(length of string) subsequences.
We can use Dynamic Programming to solve this problem.
Let the two strings be ABCDEF and DGHEF


#include<stdio.h>
#include<stdlib.h>
int p=0,maxx=0,a[40];  
int max(int a, int b);  
int lcs( char *X, char *Y, int m, int n ) //returns length of subsequence
{
   int L[m+1][n+1]; //matrix to store lengths of subsequences
   int i, j;
     for (i=0; i<=m; i++)
   {
     for (j=0; j<=n; j++)
     {
       if (i == 0 || j == 0)
         L[i][j] = 0;
  
       else if (X[i-1] == Y[j-1])
       {

         L[i][j] = L[i-1][j-1] + 1;

         if(L[i][j]>maxx)        //a[] is array to store postions of matching characters,
                                       p indicates the position
                  {
                        maxx=L[i][j];

                        a[p++]=i-1;
                   }
               }
       else
         L[i][j] = max(L[i-1][j], L[i][j-1]);
      }
   }
   
   return L[m][n];
}
  
int max(int a, int b)
{
    return (a > b)? a : b;
}
  
int main()
{
  char X[] = "ABCDEF";
  char Y[] = "DGHEF";
  
  int m = strlen(X);
  int n = strlen(Y);
  printf("Length of LCS is %d\n", lcs( X, Y, m, n ) );

    for(int i=0;i<p;i++)
          printf("%c", X[a[i]]);
    getchar();
  return 0;
}

Explanation :
    m=6,n=5
                    ABCDEF     DGHEF
    for each character of string X(i) compare with all characters of string Y(j)
         and update array L[i][j]
    for example when i=0 and j=0,the elements of array becomes
                      
             
                      when i=0 and j=0 L[i][j] will be assigned 0
                      when i=0 or j=0  L[i]j] will be assigned 0
                      when X[i-1]==Y[i-1],i.e whenever characters of both strings match then L[i][j]=L[i-1][j-1]+1
                      when i=3 and j=1
                                   as is it the first character to be matched ,L[i][j] upto i=3 and j=0 is 0
                                    therefore 
                                  L[3][1] = L[2][0]+1=0+1
                                  length of common subsequence upto D is 1
                Now the becomes as follows
   and a[0] will be assigned with 3
If the characters are not matched then maximum length of  common subsequence upto that character will be assigned to L[i][j]
Thus the process repeats and L[6][5] will be returned as length of longest common subsequence and array a[] contains longest common subsequence


No comments:

Post a Comment