Thursday, January 19, 2012

Smallest window in string


Given two strings string1 and string2, find the smallest substring in string1 containing all characters of string2 efficiently.
For Example:
Input string1: “this is a test string”
Input string2: “tist”
Output string: “t stri”

Method 1:Two Pointer Method
The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing SneedToFind stores the total count of a character in T and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in T that’s met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals T‘s length, we know a valid window is found.
Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to T‘s size), we immediately advance begin pointer as far right as possible while maintaining the constraint.
How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint.
Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found.
Essentially, the algorithm finds the first window that satisfies the constraint, then continue maintaining the constraint throughout.

Code:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<limits.h>

bool minWindow(const char* S, const char *T,
               int &minWindowBegin, int &minWindowEnd)
{
  int sLen = strlen(S);
  int tLen = strlen(T);
  int needToFind[256] = {0};
  int hasFound[256] = {0};
  int minWindowLen = INT_MAX;
  int count = 0;

  for (int i = 0; i < tLen; i++)
    needToFind[T[i]]++;

  for (int begin = 0, end = 0; end < sLen; end++)
  {
    // skip characters not in T
    if (needToFind[S[end]] == 0) continue;
    hasFound[S[end]]++;
    if (hasFound[S[end]] <= needToFind[S[end]])
      count++;

    if (count == tLen) {
    
      while (needToFind[S[begin]] == 0 ||
            hasFound[S[begin]] > needToFind[S[begin]]) {
        if (hasFound[S[begin]] > needToFind[S[begin]])
          hasFound[S[begin]]--;
        begin++;
      }

      int windowLen = end - begin + 1;
      if (windowLen < minWindowLen) {
        minWindowBegin = begin;
        minWindowEnd = end;
        minWindowLen = windowLen;
      } // end if
    } // end if
  } // end for
}

int main()
{
char a[]="acbbaca";
char b[]="aba";
int minWindowBegin=0;
int minWindowEnd=0;
minWindow(a, b,minWindowBegin,minWindowEnd);

printf("Minimum window is from %d to %d", minWindowBegin,minWindowEnd);

getchar();

return 0;
}
Time:Both the begin and end pointers can advance at most N steps (where N is S‘s size) in the worst case, adding to a total of 2N times. Therefore, the run time complexity must be in O(N).

Useful Link:

No comments:

Post a Comment