Sunday, March 25, 2012

Searching a string in a set of strings

Given a string, search it in a set of strings (say among 1000s of string). What data structure would you use to store those 1000 strings and get the results fastest?

Variation:
Search a Word in a 2D Grid of characters
Given a 2D grid of characters and a word, find all occurrences of given word in grid. A word can be matched in all 8 directions at any point. Word is said be found in a direction if all characters match in this direction (not in zig-zag form).

The 8 directions are, Horizontally Left, Horizontally Right, Vertically Up and 4 Diagonal directions.
http://www.geeksforgeeks.org/search-a-word-in-a-2d-grid-of-characters/
http://www.geeksforgeeks.org/find-all-occurrences-of-the-word-in-a-matrix/

Strategy:
Solution 1 : Simple, but not best solution
Create a hashtable, and store all the strings in that hashtable. For the given string, search in that hashtable. In the worst case, if we get same hashcode for all the strings, then we have to compare with all the strings.

Solution 2 : Best solution
Create a Trie structure, and search for the string in that trie. Depending upon how we construct the trie, the complexity varies from O(n) to O(n*m) where n is the no.of characters in the given string, and m is the no.of different characters that we have in all the strings.

In the trie, for each node, if we create children for all the possible characters, then the complexity will be O(n), where n is the no.of characters in the given string. The memory requirement will be high in this case.

If we don't create children for all the possible characters, and create children only for the strings that exist in our set, then at each level, we have to search among the children to find the string. So, the complexity would be O(n*m) where n is the no.of characters in the given string, and m is the no.of different characters that we have in all the strings. In this case, we will not use extra memory. 

No comments:

Post a Comment