Saturday, March 24, 2012

Advanced Sorting and Searching Algorithms

Given a huge file with 2GB size with one string per line.Which algorithm to be used to sort the file and why?

Concept:
The catch here is "2GB" file. By saying "2GB", the question stresses that we cannot bring the entire data in memory at the same time. Therefore we will have to sort the data in chunks. This can be done by a class of sorting algorithms called external sorting. External sorting works as follows:

Suppose we have 200 MB of available memory,


1) Read 200 MB of data in main memory and sort it by some conventional method like quicksort in O(n log n).


2) Write the sorted data back to a temporary file on disk.


3) Repeat steps 1 and 2 until all the data is in sorted 200 MB chunks (2GB/200MB = 10 chunks). We will now merge these chunks into a single output file.


4) Read the first 15 MB of of each chunk (15*10 = 150 MB total) into input buffer in main memory and allocate remaining 50 MB for output buffer.


5) Perform a 10-way merge on the input buffer and store the result in the output buffer. If the output buffer is full, write it to the final sorted file on the disk and empty it. If any of the 10 input buffers get empty, fill it again with the next 20 MB of the associated 200 MB chunk until no more data from the chunk is available.

No comments:

Post a Comment