Tuesday, February 7, 2012

Sort an array based on frequency

We have an array of known length. The elements in array can be repetitive.
now sort the array based on frequency of occurrence of each element in
array.
Eg: a= {4.3.2.5.4.6.2.6}
after sorting a={4,4,2,2,6,6,3,5} 
4,2,6 all occurs twice, in this case retain the order in which they
appeared in original array. 


http://www.geeksforgeeks.org/sort-elements-by-frequency/
http://www.geeksforgeeks.org/sort-elements-by-frequency-set-2/

METHOD 1 (Use Sorting)
  1) Use a sorting algorithm to sort the elements O(nlogn) 
 Input 5  2  2  8  5  6  8  8

  After sorting we get
  Element 2 2 5 5 6 8 8 8
  Index   1 2 0 4 5 3 6 7
   
  2) Scan the sorted array and construct a 2D array of indexes (instead of elements) and count O(n).
Now construct the 2D array as
  Index, Count
  1,      2
  0,      2
  5,      1
  3,      3
3) Sort the 2D array according to count O(nlogn).
Sort by count (consider indexes in case of tie)
  3, 3
  0, 2
  1, 2
  5, 1
  
  Print the elements using indexes in the above 2D array.
Reason for choosing 2d array of indexes and counts:
There is one issue with element approach.If we modify the input to 5 2 2 8 5 6 8 8, then we should get 8 8 8 5 5 2 2 6 and not 8 8 8 2 2 5 5 6 as will be the case.To handle this, we should use indexes in step 3, if two counts are same then we should first process(or print) the element with lower index. In step 1, we should store the indexes instead of elements.

Example:
  Input 2 5 2 8 5 6 8 8

  After sorting we get
  2 2 5 5 6 8 8 8

  Now construct the 2D array as
  2, 2
  5, 2
  6, 1
  8, 3

  Sort by count
  8, 3
  2, 2
  5, 2
  6, 1


METHOD 2--(Use BST and Sorting)
1. Insert elements in BST one by one and if an element is already present then increment the count of the node. Node of the Binary Search Tree (used in this approach) will be as follows.
struct tree
{
  int element;
  int first_index /*To handle ties in counts*/
  int count;
}BST;
2.Store the first indexes and corresponding counts of BST in a 2D array.
3 Sort the 2D array according to counts (and use indexes in case of tie).
Time Complexity: O(nlogn) if a Self Balancing Binary Search Tree is used.

METHOD 3--(Use Hashing and Sorting)
Using a hashing mechanism, we can store the elements (also first index) and their counts in a hash. Finally, sort the hash elements according to their counts.


That means:

for each x in input:
  if x in table:
    ++table[x]
  else
    table.insert(x)
Then, a simple Quicksort on the unique values, with the comparison function taking into account the frequency instead of the value itself.

That is:

function freq_compare (x, y):
  return compare(table[x], table[y])
Where compare is numeric for the frequencies.


Alternatively,

Build a hashmap with the array value as a key mapping to a struct
which contains the key, frequency, and location of first occurance.
Then sort the hashed elements comparing first by frequency and
breaking ties based on first occurance. Then iterate through the
sorted elements and fill in the array. All steps are O(n) except for
the sort, so the overall complexity is O(n*log n).

No comments:

Post a Comment