Problem:
Lets say we have 20GB size file, which contains a string in each line.
Now we need to sort this file, but the system RAM size is only 1GB.
Solution:
So we can't load the entire file into main memory(RAM) at once and apply sorting algorithm.
We load chunks of data from the input file, sort the chunks and then apply K way merging process.
Since RAM size is 1GB, divide the 20GB file into 25 chunks, where each chunk is of size ~800MB(800MB * 25 ~= 20GB).
Now take each chunk into memory one after another, sort them individually, and push them back to the same locations.
So now the input file contains 25 sorted chunks.
Now take 36MB(900MB/25) from each file. Call this as input chunks.
Leave the rest of the RAM 100MB to store output sorted chunk.
Now compare first lines of each chunk, take out the smallest one and push it on to the output array.
Repeat this process:
If any chunk becomes empty, then take the next data of that input chunk(next 36MB of the sorted 800MB array)
If the output array becomes full, empty it and push it on to the output array of hard disk.
Continue this process till all the chunks data is sorted.
References:
http://exceptional-code.blogspot.in/2011/07/external-sorting-for-sorting-large.html
Lets say we have 20GB size file, which contains a string in each line.
Now we need to sort this file, but the system RAM size is only 1GB.
Solution:
So we can't load the entire file into main memory(RAM) at once and apply sorting algorithm.
We load chunks of data from the input file, sort the chunks and then apply K way merging process.
Since RAM size is 1GB, divide the 20GB file into 25 chunks, where each chunk is of size ~800MB(800MB * 25 ~= 20GB).
Now take each chunk into memory one after another, sort them individually, and push them back to the same locations.
So now the input file contains 25 sorted chunks.
Now take 36MB(900MB/25) from each file. Call this as input chunks.
Leave the rest of the RAM 100MB to store output sorted chunk.
Now compare first lines of each chunk, take out the smallest one and push it on to the output array.
Repeat this process:
If any chunk becomes empty, then take the next data of that input chunk(next 36MB of the sorted 800MB array)
If the output array becomes full, empty it and push it on to the output array of hard disk.
Continue this process till all the chunks data is sorted.
References:
http://exceptional-code.blogspot.in/2011/07/external-sorting-for-sorting-large.html
No comments:
Post a Comment