At school, we had the task of creating a multi-threaded application. We decided to do a multithreaded merge sort implementation.
However, we cannot make it work faster than the serial implementation.
I have already tried the following:
- with unlimited threads (code example 1) (was very slow)
- limited thread implementation (code example 2) (4 max threads - still very slow)
- using Parallel.Invoke (sample code 3) (even slower)
- complex implementation also with parallel merge function (just shamefully slow)
When I use the analysis tools in Visual Studio (part of Instrumentation), I found timings for the called functions, and the streaming solution is always extremely slower than a sequential implementation.
I see no possible reason for this.
(for example: with 500,000 numbers for sorting, sequential implementation: 16.717.17; parallel: 20.259.97; results with only one additional thread)
I tested it on both machines that I have:
- Intel Core 2 Quad Q9450 @ 2.66Ghz
- Intel Core i7 Q720 @ 1.60Ghz
I can’t figure out in my life how this is possible, will this not speed up the process?
I would be very grateful if someone could help me.
Code Example 1:
ParallelMerge pMerge = new ParallelMerge(T, p1, q1 -1, p2, q2-1, A, p3);
Thread thread = new Thread(new ThreadStart(pMerge.parallel_merge));
thread.Start();
ParallelMerge pMerge2 = new ParallelMerge(T, q1 + 1, r1, q2, r2, A, q3 + 1);
pMerge2.parallel_merge();
thread.Join();
Code Example 2:
if(depthRemaining > 0)
{
ParallelMerge pMerge = new ParallelMerge(T, p1, q1 -1, p2, q2-1, A, p3);
thread thread = new Thread(new ThreadStart(pMerge.parallel_merge));
thread.Start();
ParallelMerge pMerge2 = new ParallelMerge(T, q1 + 1, r1, q2, r2, A, q3 + 1);
pMerge2.parallel_merge();
thread.Join();
}
else
{
ParallelMerge pMerge = new ParallelMerge(T, p1, q1 -1, p2, q2-1, A, p3);
pMerge.parallel_merge();
ParallelMerge pMerge2 = new ParallelMerge(T, q1 + 1, r1, q2, r2, A, q3 + 1);
pMerge.parallel_merge();
}
Code Example 3:
if (depthRemaining > 0)
{
Parallel.Invoke(
() => threaded_merge_sort(getallen, p, q, depthRemaining-1));
threaded_merge_sort(getallen, q + 1, r, 0);
}
else
{
threaded_merge_sort(getallen, p, q, 0);
threaded_merge_sort(getallen, q+1, r, 0);
}