Using parallelism in Java makes the program slower (four times slower!)

I am writing an implementation of the conjugate gradient method.

I am using Java multithreading for matrix back-substitution. Synchronization is performed using CyclicBarrier and CountDownLatch.

Why does it take so long to synchronize threads? Are there any other ways to do this?

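Code snippet:

private void syncThreads() {
    try {
        // Blocks until every party has reached the barrier.
        barrier.await();
    } catch (InterruptedException e) {
        // Restore the interrupt flag so callers can see the interruption.
        Thread.currentThread().interrupt();
    } catch (BrokenBarrierException e) {
        // Another party was interrupted or timed out; do not continue silently.
        throw new IllegalStateException("barrier broken", e);
    }
}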
+3
5 answers

How many threads are used in total? This is probably the source of your problem. Using multiple threads will only really improve performance if:

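  • Each task does some kind of blocking, for example waiting on I/O. Multiple threads then let other work proceed during the blocking time.
  • You have multiple cores. With 4 cores or 4 CPUs, you can run 4 tasks at the same time (4 threads).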

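It sounds like your threads are not blocking, so my guess is that you are using too many threads. If, for example, you use 10 threads to do the work at the same time but have only 2 cores, that will likely be much slower than running all the tasks sequentially. As a rule, start with a thread count equal to the number of cores/CPUs. Then increase the number of threads slowly, measuring performance each time. That will give you the best thread count to use.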
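
A common starting point for the thread count is the number of cores the JVM reports. The sketch below is only an illustration of that idea: the class name PoolSizingSketch and both helper methods are hypothetical, and the right pool size for a real workload should still be found by measuring.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolSizingSketch {
    // Hypothetical helper: one worker per core reported by the JVM, at least one.
    static int defaultThreadCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Runs one trivial task per worker and returns how many tasks completed.
    static int runOneTaskPerWorker() {
        int threads = defaultThreadCount();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final AtomicInteger done = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing the interruption.
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("cores reported: " + defaultThreadCount());
        System.out.println("tasks completed: " + runOneTaskPerWorker());
    }
}
```

From this baseline, the thread count can be raised step by step while timing each run.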

+6

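Adding threads does not necessarily make a program faster, because handing work to another thread has a cost of its own.

In the example below, each task does so little work that the overhead of passing it to a thread pool far exceeds the work itself.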

final double[] results = new double[10*1000*1000];
{
    long start = System.nanoTime();
    // using a plain loop.
    for(int i=0;i<results.length;i++) {
        results[i] = (double) i * i;
    }
    long time = System.nanoTime() - start;
    System.out.printf("With one thread it took %.1f ns per square%n", (double) time / results.length);
}
{
    ExecutorService ex = Executors.newFixedThreadPool(4);
    long start = System.nanoTime();
    // submit one tiny task per array element.
    for(int i=0;i<results.length;i++) {
        final int i2 = i;
        ex.execute(new Runnable() {
            @Override
            public void run() {
                results[i2] = (double) i2 * i2; // cast first to avoid int overflow

            }
        });
    }
    ex.shutdown();
    ex.awaitTermination(1, TimeUnit.MINUTES);
    long time = System.nanoTime() - start;
    System.out.printf("With four threads it took %.1f ns per square%n", (double) time / results.length);
}

With one thread it took 1.4 ns per square
With four threads it took 715.6 ns per square

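With a larger unit of work per task, threads do help. Here each of the four threads computes one contiguous block of the array.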

final double[] results = new double[10 * 1000 * 1000];
{
    long start = System.nanoTime();
    // using a plain loop.
    for (int i = 0; i < results.length; i++) {
        results[i] = Math.pow(i, 1.5);
    }
    long time = System.nanoTime() - start;
    System.out.printf("With one thread it took %.1f ns per pow 1.5%n", (double) time / results.length);
}
{
    int threads = 4;
    ExecutorService ex = Executors.newFixedThreadPool(threads);
    long start = System.nanoTime();
    int blockSize = results.length / threads;
    // submit one task per contiguous block of the array.
    for (int i = 0; i < threads; i++) {
        final int istart = i * blockSize;
        final int iend = (i + 1) * blockSize;
        ex.execute(new Runnable() {
            @Override
            public void run() {
                for (int i = istart; i < iend; i++)
                    results[i] = Math.pow(i, 1.5);
            }
        });
    }
    ex.shutdown();
    ex.awaitTermination(1, TimeUnit.MINUTES);
    long time = System.nanoTime() - start;
    System.out.printf("With four threads it took %.1f ns per pow 1.5%n", (double) time / results.length);
}

With one thread it took 287.6 ns per pow 1.5
With four threads it took 77.3 ns per pow 1.5

That is nearly a 4-fold speedup (287.6 / 77.3 ≈ 3.7).

+6

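Perhaps you could try re-implementing your code with the fork/join framework from JDK 7 and see how it performs?

By default, a ForkJoinPool uses as many threads as there are cores in your system. If you pick a reasonable threshold for splitting the work into smaller chunks, this will probably run considerably more efficiently.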
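
To make the fork/join suggestion concrete, here is a minimal sketch that fills an array with Math.pow(i, 1.5), the same workload as the benchmark in the earlier answer. The class name PowTask, the fill helper, and the THRESHOLD value are assumptions chosen for illustration, not tuned numbers and not the asker's code.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class PowTask extends RecursiveAction {
    // Assumed cutoff: ranges at or below this size are computed sequentially.
    static final int THRESHOLD = 100_000;

    private final double[] results;
    private final int start;
    private final int end;

    PowTask(double[] results, int start, int end) {
        this.results = results;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= THRESHOLD) {
            // Small enough: do the work directly in this thread.
            for (int i = start; i < end; i++) {
                results[i] = Math.pow(i, 1.5);
            }
            return;
        }
        // Otherwise split the range in half and process both halves in parallel.
        int mid = (start + end) >>> 1;
        invokeAll(new PowTask(results, start, mid), new PowTask(results, mid, end));
    }

    // Hypothetical helper: fills a new array of length n using a default-sized pool.
    static double[] fill(int n) {
        double[] results = new double[n];
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to the core count
        try {
            pool.invoke(new PowTask(results, 0, n));
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

Calling PowTask.fill(10 * 1000 * 1000) would process the same array size as the benchmark. A threshold that is too small brings back the per-task overhead problem, so it is worth measuring a few values.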

+1

You are most likely aware of this, but in case you are not, read up on Amdahl's law. It relates the expected speedup of a program from parallelism to the fraction of the program that must run sequentially.
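
As a quick illustration, Amdahl's law gives the speedup on n threads as S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized. The sketch below just evaluates that formula; the class name AmdahlSketch and the sample value p = 0.9 are assumptions for illustration.

```java
class AmdahlSketch {
    // p: fraction of the run time that can be parallelized (0..1).
    // n: number of threads (at least 1).
    static double speedup(double p, int n) {
        if (p < 0.0 || p > 1.0 || n < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double p = 0.9; // assumed parallel fraction, for illustration only
        for (int n : new int[] {1, 2, 4, 8}) {
            System.out.printf("n = %d, speedup = %.2f%n", n, speedup(p, n));
        }
    }
}
```

The formula ignores synchronization cost, so a program that waits on a barrier at every step will fall short of this bound.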

+1

Synchronization across cores is much slower than in a single-core environment. See if you can limit the JVM to 1 core (see this blog post).

Or you can use an ExecutorService and call invokeAll to run the tasks in parallel.
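
A minimal sketch of the invokeAll approach, assuming the work can be split into independent blocks. It computes a sum of squares, similar to the dot products in conjugate gradient; the class name InvokeAllSketch and the method sumOfSquares are hypothetical and not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class InvokeAllSketch {
    // Splits data into one block per thread and sums the squares of each block in parallel.
    static double sumOfSquares(final double[] data, int threads) {
        ExecutorService ex = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Double>> tasks = new ArrayList<Callable<Double>>();
            int blockSize = (data.length + threads - 1) / threads; // round up
            for (int t = 0; t < threads; t++) {
                final int start = Math.min(data.length, t * blockSize);
                final int end = Math.min(data.length, start + blockSize);
                tasks.add(new Callable<Double>() {
                    @Override
                    public Double call() {
                        double sum = 0;
                        for (int i = start; i < end; i++) {
                            sum += data[i] * data[i];
                        }
                        return sum;
                    }
                });
            }
            double total = 0;
            // invokeAll blocks until every task has finished, so no explicit barrier is needed.
            for (Future<Double> f : ex.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            ex.shutdown();
        }
    }
}
```

In a real solver the pool would be created once and reused across iterations, since creating threads for every call adds the kind of overhead the question describes.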

0
