Poor performance in a multithreaded C++ program

I have a C++ program running on Linux in which a new thread is created to do some computationally expensive work that is independent of the main thread (the work finishes by writing its results to files, which end up being very large). However, I am getting relatively poor performance.

If I implement the program in the straightforward way (without introducing any other threads), it completes the task in about 2 hours. The multithreaded version takes about 12 hours to do the same task (this was tested with only one thread spawned).

I have tried a couple of things, including pthread_setaffinity_np to pin the thread to a single CPU (out of the 24 available on the server I am using), and pthread_setschedparam to set the scheduling policy (I have only tried SCHED_BATCH). So far the effect of both has been negligible.

Are there common reasons for this kind of problem?

EDIT: I added the sample code that I use; these are, I hope, the most relevant parts. The process_job() function is what actually does the computational work, but it would be too much to include here. Basically, it reads in two data files and uses them to run queries against an in-memory graph database, with the results written to two large files over a period of several hours.

EDIT part 2: just to clarify, the problem is not that I want to use threads to speed up the algorithm I have. Rather, I want to run many instances of my algorithm at the same time. Therefore, I expect the algorithm to run at the same speed when it is put into a thread as it does when I do not use multithreading at all.

EDIT 3: . (, ), . , , - , . , , , - , . , , . , , .

() 4: , ​​ . ( ), , , .

#include <pthread.h>
#include <sched.h>

// SCHED_BATCH requires a static priority of 0 on Linux.
struct sched_param sched_param = {
    sched_get_priority_min(SCHED_BATCH)
};

// Pin the given thread to one core; returns 0 on success, an error number otherwise.
int set_thread_to_core(pthread_t tid, int core_id) {
   cpu_set_t mask;
   CPU_ZERO(&mask);
   CPU_SET(core_id, &mask);
   return pthread_setaffinity_np(tid, sizeof(mask), &mask);
}

void *worker_thread(void *arg) {
   job_data *temp = (job_data *)arg;  // get the information for the task passed in
   ...
   pthread_t tid = pthread_self();
   int set_thread = set_thread_to_core(tid, slot_id);  // assume slot_id is 1 (it is in the test case I run)
   pthread_setschedparam(tid, SCHED_BATCH, &sched_param);
   int success = process_job(...);  // this is where all the work actually happens
   pthread_exit(NULL);
}

int main(int argc, char* argv[]) {
   ...
   pthread_t temp;
   pthread_create(&temp, NULL, worker_thread, (void *) &jobs[i]);  // jobs is a vector of a class type containing information for the task
   ...
   return 0;
}
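Both pthread_setaffinity_np and pthread_setschedparam report failure through their return value rather than through errno, so unchecked calls can fail without any visible sign. The following is a minimal sketch, assuming Linux with glibc, that pins a thread and then reads the mask back to confirm the pinning took effect. first_allowed_core and allowed_core_count are hypothetical helpers, not part of the code above.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Pin `tid` to one core; returns 0 on success or an error number on failure.
int set_thread_to_core(pthread_t tid, int core_id) {
    assert(core_id >= 0 && core_id < CPU_SETSIZE);
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(core_id, &mask);
    return pthread_setaffinity_np(tid, sizeof(mask), &mask);
}

// Hypothetical helper: lowest-numbered CPU the thread may currently run on, or -1.
int first_allowed_core(pthread_t tid) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (pthread_getaffinity_np(tid, sizeof(mask), &mask) != 0) return -1;
    for (int c = 0; c < CPU_SETSIZE; ++c) {
        if (CPU_ISSET(c, &mask)) return c;
    }
    return -1;
}

// Hypothetical helper: number of CPUs in the thread's affinity mask, or -1.
int allowed_core_count(pthread_t tid) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (pthread_getaffinity_np(tid, sizeof(mask), &mask) != 0) return -1;
    return CPU_COUNT(&mask);
}
```

Calling set_thread_to_core from the worker and logging any nonzero result (for example with strerror) would show whether the pinning was actually applied; allowed_core_count should then report 1.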

, , , - , " " " . , - , .

- . , :

  • .
  • ( true, "false" )
  • .
  • , / .
  • ...

.

, , , , . , , , . . , :

// count every failed attempt to take the lock, per lock and per thread
while (!tryLock(some_lock))
{
    tried_locking_failed[lock_id][thread_id]++;
}
total_locks[some_lock]++;
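Here is a runnable variant of the counting loop above, as a sketch only. It assumes std::mutex::try_lock in place of tryLock; the array sizes and the names counted_lock, counted_unlock and run_contention_demo are hypothetical, chosen for illustration. A high failure count for one lock relative to its total acquisitions suggests that threads are queuing on it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kNumLocks = 2;    // hypothetical sizes for illustration
constexpr std::size_t kNumThreads = 4;

std::mutex locks[kNumLocks];
// One failure counter per (lock, thread) pair; atomics keep the counts race-free.
std::atomic<unsigned long> tried_locking_failed[kNumLocks][kNumThreads];
std::atomic<unsigned long> total_locks[kNumLocks];

// Acquire locks[lock_id], counting every failed try_lock as contention.
// Note: try_lock may also fail spuriously, so small counts are not meaningful.
void counted_lock(std::size_t lock_id, std::size_t thread_id) {
    assert(lock_id < kNumLocks && thread_id < kNumThreads);
    while (!locks[lock_id].try_lock()) {
        tried_locking_failed[lock_id][thread_id].fetch_add(1, std::memory_order_relaxed);
    }
    total_locks[lock_id].fetch_add(1, std::memory_order_relaxed);
}

void counted_unlock(std::size_t lock_id) {
    locks[lock_id].unlock();
}

// Hypothetical driver: every thread takes lock 0 `iterations` times
// and increments a shared counter while holding it.
unsigned long run_contention_demo(unsigned long iterations) {
    unsigned long shared = 0;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < kNumThreads; ++t) {
        workers.emplace_back([&shared, t, iterations] {
            for (unsigned long i = 0; i < iterations; ++i) {
                counted_lock(0, t);
                ++shared;
                counted_unlock(0);
            }
        });
    }
    for (auto &w : workers) w.join();
    return shared;
}
```

After a run, comparing tried_locking_failed[0][t] across threads against total_locks[0] gives a rough picture of how contended lock 0 is.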

, - " , " - - , , ...

( true, "false" )

[ ] , " " , . "" " -", 32 , - :

int var[NUM_THREADS]; 
...
var[thread_id]++; 
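If the var array above is read as an example of false sharing, the issue is that adjacent int counters written by different threads sit in the same cache line, so every increment forces that line to move between cores. One common remedy is to give each counter its own line. The sketch below assumes a 64-byte cache line (the real size depends on the CPU) and a hypothetical NUM_THREADS of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed cache-line size; check the target CPU
constexpr std::size_t NUM_THREADS = 4;  // hypothetical thread count

// alignas pads each counter out to a full cache line, so two threads
// never write to the same line.
struct alignas(kCacheLine) PaddedCounter {
    long value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Each thread increments only its own padded slot; returns the sum of all slots.
long count_in_parallel(long iterations) {
    assert(iterations >= 0);
    PaddedCounter var[NUM_THREADS];
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < NUM_THREADS; ++t) {
        workers.emplace_back([&var, t, iterations] {
            for (long i = 0; i < iterations; ++i) {
                var[t].value++;
            }
        });
    }
    for (auto &w : workers) w.join();
    long total = 0;
    for (const PaddedCounter &c : var) total += c.value;
    return total;
}
```

The padding trades memory for isolation: four counters now occupy 256 bytes instead of 16, which is usually acceptable for a handful of hot per-thread variables.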

, " " - ACTUAL , 32- , - , .

.

, , . , "", . 2 ^ n ( ) ( ), " " - , 1 2 . , , , .

- , / .

/ , - , , , , , . , , , , .

, , ( ), , .

"" " , ". , , .

, , . , , , ", " , , , , , , " " - , (, 10, 20, 32, 64 ), .

" ". .

, , , , , . , .


" " , , , . , parallelism, , .


, , , .

1 °) - (, ..) ? ( , ). , : , , .

2 °). 24- , , , NUMA ( ). , . /sys/devices/system/cpu/cpuX/, ( , cpu0 cpu1 , , ). , , ( NUMA node , ).

3 °). -. -? -, , .

4 °) . , , . ( ) - , Linux Perf OProfile. , , , , .
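On Linux, the CPU layout exposed under /sys/devices/system/cpu/cpuX/ can be read programmatically. This is a minimal sketch, assuming the standard sysfs topology files physical_package_id and core_id are present; read_cpu_topology and same_package are hypothetical helpers.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Read one integer from /sys/devices/system/cpu/cpu<N>/topology/<file>.
// Returns -1 if the file is missing or cannot be parsed.
int read_cpu_topology(int cpu, const std::string &file) {
    assert(cpu >= 0);
    std::ifstream in("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                     "/topology/" + file);
    int value = -1;
    if (!(in >> value)) return -1;
    return value;
}

// True only if both CPUs report a readable, identical physical package (socket).
bool same_package(int cpu_a, int cpu_b) {
    int a = read_cpu_topology(cpu_a, "physical_package_id");
    int b = read_cpu_topology(cpu_b, "physical_package_id");
    return a >= 0 && a == b;
}
```

Two logical CPUs with the same package and the same core_id are hardware-thread siblings. CPUs in different packages often sit on different NUMA nodes, though that mapping is an assumption to verify on the actual server.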


, . , , . , , .

, , , :

  • (, , ..), .
  • , .

, , , , :

  • : , ,
  • , : , , . ,
  • : , , , , .
  • : , , , , .
  • NUMA: . , , , , .

. . , , , . :

  • . , , .
  • , process(), , . .
  • , , . , , ( , - ) , , , . :
    • ( )
  • . , , .
  • , , ( - ).

:

  • NUMA .
  • . , , .

, , , .

, , , . , , - , , - . , . , , .

Linux, oprofile gprof.

, , , .


, , , . , , , . ftrace, Linux Solaris dtrace (, , , VxWorks, Greenhill Integrity OS Mercury Computer Systems Inc .)

, : http://www.omappedia.com/wiki/Installing_and_Using_Ftrace, this . , -, OMAP; X86 Linux ( , , ​​ ). , GTKWave VHDL-, "". - , sched_switch, , .

sched_switch, , ( , ) , , . "" .


1 , , , . , , , , . , , , .

, , - , , . , . / , , , OpenMP std:: async ++ 11.

. , rand() , prgn's thread. , , .

, , , volatile, .


, . . , , . , .

, , , , , -, (, ), , , , .


, ( , script) ? . , MPI OpenMP, . , , , . ..

