I found that it is sometimes faster to split one loop into two or more. For example, this loop:
```
for (i = 0; i < AMT; i++) {
    a[i] += c[i];
    b[i] += d[i];
}
```
can be rewritten as two loops:
```
for (i = 0; i < AMT; i++) {
    b[i] += d[i];
}
for (i = 0; i < AMT; i++) {
    a[i] += c[i];
}
```
On my desktop (Windows 7, AMD Phenom(tm) X6 1055T), the two-loop version takes about a third less time.
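A minimal sketch for reproducing this comparison, assuming `int` arrays (the element type is not given in the question). The names `add_fused`, `add_split` and `time_add` are hypothetical. Both versions compute the same result; only the loop structure differs. Timings depend heavily on the compiler, the optimization level and the CPU, so no expected numbers are shown.

```c
#include <stddef.h>
#include <time.h>

/* Fused version: one loop updates both a and b. */
void add_fused(int *a, int *b, const int *c, const int *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] += c[i];
        b[i] += d[i];
    }
}

/* Split (fissioned) version: one loop per destination array. */
void add_split(int *a, int *b, const int *c, const int *d, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] += d[i];
    for (size_t i = 0; i < n; i++)
        a[i] += c[i];
}

typedef void (*add_fn)(int *, int *, const int *, const int *, size_t);

/* Call fn `reps` times and return the elapsed CPU time in seconds. */
double time_add(add_fn fn, int *a, int *b, const int *c, const int *d,
                size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(a, b, c, d, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For a meaningful measurement, use large arrays (several million elements) and run each version more than once, since the first pass also pays for page faults and cold caches.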
But if the loop only does assignments, like this one:
```
for (i = 0; i < AMT; i++) {
    b[i] = rand() % 100;
    c[i] = rand() % 100;
}
```
then splitting the assignments to b and c into two loops is not faster than doing them in one loop.
My guess is that the OS follows some rules to decide whether certain code can be executed by multiple processors.
Is my guess correct? And if it is, what are the rules, or the cases, in which several processors are used automatically (without thread programming) to speed up my programs?
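No, nothing here is being spread across several processors. What you are seeing is the effect of loop optimizations performed by the compiler (http://en.wikipedia.org/wiki/Loop_optimization); splitting one loop into several is itself one of them, called loop fission or loop distribution. For GCC, the optimizations it can perform and the switches that control them are described at http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html .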
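The difference between your two examples is that the first loop does only simple arithmetic on arrays, while the second one calls rand() on every iteration.
The first kind of loop is easy for the compiler to optimize. For example, it can use SIMD instructions (such as Intel SSE), which process several neighboring elements with a single instruction, because each iteration reads and writes only its own elements of a and b and does not depend on any other iteration. None of this involves a second processor.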
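To make the SIMD idea concrete, here is a hand-written sketch of roughly what a vectorizing compiler (for example GCC at -O3) can produce for `a[i] += c[i]`. It assumes `float` arrays and an x86/x86-64 CPU with SSE; the function name `add_sse` is hypothetical. Each `_mm_add_ps` adds four floats at once on a single core.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* a[i] += c[i] for float arrays, four elements per SSE instruction. */
void add_sse(float *a, const float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);       /* unaligned load of 4 floats */
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(a + i, _mm_add_ps(va, vc));
    }
    for (; i < n; i++)                         /* scalar tail for the last 0-3 elements */
        a[i] += c[i];
}
```

In practice you rarely need to write this by hand; the point is only that such code is legal because no iteration depends on another.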
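rand() is a different story. To the compiler it is an "opaque" function call that keeps hidden internal state, and each call depends on the state left behind by the previous one, so the calls have to be made one after another, in order. They cannot be turned into SIMD instructions, and the compiler cannot reorder or overlap them, so splitting that loop gives it nothing extra to work with. Most of the time in that loop is probably spent inside rand() itself anyway.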
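A small sketch of why the rand() calls cannot be freely rearranged: with the same seed, the one-loop and two-loop versions consume the same random sequence r0, r1, r2, ... but put the values into different elements. The function names are hypothetical; the array sizes in the check below are arbitrary.

```c
#include <stddef.h>
#include <stdlib.h>

/* One loop: calls alternate b, c, b, c, ...  so b[k] = r(2k), c[k] = r(2k+1). */
void fill_fused(int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        b[i] = rand() % 100;
        c[i] = rand() % 100;
    }
}

/* Two loops: all of b first, then all of c, so b[k] = r(k), c[k] = r(n+k). */
void fill_split(int *b, int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = rand() % 100;
    for (size_t i = 0; i < n; i++)
        c[i] = rand() % 100;
}
```

Because the two versions give different arrays, a compiler is not allowed to turn one into the other on its own.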
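Running code on several processors at the same time is a separate topic, called concurrency. The OS does not split a single-threaded loop across cores by itself: one thread runs on one core at a time, and to use more cores you have to create threads explicitly (directly, or through something like OpenMP). Some compilers can auto-parallelize loops, but only when you ask for it (GCC, for example, has -ftree-parallelize-loops), so this does not happen in your program by default.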
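A minimal sketch of asking for parallelism explicitly, using an OpenMP pragma on the first loop from the question. It assumes `int` arrays and a compiler with OpenMP support (GCC/Clang with `-fopenmp`); the function name `add_parallel` is hypothetical. Without `-fopenmp` the pragma is ignored and the loop runs on one thread, so the result is the same either way.

```c
#include <stddef.h>

/* Iterations are independent, so they can be shared among threads.
   The pragma only takes effect when compiled with OpenMP enabled. */
void add_parallel(int *a, int *b, const int *c, const int *d, long n)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        a[i] += c[i];
        b[i] += d[i];
    }
}
```

Whether this actually helps depends on the array size: a loop this simple is often limited by memory bandwidth rather than by arithmetic, so extra threads may gain little.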
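The same hidden state is why the loop that fills b and c cannot be split up "for free": every call to rand() depends on the previous one, so all the calls form a single chain, whichever loop they are in. Moving the assignments to b and c into separate loops (or separate threads) does not remove that chain; it only changes which random number ends up in which element. On top of that, the C standard does not require rand() to be free of data races, so calling it from several threads at once is not something a compiler could safely do automatically.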
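One way to decouple the two fills (a swapped-in technique, not something the answer proposes) is to give each array its own generator with explicit state instead of the shared state inside rand(). The sketch below uses a hypothetical `lcg_t` type with the linear congruential constants from the sample rand() implementation in the C standard; it is for illustration, not a statistically strong generator. With separate state, the one-loop and two-loop versions produce identical arrays, so the split (or running the two loops on separate threads) no longer changes the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generator with explicit state (LCG, constants from the
   C standard's sample rand()); returns values in 0..32767. */
typedef struct { uint32_t state; } lcg_t;

static int lcg_next(lcg_t *g)
{
    g->state = g->state * 1103515245u + 12345u;
    return (int)((g->state >> 16) & 0x7fff);
}

/* One loop, but b and c draw from independent generators. */
void fill_fused_indep(int *b, int *c, size_t n, uint32_t seed_b, uint32_t seed_c)
{
    lcg_t gb = { seed_b }, gc = { seed_c };
    for (size_t i = 0; i < n; i++) {
        b[i] = lcg_next(&gb) % 100;
        c[i] = lcg_next(&gc) % 100;
    }
}

/* Two loops with the same two generators: gives the same arrays as above. */
void fill_split_indep(int *b, int *c, size_t n, uint32_t seed_b, uint32_t seed_c)
{
    lcg_t gb = { seed_b }, gc = { seed_c };
    for (size_t i = 0; i < n; i++)
        b[i] = lcg_next(&gb) % 100;
    for (size_t i = 0; i < n; i++)
        c[i] = lcg_next(&gc) % 100;
}
```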
So the speedup you see from splitting the first loop into two has some other cause. Perhaps your compiler generates more efficient code for the two simpler loops, or the processor has an easier time fetching the data it needs (each of the split loops streams through only two arrays instead of four), etc. It is difficult to say without analyzing the generated machine code.