Caching, general understanding

I am trying to understand cache thrashing. Is the following text correct?

Take the code below as an example.

long max = 1024*1024;
long a(max), b(max), c(max), d(max), e(max); 
for(i = 1; i < max; i++) 
    a(i) = b(i)*c(i) + d(i)*e(i);

The ARM Cortex-A9 cache is four-way set associative, each cache line is 32 bytes, and the total cache size is 32 KB, so there are 1024 cache lines in total. To carry out the calculation above, one cache line must be displaced. When a(i) is about to be calculated, b(i) will be evicted. Then, as the loop iterates, b(i) is needed again and so another vector is displaced. In the example above, there is no cache reuse.

To solve this problem, you can insert padding between the vectors to space out their starting addresses. Ideally, each pad should be at least the size of a full cache line.

The problem above can be solved like this:

long a(max), pad1(256), b(max), pad2(256), c(max), pad3(256), d(max), pad4(256), e(max) 



8 (1024 * 1024 * 8B, 8B ). , , (i), b (i), c (i), d (i) e (i) ( , 2 ). , . , , d (i) e (i), , , b (i) c (i), .

, , , 32B. . , a (i), b (i), c (i), d (i) e (i) . 4 . , 4 (a (0), a (1), a (2), a (3) , (4), a (5), a (6), a (7)).


long a(max),pad1(32),b(max),pad2(32),c(max),pad3(32),d(max),pad4(32),e(max)

why-is-one-loop-so-much-slower-than-two-loops
