About gcc-compiled x86_64 code and C code optimization

I compiled the following C code:

typedef struct {
    long x, y, z;
} Foo;

long Bar(Foo *f, long i)
{
    return f[i].x + f[i].y + f[i].z;
}

with the command gcc -S -O3 test.c. Here is the output for the Bar function:

    .section    __TEXT,__text,regular,pure_instructions
    .globl  _Bar
    .align  4, 0x90
_Bar:
Leh_func_begin1:
    pushq   %rbp
Ltmp0:
    movq    %rsp, %rbp
Ltmp1:
    leaq    (%rsi,%rsi,2), %rcx
    movq    8(%rdi,%rcx,8), %rax
    addq    (%rdi,%rcx,8), %rax
    addq    16(%rdi,%rcx,8), %rax
    popq    %rbp
    ret
Leh_func_end1:

I have a few questions about this assembly code:

  • What is the purpose of "pushq %rbp", "movq %rsp, %rbp" and "popq %rbp" if neither rbp nor rsp is used in the body of the function?
  • Why do rsi and rdi automatically contain the arguments to the C function (i and f, respectively), without reading them from the stack?
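  • I tried increasing the size of Foo to 88 bytes (11 longs) and the leaq instruction became an imulq. Would it make sense to design my structs to have "rounder" sizes to avoid the multiply instructions (in order to optimize array access)? The leaq instruction was replaced with: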

    imulq   $88, %rsi, %rcx
    
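  • Those instructions set up and tear down a stack frame using the frame pointer. This function does not actually need one; see the answer from @ouah.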

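  • The arguments are in registers, not on the stack, because that is what the AMD64 ABI specifies.

    If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.

    Page 20, AMD64 ABI Draft 0.99.5 - September 3, 2010.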

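  • This is not really about the structure size being "round". The structure is 24 bytes, f is the base address of the array and i is the index, so the compiler needs f + i*24. 24 is not a valid scale factor for lea or for a SIB addressing mode. Instead, lea computes i*3, and i*3 is then scaled by 8 in the addressing modes: (%rdi,%rcx,8), 8(%rdi,%rcx,8) and 16(%rdi,%rcx,8). With 88 bytes there is no such decomposition with a single lea. So the compiler uses imulq to compute i*88, which is generally slower than a lea.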
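To make the i*24 = (i*3)*8 decomposition concrete, here is a minimal C sketch that spells out the same two steps by hand. BarExplicit is a hypothetical name introduced only for this illustration, and the comments mapping each line to an instruction assume the 8-byte long of the x86_64 listing above; this is not what the compiler emits, just an equivalent computation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long x, y, z;
} Foo;

/* Three longs and no padding, so one element is 3 * sizeof(long) bytes. */
_Static_assert(sizeof(Foo) == 3 * sizeof(long), "Foo must be three longs");

/* Hypothetical helper: same result as Bar, with the address math explicit. */
long BarExplicit(const Foo *f, long i)
{
    assert(f != NULL);

    /* Step 1: i*3, like leaq (%rsi,%rsi,2), %rcx (index + index*2). */
    long i3 = i + i * 2;

    /* Step 2: scale by the size of a long (8 on x86_64) and add the base,
       like the (%rdi,%rcx,8) addressing mode. */
    const char *base = (const char *)f;
    const Foo *elem = (const Foo *)(base + i3 * (long)sizeof(long));

    /* The three loads use displacements 8, 0 and 16 in the listing. */
    return elem->y + elem->x + elem->z;
}
```

For any valid index this should return the same value as Bar(f, i), since (i*3)*8 is the same byte offset as i*24.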

  • What is the purpose of pushq %rbp, movq %rsp, %rbp and popq %rbp, if neither rbp nor rsp is used in the body of the function?

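They set up and restore the frame pointer. You can get rid of them with -fomit-frame-pointer (which, I think, is enabled at -O3 in some versions of gcc).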

3. I tried increasing the size of Foo to 88 bytes (11 longs) and the leaq instruction became an imulq. Would it make sense to design my structs to have "rounder" sizes to avoid the multiply instructions (in order to optimize array access)?

leaq (in this case) computes k * a + b, where "k" is 1, 2, 4 or 8, and "a" and "b" are registers. If "a" and "b" are the same register, it can be used for structures of 1, 2, 3, 4, 5, 8 and 9 longs.

Larger structures, such as 16 longs, can be optimized by computing the offset with "k" and then doubling it, but I do not know whether the compiler actually does this; you will have to test.
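As a rough check of that list, here is a small C sketch that models leaq as k * a + b and enumerates which multipliers a single instruction can reach when "a" and "b" are the same register (or "b" is left out). The names lea and lea_can_multiply are hypothetical, made up for this illustration; the model ignores displacements and says nothing about what gcc will actually choose.

```c
#include <assert.h>
#include <stdbool.h>

/* Models what "leaq (b,a,k), dst" computes: k * a + b. */
long lea(long b, long a, long k)
{
    assert(k == 1 || k == 2 || k == 4 || k == 8); /* the only valid scales */
    return k * a + b;
}

/* True if one lea can multiply a register by m, using either the index
   alone (k * a) or the same register as base and index (k * a + a). */
bool lea_can_multiply(long m)
{
    static const long scales[] = {1, 2, 4, 8};
    for (int n = 0; n < 4; n++) {
        long k = scales[n];
        /* With a == 1 the result is the multiplier itself. */
        if (lea(0, 1, k) == m || lea(1, 1, k) == m)
            return true;
    }
    return false;
}
```

Under this model the reachable multipliers are 1, 2, 3, 4, 5, 8 and 9, which matches the structure sizes (in longs) listed above; 11 is not reachable, which is consistent with the 88-byte struct falling back to imulq.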
