About gcc-compiled x86_64 code and C code optimization
I compiled the following C code:

    typedef struct { long x, y, z; } foo;

    long bar(foo *f, long i)
    {
        return f[i].x + f[i].y + f[i].z;
    }

with the command gcc -S -O3 test.c. Here is the bar function in the output:

        .section    __TEXT,__text,regular,pure_instructions
        .globl  _bar
        .align  4, 0x90
    _bar:
    Leh_func_begin1:
        pushq   %rbp
    Ltmp0:
        movq    %rsp, %rbp
    Ltmp1:
        leaq    (%rsi,%rsi,2), %rcx
        movq    8(%rdi,%rcx,8), %rax
        addq    (%rdi,%rcx,8), %rax
        addq    16(%rdi,%rcx,8), %rax
        popq    %rbp
        ret
    Leh_func_end1:

I have a few questions about this assembly code:
- What is the purpose of "pushq %rbp", "movq %rsp, %rbp", and "popq %rbp", if neither rbp nor rsp is used in the body of the function?
- Why do rsi and rdi automatically contain the arguments to the C function (i and f, respectively) without reading them from the stack?
- I tried increasing the size of foo to 88 bytes (11 longs), and the leaq instruction became an imulq. Would it make sense to design structs to have "rounder" sizes to avoid the multiply instructions (in order to optimize array access)? The leaq instruction was replaced with:

        imulq   $88, %rsi, %rcx
The function is simply building its own stack frame with these instructions. There's nothing unusual about them. You should note, though, that due to this function's small size, it will probably be inlined when used in the code. The compiler is still required to produce a "normal" version of the function, though. Also, see what @ouah said in his answer.
This is because that's how the AMD64 ABI specifies that arguments should be passed to functions.
"If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used."
Page 20, AMD64 ABI Draft 0.99.5 – September 3, 2010
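To make the quoted rule concrete, here is a small sketch of how it maps onto a function signature. The function sum7 and its checker are hypothetical names made up for illustration, and the register comments assume a System V AMD64 target (Linux, macOS); Windows x64 uses a different convention. In the question's bar(foo *f, long i), the same rule puts f in %rdi and i in %rsi.

```c
#include <assert.h>

/* Hypothetical function, only to illustrate the register sequence.
   Register comments assume the System V AMD64 calling convention. */
long sum7(long a,   /* 1st INTEGER argument: %rdi */
          long b,   /* 2nd: %rsi */
          long c,   /* 3rd: %rdx */
          long d,   /* 4th: %rcx */
          long e,   /* 5th: %r8  */
          long f,   /* 6th: %r9  */
          long g)   /* 7th: registers are exhausted, passed on the stack */
{
    return a + b + c + d + e + f + g;
}

/* Hypothetical self-check: the sum of 1..7 is 28. */
void check_sum7(void)
{
    assert(sum7(1, 2, 3, 4, 5, 6, 7) == 28);
}
```

Compiling this with gcc -S and reading the output should show the first six values arriving in those registers and only the seventh being loaded from the stack.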
This is not directly related to the structure size, but rather to the absolute address that the function has to access. If the size of the structure is 24 bytes, f is the address of the array containing the structures, and i is the index at which the array has to be accessed, then the byte offset of each structure is i*24. Multiplying by 24 in this case is achieved by a combination of lea and SIB addressing. The first lea instruction simply calculates i*3; every subsequent instruction then uses that i*3 and multiplies it further by 8, thereby accessing the array at the needed absolute byte offset, and uses immediate displacements to access the individual structure members ((%rdi,%rcx,8), 8(%rdi,%rcx,8), and 16(%rdi,%rcx,8)).

If you make the size of the structure 88 bytes, there is simply no way of doing such a thing swiftly with a combination of lea and any kind of addressing. So the compiler assumes that a single imulq will be more efficient at calculating i*88 than a series of shifts, adds, leas or anything else.
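The address arithmetic for the 24-byte case can be spelled out in C. This is only a sketch: offset_like_lea, bar_manual and self_check are hypothetical helpers, and the code assumes an LP64 target where long is 8 bytes, so sizeof(foo) is 24. It also hints at why 88 is harder: 88 = 8 * 11, and a single lea with one register as both base and index can only multiply by 2, 3, 5 or 9 (scale factors are limited to 1, 2, 4 and 8), so i*11 is out of reach in one instruction.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long x, y, z; } foo;   /* same struct as in the question */

/* Hypothetical helper: byte offset of element i, computed the way the
   lea plus scaled addressing does it: (i + i*2) * 8 = i * 24. */
static size_t offset_like_lea(size_t i)
{
    size_t i3 = i + i * 2;   /* leaq (%rsi,%rsi,2), %rcx  gives rcx = i*3 */
    return i3 * 8;           /* scale factor 8 in (%rdi,%rcx,8)           */
}

/* Hypothetical helper: bar() rewritten with explicit byte arithmetic.
   Only valid under the assumption that sizeof(foo) == 24. */
static long bar_manual(const foo *f, size_t i)
{
    const char *base = (const char *)f + offset_like_lea(i);
    long x = *(const long *)(base + offsetof(foo, x));  /*   (%rdi,%rcx,8) */
    long y = *(const long *)(base + offsetof(foo, y));  /*  8(%rdi,%rcx,8) */
    long z = *(const long *)(base + offsetof(foo, z));  /* 16(%rdi,%rcx,8) */
    return x + y + z;
}

/* Hypothetical self-check comparing the manual version with plain indexing. */
static void self_check(void)
{
    foo a[3] = { {1, 2, 3}, {10, 20, 30}, {100, 200, 300} };
    assert(sizeof(foo) == 24);                       /* LP64 assumption */
    assert(offset_like_lea(2) == 2 * sizeof(foo));
    assert(bar_manual(a, 1) == a[1].x + a[1].y + a[1].z);
}
```

Whether padding a struct to a "rounder" size pays off is a separate question that would need measuring, since a larger struct also means more memory traffic per element.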