Clang performs this optimization easily and even inlines the function call. The generated assembly shows this: main() contains only the call to printf, with no call to the virtual function.
Dump of assembler code for function main():
0x0000000000400500 <+0>: push %rbp
0x0000000000400501 <+1>: mov %rsp,%rbp
0x0000000000400504 <+4>: mov $0x40060c,%edi
0x0000000000400509 <+9>: xor %al,%al
0x000000000040050b <+11>: callq 0x4003f0 <printf@plt>
0x0000000000400510 <+16>: xor %eax,%eax
0x0000000000400512 <+18>: pop %rbp
0x0000000000400513 <+19>: retq
I took the liberty of replacing std::cout << … with equivalent calls to printf, as this greatly reduces the clutter in the disassembly.
GCC 4.6 also works out that no vtable lookup is needed and calls B::f() directly, but it does not inline the call:
Dump of assembler code for function main():
0x0000000000400560 <+0>: sub $0x18,%rsp
0x0000000000400564 <+4>: mov %rsp,%rdi
0x0000000000400567 <+7>: movq $0x4007c0,(%rsp)
0x000000000040056f <+15>: callq 0x400680 <B::f()>
0x0000000000400574 <+20>: xor %eax,%eax
0x0000000000400576 <+22>: add $0x18,%rsp
0x000000000040057a <+26>: retq
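The program behind these dumps is not shown in this section, so the following is only a minimal sketch of the kind of code that could produce them. The base class name A, the return values, and the helper functions call_on_local and call_on_unknown are assumptions made for illustration; only B::f() and the use of printf come from the disassembly above.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical base class; the dumps above only show that B::f() exists.
struct A {
    virtual ~A() {}
    virtual int f() { std::printf("A::f\n"); return 1; }
};

struct B : A {
    int f() override { std::printf("B::f\n"); return 2; }
};

// The dynamic type of `b` is visible in this function, so the compiler is
// allowed to replace the virtual dispatch with a direct call to B::f(),
// and may go further and inline it.
int call_on_local() {
    B b;
    A& ref = b;
    return ref.f();
}

// Here the dynamic type of `a` is not known from this function alone, so a
// vtable lookup is normally required unless the compiler can see every caller.
int call_on_unknown(A& a) {
    return a.f();
}
```

Calling call_on_local() from main() and compiling with optimizations enabled should give output comparable to the dumps above, although the exact instructions depend on the compiler version and flags.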