I was studying how some code behaves when compiled with the /fp:precise and /fp:fast flags.
According to the MSDN documentation for /fp:precise:
With /fp:precise on x86 processors, the compiler will perform rounding on variables of type float to the proper precision for assignments and casts and when passing parameters to a function. This rounding guarantees that the data does not retain any significance greater than the capacity of its type. A program compiled with /fp:precise can be slower and larger than one compiled without /fp:precise. /fp:precise disables intrinsics; the standard run-time library routines are used instead. For more information, see /Oi (Generate Intrinsic Functions).
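A minimal C sketch of the rounding the quote describes, assuming a C99 compiler, IEEE 754 float/double, and the default round-to-nearest mode: an intermediate held in double keeps a bit that a float cannot store, and the cast back to float must round it away. The helper name `add_then_narrow` is hypothetical, not from the original code.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: adds two floats in double
   precision, optionally reports the wide result, then narrows it to
   float the way an assignment or cast must. */
float add_then_narrow(float a, float b, double *wide_out)
{
    /* For a = 2^24 and b = 1 the sum 16777217 is exact in a double
       (53-bit significand) but not representable in a float (24-bit). */
    double wide = (double)a + (double)b;
    if (wide_out != NULL)
        *wide_out = wide;
    /* The cast rounds to the nearest float; an exact tie such as
       16777217 goes to the neighbour with an even significand. */
    return (float)wide;
}
```

For example, `add_then_narrow(16777216.0f, 1.0f, &w)` stores 16777217.0 in `w` but returns 16777216.0f: the extra bit survives only while the value stays in double.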
Consider the disassembly of a call to sqrtf (compiled with /arch:SSE2, target platform x86/Win32):
0033185D cvtss2sd xmm0,xmm1
00331861 call __libm_sse2_sqrt_precise (0333370h)
00331866 cvtsd2ss xmm0,xmm0
From this question, I gather that modern x86/x64 code avoids the 80-bit x87 registers (or at least discourages their use), so the compiler does what I would assume is the next best thing and performs the computation in 64-bit double precision. And since intrinsics are disabled, the square root is computed by a library routine (__libm_sse2_sqrt_precise) instead of an inline instruction.
So far, this seems consistent with what the documentation says.
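Whether float intermediates live in 80-bit x87 registers, in double, or in float itself is exactly what the C99 macro FLT_EVAL_METHOD (from <float.h>) reports. A small sketch, assuming a C99-conforming compiler; the helper `float_intermediate_bits` is hypothetical, and which value a given compiler and target report is implementation-defined.

```c
#include <float.h>

/* Hypothetical helper: maps a C99 FLT_EVAL_METHOD value to the number of
   significand bits used for intermediate results of float expressions,
   or -1 when the standard leaves it unspecified. */
int float_intermediate_bits(int eval_method)
{
    switch (eval_method) {
    case 0:  return FLT_MANT_DIG;  /* evaluated in the operand type (e.g. SSE2 scalar code) */
    case 1:  return DBL_MANT_DIG;  /* float and double evaluated as double */
    case 2:  return LDBL_MANT_DIG; /* all as long double (64 where it is the x87 80-bit format) */
    default: return -1;            /* -1 means indeterminable; others are implementation-defined */
    }
}
```

Usage: `printf("%d\n", float_intermediate_bits(FLT_EVAL_METHOD));` (with <stdio.h>) prints 24 when float expressions are evaluated in float, 53 when they are evaluated in double.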
However, when I compile for the x64 architecture, something strange happens:
000000013F2B199E movups xmm0,xmm1
000000013F2B19A1 sqrtps xmm1,xmm1
000000013F2B19A4 movups xmmword ptr [rcx+rax],xmm1
The computation is not performed in 64-bit double precision, and an intrinsic is used instead of a library call: sqrtps, which takes the square roots of four packed floats in single precision. As far as I can tell, the results are exactly the same as with /fp:fast.
Why is there a discrepancy between the two? Does /fp:precise simply not work on the x64 platform?
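For comparison, I compiled the same code with VS2010 for x86 with /fp:precise /arch:SSE2, and the square root is computed inline with sqrtsd instead of a library call: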
00AF14C7 cvtps2pd xmm0,xmm0
00AF14CA sqrtsd xmm0,xmm0
00AF14CE cvtpd2ps xmm0,xmm0
Why does VS2010 use an inline instruction here while VS2012 calls a library function?
Testing VS2010 for x64 gives results similar to VS2012 (/fp:precise appears to have no effect).
VS, .
Test system: 64-bit Windows 7, Intel i5-m430.