How can I load a single 32-bit float into a 256-bit AVX ymm register so that all 8 packed floats hold the same source value?
Until now I have used a 128-bit xmm register to load one float into 4 packed floats:
movss xmm7,[eax]; shufps xmm7,xmm7,0; add eax, 0x4;
This operation is sometimes called a "broadcast". AVX has several instructions that do exactly that: vbroadcastf128, vbroadcastsd and vbroadcastss. Since you want to broadcast a single single-precision floating-point value, you want the last one:
vbroadcastss ymm7, [eax]
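If you are writing C rather than assembly, the same memory-source broadcast is available through the `_mm256_broadcast_ss` intrinsic. The following is a minimal sketch, assuming GCC or Clang (for the `target` attribute, which avoids having to pass `-mavx` for the whole file) and a CPU with AVX support; the helper name `broadcast8_from_memory` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: copies *src into all 8 float lanes of a 256-bit
 * vector and writes the lanes to out[0..7].
 * Assumes GCC/Clang and a CPU that supports AVX. */
__attribute__((target("avx")))
void broadcast8_from_memory(const float *src, float out[8])
{
    assert(src != NULL && out != NULL);

    /* Corresponds to: vbroadcastss ymm, dword [mem] */
    __m256 v = _mm256_broadcast_ss(src);

    /* Unaligned store, so out needs no special alignment. */
    _mm256_storeu_ps(out, v);
}
```

Calling `broadcast8_from_memory(&x, out)` should leave all eight elements of `out` equal to `x`.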
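If the value is already in a register, note that AVX (without AVX2) only provides vbroadcastss with a memory source, so you can use this sequence instead: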
shufps xmm0, xmm0, 0
vinsertf128 ymm0, ymm0, xmm0, 1
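Here xmm0 holds the value to broadcast. shufps with an immediate of 0 copies the low dword into all four dwords of the XMM register. vinsertf128 then copies the low xmmword of the YMM register into its high xmmword.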
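The shufps / vinsertf128 sequence above can also be sketched with intrinsics. This assumes GCC or Clang and an AVX-capable CPU; `broadcast8_from_register` is a hypothetical name, and the compiler is free to pick different instructions than the ones named in the comments.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: broadcasts a float that is already in a register
 * to all 8 lanes, mirroring the shufps + vinsertf128 sequence.
 * Assumes GCC/Clang and a CPU that supports AVX. */
__attribute__((target("avx")))
void broadcast8_from_register(float value, float out[8])
{
    assert(out != NULL);

    /* Put the value in the low dword of an XMM register. */
    __m128 lo = _mm_set_ss(value);

    /* Shuffle with immediate 0: every dword becomes a copy of dword 0. */
    lo = _mm_shuffle_ps(lo, lo, 0);

    /* View the XMM value as the low half of a YMM value
     * (the high half is undefined at this point). */
    __m256 v = _mm256_castps128_ps256(lo);

    /* Insert the same 128 bits into the high half (index 1). */
    v = _mm256_insertf128_ps(v, lo, 1);

    _mm256_storeu_ps(out, v);
}
```

After `broadcast8_from_register(x, out)`, every element of `out` should equal `x`.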
With AVX2 this sequence is unnecessary, because vbroadcastss also accepts an XMM register as its source.