I'm new to SIMD programming, and I have some basic questions that I still can't answer after studying the topic for several days.
The code I want to optimize is basically a simple but big arithmetic formula. It should be pretty simple to parse the code automatically and compute independent multiplications / additions in parallel, but I read that autovectorization only works for loops.
I have read several times that accessing individual elements of a vector through a union or in some other way should be avoided at all costs, and that it should be replaced with _mm_shuffle_pd instead (I only work with doubles) ...
I know this is a very simple question, but I don't understand how I can store the contents of a __m128d vector as doubles without accessing it through a union. Also, does such an operation give any performance gain compared to scalar code?
union {
    __m128d v;
    double d[2];
} vec;
union {
    __m128d v;
    double d[2];
} vec2;
vec.v = index1;
vec2.v = index2;
/* the indices are stored as doubles, so cast them before subscripting bvec */
temp1 = _mm_mul_pd(temp1, _mm_set_pd(bvec[(int)vec.d[1]], bvec[(int)vec2.d[1]]));
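For comparison, here is a minimal sketch of how the same lookup might be written without a union, using only SSE2 intrinsics. The helper names (`low_lane`, `high_lane`, `store_lanes`, `lookup_high`) are hypothetical and not part of the original code. The sketch assumes that `bvec` is an array of doubles and that the index vectors hold whole, non-negative values that fit in an `int`.

```c
#include <emmintrin.h>  /* SSE2: __m128d and the intrinsics used below */

/* Hypothetical helper: return the low lane (element 0) as a scalar double. */
static double low_lane(__m128d v)
{
    return _mm_cvtsd_f64(v);
}

/* Hypothetical helper: copy the high lane into the low position, then extract it. */
static double high_lane(__m128d v)
{
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
}

/* Hypothetical helper: write both lanes to memory; out[0] is the low lane, out[1] the high lane. */
static void store_lanes(__m128d v, double out[2])
{
    _mm_storeu_pd(out, v);  /* unaligned store, so out needs no special alignment */
}

/* Assumption: the high lanes of index1 and index2 hold whole-number indices into bvec.
   The result has bvec[i1] in the high lane and bvec[i2] in the low lane,
   the same ordering as the _mm_set_pd call in the union version. */
static __m128d lookup_high(const double *bvec, __m128d index1, __m128d index2)
{
    int i1 = _mm_cvttsd_si32(_mm_unpackhi_pd(index1, index1));  /* truncating double -> int */
    int i2 = _mm_cvttsd_si32(_mm_unpackhi_pd(index2, index2));
    return _mm_set_pd(bvec[i1], bvec[i2]);
}
```

With these helpers the multiplication would read `temp1 = _mm_mul_pd(temp1, lookup_high(bvec, index1, index2));`. Each lookup still moves data from vector registers to scalar registers and back, so whether this is any faster than plain scalar code is something that would have to be measured.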
Also, the two unions look ridiculously ugly, but when I used
union dvec {
    __m128d v;
    double d[2];
} vec;
and tried to declare indexX as dvec, the compiler complained that dvec was not declared.
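One possible explanation, assuming the file is compiled as C rather than C++: in C, `union dvec { ... }` introduces only the tag `dvec`, so a variable has to be declared as `union dvec x;` unless a typedef is provided. A minimal sketch with a typedef follows; the helper `dvec_high` is hypothetical and only shows the type in use.

```c
#include <emmintrin.h>  /* SSE2: __m128d */

/* The typedef makes the plain name "dvec" usable as a type in C. */
typedef union {
    __m128d v;
    double d[2];
} dvec;

/* Hypothetical helper: read the high lane through the union.
   Reading a member other than the one last written is permitted in C;
   in C++ it relies on compiler-specific behaviour. */
static double dvec_high(__m128d x)
{
    dvec u;
    u.v = x;
    return u.d[1];
}
```

With the typedef in place, both index variables can share one type, for example `dvec vec, vec2;`, instead of repeating the anonymous union.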