Should SSE data types be passed around or created for each operation?

I'm currently writing my own math vector library in C++, and I'd like to optimize it with SSE. For my vec2 and vec3 types I can't store an __m128 directly, since they need to keep their expected sizes (8 and 12 bytes), but what about vec4? Suppose my vec4 type looks something like this (ignoring the 16-byte alignment requirement for ease of discussion):

union vec4 {
  struct {float x, y, z, w;};
  __m128 sse;
};

vec4 operator+(const vec4& left, const vec4& right) {
  vec4 result;
  result.sse = _mm_add_ps(left.sse, right.sse);
  return result;
}

Is this the recommended way to do it, or is there some major drawback I'm not thinking of? That is, should I do this instead:

struct vec4 {
  float x, y, z, w;
};

vec4 operator+(const vec4& left, const vec4& right) {
  __m128 leftSSE = _mm_load_ps(reinterpret_cast<const float*>(&left));
  __m128 rightSSE = _mm_load_ps(reinterpret_cast<const float*>(&right));
  __m128 resultSSE = _mm_add_ps(leftSSE, rightSSE);
  vec4 result;
  _mm_store_ps(reinterpret_cast<float*>(&result), resultSSE);
  return result;
}
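
A minimal sketch of the load/store variant with the alignment requirement made explicit, since `_mm_load_ps` and `_mm_store_ps` require 16-byte-aligned addresses. The `alignas(16)` specifier and the `static_assert` are assumptions added for illustration; they are not part of the original question.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps

// alignas(16) provides the 16-byte alignment that the aligned
// load/store intrinsics below require.
struct alignas(16) vec4 {
  float x, y, z, w;
};

// The four floats are expected to be packed with no padding.
static_assert(sizeof(vec4) == 16, "vec4 must be exactly four packed floats");

inline vec4 operator+(const vec4& left, const vec4& right) {
  // Load each operand into an SSE register, add, and store back.
  __m128 l = _mm_load_ps(&left.x);
  __m128 r = _mm_load_ps(&right.x);
  vec4 result;
  _mm_store_ps(&result.x, _mm_add_ps(l, r));
  return result;
}
```

With this layout the type keeps plain `float` members while each operation still uses the SSE register form internally.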

, vec2 vec3? vec4, SIMD ?

+5
2

, , , / /, , , / .

/ , , sse- . / , , .

, , - SSE, /, / . .

vec2/3 vec4 . SSE , , , .

SSE, SoA, SoA-AoS swizzling/ SoA.
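
To illustrate the SoA (structure-of-arrays) layout named above: each component is kept in its own array, so one SSE instruction processes the same component of four vectors at once. This is only a sketch; the names `Vec3SoA` and `add_soa` are hypothetical, and it assumes the element count is a multiple of four.

```cpp
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#include <cstddef>

// Hypothetical SoA container: one array per component instead of
// an array of {x, y, z} structs.
struct Vec3SoA {
  float* x;
  float* y;
  float* z;
  std::size_t count;  // assumed to be a multiple of 4
};

// Adds a and b component-wise, four vectors per iteration.
inline void add_soa(const Vec3SoA& a, const Vec3SoA& b, Vec3SoA& out) {
  for (std::size_t i = 0; i < out.count; i += 4) {
    // Unaligned loads/stores are used so the arrays need no special alignment.
    _mm_storeu_ps(out.x + i, _mm_add_ps(_mm_loadu_ps(a.x + i), _mm_loadu_ps(b.x + i)));
    _mm_storeu_ps(out.y + i, _mm_add_ps(_mm_loadu_ps(a.y + i), _mm_loadu_ps(b.y + i)));
    _mm_storeu_ps(out.z + i, _mm_add_ps(_mm_loadu_ps(a.z + i), _mm_loadu_ps(b.z + i)));
  }
}
```

In this layout a three-component vector wastes no SSE lane, which is one reason SoA is often suggested for vec3-style data.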

.

+6

In Visual C++, __m128 is defined as:

typedef union __declspec(intrin_type) __declspec(align(16)) __m128 {
   float m128_f32[4];
   // further members (integer views of the same 128 bits) omitted
} __m128;

, , , 128 , - . , , , _mm_loadu_ps, .

__m128 , , , .
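
For reference, `_mm_loadu_ps` (mentioned in this answer) loads four floats from an address with no alignment requirement, unlike `_mm_load_ps`, which requires a 16-byte-aligned address. A minimal sketch, assuming a hypothetical helper named `add4_unaligned`:

```cpp
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical helper: adds four floats starting at a to four floats
// starting at b and writes the sums to out. None of the pointers
// needs to be 16-byte aligned.
inline void add4_unaligned(const float* a, const float* b, float* out) {
  __m128 va = _mm_loadu_ps(a);  // unaligned load
  __m128 vb = _mm_loadu_ps(b);  // unaligned load
  _mm_storeu_ps(out, _mm_add_ps(va, vb));  // unaligned store
}
```

This is the form to reach for when the floats live in a plain struct or array whose alignment is not guaranteed.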

0
