I'm currently writing my own math vector library in C++, and I'm interested in optimizing it with SSE. For my vec2 and vec3 types I can't store an __m128 directly, since they need to keep their expected sizes, but what about vec4? Suppose my vec4 type looks something like this (ignoring the 16-byte alignment requirement for ease of discussion):
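    #include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps

    union vec4 {
        struct { float x, y, z, w; };  // anonymous struct: a common compiler extension
        __m128 sse;
    };

    vec4 operator+(const vec4& left, const vec4& right) {
        vec4 result;
        result.sse = _mm_add_ps(left.sse, right.sse);  // adds all four lanes at once
        return result;
    }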
Is this the recommended way to do it, or is there some major reason not to that I can't think of? I.e., should I do this instead:
    struct vec4 {
        float x, y, z, w;
    };

    vec4 operator+(const vec4& left, const vec4& right) {
        // _mm_load_ps and _mm_store_ps expect 16-byte-aligned addresses
        // (alignment is ignored here, as noted above).
        __m128 leftSSE = _mm_load_ps(reinterpret_cast<const float*>(&left));
        __m128 rightSSE = _mm_load_ps(reinterpret_cast<const float*>(&right));
        __m128 resultSSE = _mm_add_ps(leftSSE, rightSSE);
        vec4 result;
        _mm_store_ps(reinterpret_cast<float*>(&result), resultSSE);
        return result;
    }
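For reference, here is a minimal sketch of how I think the second variant might look once the alignment requirement I set aside above is put back in. It assumes C++11 `alignas` and an x86 target with SSE; the name `vec4a` is hypothetical and only used here so it doesn't clash with the definitions above. I'm not claiming this is the right design, it just makes the question concrete.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps

// Hypothetical name for this sketch: a plain struct of four floats,
// forced to 16-byte alignment so the aligned load/store intrinsics are valid.
struct alignas(16) vec4a {
    float x, y, z, w;
};

static_assert(sizeof(vec4a) == 16, "expected four packed floats");
static_assert(alignof(vec4a) == 16, "expected 16-byte alignment");

inline vec4a operator+(const vec4a& left, const vec4a& right) {
    // &left.x is the address of the first of four floats that are laid out
    // back to back in practice, so no reinterpret_cast is needed.
    __m128 l = _mm_load_ps(&left.x);
    __m128 r = _mm_load_ps(&right.x);
    vec4a result;
    _mm_store_ps(&result.x, _mm_add_ps(l, r));
    return result;
}
```

If the alignment could not be guaranteed, my understanding is that `_mm_loadu_ps` and `_mm_storeu_ps` would be the unaligned counterparts to use instead.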
Also, what about vec2 and vec3? Like vec4, can they make use of SIMD?