SSE _mm_load_ps causing segmentation faults

I'm having trouble with this toy example, which I wrote to learn how to program with the SSE intrinsics. I've read in other threads here that segmentation faults from _mm_load_ps are sometimes caused by misalignment, but I thought the __attribute__((__aligned__(16))) I added would take care of that. Also, when I comment out line 23 or line 24 (or both) in my code, the problem disappears, but obviously that breaks the code.

#include <xmmintrin.h>
#include <iostream>
using namespace std;

int main()
{
        float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
        float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
        float temp3[8];
        __m128 m, *m_result;
        __m128 arr1 = _mm_load_ps(temp1);
        __m128 arr2 = _mm_load_ps(temp2);

        m = _mm_mul_ps(arr1, arr2);
        *m_result = _mm_add_ps(m, m); 
        _mm_store_ps(temp3, *m_result); 
        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   

        m_result++;
        arr1 = _mm_load_ps(temp1+4);
        arr2 = _mm_load_ps(temp2+4);
        m = _mm_mul_ps(arr1, arr2);
        *m_result = _mm_add_ps(m,m);
        _mm_store_ps(temp3, *m_result); 


        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        return 0;
}

Line 23 is arr1 = _mm_load_ps(temp1+4). It's strange that I can keep either one of those two lines, but not both. Any help would be appreciated, thanks!

+3
2 answers

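The problem is that you declare a pointer, __m128 *m_result, but never allocate any storage for it to point to. You then do m_result++, which moves it to yet another invalid address, and write through it again. There is no need for a pointer here; use a plain __m128 variable instead.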

#include <xmmintrin.h>                 // SSE
#include <iostream>
using namespace std;

int main()
{
        float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
        float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
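        float temp3[8] __attribute__((__aligned__(16)));  // _mm_store_ps needs a 16-byte-aligned destination
        __m128 m, m_result;                                // a plain value, not a pointer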
        __m128 arr1 = _mm_load_ps(temp1);
        __m128 arr2 = _mm_load_ps(temp2);

        m = _mm_mul_ps(arr1, arr2);
        m_result = _mm_add_ps(m, m); 
        _mm_store_ps(temp3, m_result); 
        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   

        arr1 = _mm_load_ps(temp1+4);
        arr2 = _mm_load_ps(temp2+4);
        m = _mm_mul_ps(arr1, arr2);
        m_result = _mm_add_ps(m,m);
        _mm_store_ps(temp3, m_result); 


        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        return 0;
}
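
The two copy-pasted halves above could also be written as one loop over 4-float chunks. This is only a minimal sketch, not part of the original answer: the function name mul_twice is made up for illustration, and it assumes an x86 target with SSE, a length that is a multiple of 4, and 16-byte-aligned input and output arrays.

```cpp
#include <xmmintrin.h>   // SSE intrinsics
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): out[i] = 2 * a[i] * b[i].
// Assumes n is a multiple of 4 and that a, b and out are 16-byte aligned,
// because _mm_load_ps and _mm_store_ps require aligned addresses.
void mul_twice(const float* a, const float* b, float* out, std::size_t n)
{
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4)
    {
        __m128 va = _mm_load_ps(a + i);
        __m128 vb = _mm_load_ps(b + i);
        __m128 m  = _mm_mul_ps(va, vb);            // a * b in four lanes at once
        _mm_store_ps(out + i, _mm_add_ps(m, m));   // (a * b) + (a * b)
    }
}
```

With temp3 declared aligned, a call such as mul_twice(temp1, temp2, temp3, 8) would fill all eight results in one pass, with no pointer to an __m128 needed anywhere.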
+5

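(1) m_result is just a wild pointer:

     __m128 m, *m_result;

Change every *m_result to m_result and get rid of the m_result++;. (m_result is only a temporary vector value, which you then store to temp3.) Writing through an uninitialized pointer is undefined behavior, which is why the crash comes and goes when you comment out unrelated lines.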

(2) Your two stores may be misaligned, since temp3 has no guaranteed alignment. Either change:

    float temp3[8];

to:

    float temp3[8] __attribute__((__aligned__(16)));

Alternatively, use _mm_storeu_ps, which does not require an aligned address:

    _mm_storeu_ps(temp3, m_result); 
             ^
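
To make the alignment point concrete, here is a small sketch of how one might check an address and choose between the two stores. The helper names is_aligned16, store_aligned and store_unaligned are made up for illustration and are not part of any SSE header; the sketch assumes an x86 target with SSE.

```cpp
#include <xmmintrin.h>   // SSE intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Aligned store: the caller must guarantee a 16-byte-aligned destination.
inline void store_aligned(float* dst, __m128 v)
{
    assert(is_aligned16(dst));   // catches a misaligned destination in debug builds
    _mm_store_ps(dst, v);
}

// Unaligned store: valid for any float address.
inline void store_unaligned(float* dst, __m128 v)
{
    _mm_storeu_ps(dst, v);
}
```

For example, with a buffer declared as alignas(16) float buf[8] (the standard C++11 spelling, used here instead of the GCC attribute), buf and buf + 4 are valid targets for store_aligned, while buf + 1 is only safe with store_unaligned.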
+3
