I wrote SSE code to sum byte values (VS2005).
It is as simple as that, and it works quite well (and fast). It just crashes with some array sizes, and only in release mode, never in debug. Maybe someone sees an “obvious” error? Any help is appreciated.
    __int64 Sum (const unsigned char* pData, const unsigned int& nLength)
    {
        __int64 nSum (0);
        __m128i* pp = (__m128i*)pData;
        ATLASSERT( ( (DWORD)pp & 15 ) == 0 );
        __m128i zero = _mm_setzero_si128(),
                a, b, c, d, tmp;
        unsigned int i (0);
        for ( ; i < nLength; i+=64)
        {
            a = _mm_sad_epu8( *(pp++), zero);
            b = _mm_sad_epu8( *(pp++), zero);
            c = _mm_sad_epu8( *(pp++), zero);
            d = _mm_sad_epu8( *(pp++), zero);
            tmp = _mm_add_epi64( _mm_add_epi64( _mm_add_epi64( a, b ), c ), d);
            a = _mm_srli_si128 ( tmp, 8 );
            nSum += _mm_cvtsi128_si32( a ) + _mm_cvtsi128_si32( tmp );
        }
        if (nLength % 64)
            for (i -= 64; i < nLength; i++)
                nSum += pData [i];
        return nSum;
    }
The function is called as follows:
    unsigned int nLength = 3571653;
    unsigned char *pData = (unsigned char*) _aligned_malloc(nLength, 16);
    Sum (pData, nLength);
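One thing worth checking, offered as a guess at the cause rather than a confirmed diagnosis: the loop condition `i < nLength` lets the last iteration load a full 64 bytes even when fewer remain. With nLength = 3571653 the remainder is 5 bytes, so that iteration reads 59 bytes past the end of the allocation, and the tail loop then adds those last 5 bytes a second time. Whether the overread actually faults probably depends on how the heap lays out the block, which could explain a release-only crash. Below is a minimal sketch of a bounded variant. The name `SumBounded`, the use of `std::size_t` and `std::int64_t`, and the plain `assert` in place of `ATLASSERT` are my own choices for illustration, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical variant of Sum(): same _mm_sad_epu8 trick, but the vector
// loop covers only whole 64-byte blocks, so it never reads past
// pData + nLength. pData is assumed to be 16-byte aligned.
std::int64_t SumBounded(const unsigned char* pData, std::size_t nLength)
{
    assert((reinterpret_cast<std::uintptr_t>(pData) & 15) == 0);

    const __m128i zero = _mm_setzero_si128();
    const __m128i* pp = reinterpret_cast<const __m128i*>(pData);
    __m128i acc = _mm_setzero_si128();  // two 64-bit partial sums

    // Number of bytes that form complete 64-byte blocks.
    const std::size_t nVectorBytes = nLength & ~static_cast<std::size_t>(63);

    std::size_t i = 0;
    for (; i < nVectorBytes; i += 64)
    {
        // Each SAD against zero yields two 64-bit sums of 8 bytes each.
        __m128i a = _mm_sad_epu8(_mm_load_si128(pp++), zero);
        __m128i b = _mm_sad_epu8(_mm_load_si128(pp++), zero);
        __m128i c = _mm_sad_epu8(_mm_load_si128(pp++), zero);
        __m128i d = _mm_sad_epu8(_mm_load_si128(pp++), zero);
        acc = _mm_add_epi64(acc, _mm_add_epi64(_mm_add_epi64(a, b),
                                               _mm_add_epi64(c, d)));
    }

    // Fold the two 64-bit lanes into one scalar.
    std::int64_t lanes[2];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    std::int64_t nSum = lanes[0] + lanes[1];

    // Scalar tail: the 0..63 bytes that did not fill a whole block.
    for (; i < nLength; ++i)
        nSum += pData[i];

    return nSum;
}
```

It would be called the same way as above, e.g. `SumBounded(pData, nLength)`. Keeping the running total in a vector register and folding it once after the loop is optional; the part that matters for the crash is the loop bound.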