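Use vext.8 to concatenate the vector with itself and extract the 16-byte window you want (here, offset by 3 bytes).
With intrinsics this needs a couple of casts to keep the compiler happy, but it still compiles to a single instruction: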
#include <arm_neon.h>
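uint16x8_t byterotate3(uint16x8_t input) {
    // Reinterpret as 16 bytes so the extract count is in bytes (no instruction emitted).
    uint8x16_t tmp = vreinterpretq_u8_u16(input);
    // Take a 16-byte window starting at byte 13 of the concatenation tmp:tmp.
    uint8x16_t rotated = vextq_u8(tmp, tmp, 16-3);
    return vreinterpretq_u16_u8(rotated);
}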
g++ 5.4 with -O3 -march=armv7-a -mfloat-abi=hard -mfpu=neon (on Godbolt) compiles this to:
byterotate3(__simd128_uint16_t):
        vext.8  q0, q0, q0, #13
        bx      lr
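Use 16-3 to rotate by 3 bytes in one direction, or plain 3 to rotate in the other. With 16-3, byte 0 of the input ends up in byte 3 of the result; with 3, byte 3 of the input ends up in byte 0. (Whichever of those you call "left" or "right", counts of 13 and 3 are opposite rotations, and the compiler folds 16-3 into the immediate #13.)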
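To check which way a given count rotates without NEON hardware, here is a minimal scalar sketch of the byte-extract operation. The names `Bytes16`, `vext8_model`, `byterotate_model` and `iota16` are hypothetical helpers made up for this illustration, not part of arm_neon.h; the model assumes the first operand supplies the low 16 bytes of the concatenation, which is how `vextq_u8` is documented.

```cpp
#include <cstdint>

// Plain C++ stand-in for a 128-bit NEON register viewed as 16 bytes.
struct Bytes16 {
    std::uint8_t b[16];
};

// Scalar model of vextq_u8(lo, hi, n): take 16 consecutive bytes, starting at
// byte n, from the 32-byte concatenation where lo is bytes 0..15 and hi is 16..31.
constexpr Bytes16 vext8_model(const Bytes16& lo, const Bytes16& hi, int n) {
    Bytes16 out{};
    for (int i = 0; i < 16; ++i) {
        int src = i + n;
        out.b[i] = (src < 16) ? lo.b[src] : hi.b[src - 16];
    }
    return out;
}

// A byte rotate is the special case where both operands are the same register.
constexpr Bytes16 byterotate_model(const Bytes16& v, int n) {
    return vext8_model(v, v, n);
}

// Test input: byte i holds the value i.
constexpr Bytes16 iota16() {
    Bytes16 v{};
    for (int i = 0; i < 16; ++i) {
        v.b[i] = static_cast<std::uint8_t>(i);
    }
    return v;
}
```

With the input bytes numbered 0..15, `byterotate_model(iota16(), 13)` starts with byte 13 and has input byte 0 at index 3, while a count of 3 starts with byte 3 and wraps bytes 0..2 around to the end.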
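Related: x86 has an instruction that does the same thing, taking a sliding window over the concatenation of two registers: palignr (added in SSSE3).
Maybe I'm missing something about NEON, but I don't see why the OP's self-answer uses vext.16 (vextq_u16), which only has 16-bit granularity. It isn't even a different instruction, just an alias for vext.8 that makes an odd byte count impossible to express, so an extra instruction is needed. The manual for vext.8 says: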
- VEXT
You can specify the data type from 16, 32 or 64 instead of 8. In this case, #imm refers to half-words, words or double words instead of referring to bytes, and the allowed ranges are accordingly reduced.
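The relationship the manual describes can be sketched in scalar C++: a halfword-granular extract with count n gives the same bytes as a byte-granular extract with count 2n, so only even byte offsets are reachable through the 16-bit form. All names below (`Reg128`, `ext_bytes`, `ext_halfwords`, `ramp`, `halfword_form_is_doubled_byte_form`) are hypothetical, chosen for this illustration only, and the code models the documented behaviour rather than calling any NEON intrinsic.

```cpp
#include <cstdint>

// Plain C++ stand-in for a 128-bit register viewed as 16 bytes.
struct Reg128 {
    std::uint8_t b[16];
};

// Model of vext.8: window of 16 bytes starting at byte_count (0..15)
// in the concatenation where lo supplies the low 16 bytes.
constexpr Reg128 ext_bytes(const Reg128& lo, const Reg128& hi, int byte_count) {
    Reg128 out{};
    for (int i = 0; i < 16; ++i) {
        int src = i + byte_count;
        out.b[i] = (src < 16) ? lo.b[src] : hi.b[src - 16];
    }
    return out;
}

// Model of vext.16: same idea, but the count (0..7) is in 16-bit elements,
// so each step moves two bytes at once.
constexpr Reg128 ext_halfwords(const Reg128& lo, const Reg128& hi, int hw_count) {
    Reg128 out{};
    for (int lane = 0; lane < 8; ++lane) {
        int src = lane + hw_count;
        out.b[2 * lane]     = (src < 8) ? lo.b[2 * src]     : hi.b[2 * (src - 8)];
        out.b[2 * lane + 1] = (src < 8) ? lo.b[2 * src + 1] : hi.b[2 * (src - 8) + 1];
    }
    return out;
}

// Test input: byte i holds the value i.
constexpr Reg128 ramp() {
    Reg128 v{};
    for (int i = 0; i < 16; ++i) {
        v.b[i] = static_cast<std::uint8_t>(i);
    }
    return v;
}

// True if every halfword count n matches the byte form with count 2*n.
constexpr bool halfword_form_is_doubled_byte_form() {
    Reg128 v = ramp();
    for (int n = 0; n < 8; ++n) {
        Reg128 h = ext_halfwords(v, v, n);
        Reg128 e = ext_bytes(v, v, 2 * n);
        for (int i = 0; i < 16; ++i) {
            if (h.b[i] != e.b[i]) {
                return false;
            }
        }
    }
    return true;
}
```

Under this model a 3-byte rotate has no halfword-count equivalent, since the first result byte of `ext_halfwords` is always an even-numbered input byte; that is why the byte form is the one to use for odd offsets.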