This is not specific to SSE (or even x86). On most architectures, loads and stores must be naturally aligned; otherwise they either (a) raise an exception, or (b) need two or more cycles plus some fix-up to handle the misaligned load / store transparently. On x86, (b) is true for data types < 16 bytes, but (a) is true for SSE data types, unless you explicitly use the unaligned versions of the load / store instructions, which can handle misaligned data.
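To make the aligned / unaligned distinction concrete, here is a minimal sketch in C using the standard SSE intrinsics `_mm_load_ps` (aligned load, requires a 16-byte aligned address) and `_mm_loadu_ps` (unaligned load, accepts any address). The helper names `is_aligned16`, `sum4_aligned` and `sum4` are made up for this illustration, and it assumes an x86 compiler that provides `<xmmintrin.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_load_ps, _mm_loadu_ps, _mm_storeu_ps */

/* Hypothetical helper: returns 1 if p is 16-byte aligned, 0 otherwise. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Sums four consecutive floats using the aligned load only.
   _mm_load_ps requires a 16-byte aligned address, so we check first. */
static float sum4_aligned(const float *p) {
    assert(is_aligned16(p));
    __m128 v = _mm_load_ps(p);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}

/* Sums four consecutive floats at any address: uses the aligned load
   when possible and falls back to the unaligned load otherwise. */
static float sum4(const float *p) {
    __m128 v = is_aligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] + out[1] + out[2] + out[3];
}
```

In practice you would usually not branch per load like this; you would either guarantee alignment when allocating the buffer or just use the unaligned form throughout. The branch is only there to show both cases side by side.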
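As for the penalty for misaligned loads / stores with SSE: historically it has been significant, roughly 2x, but on more recent Intel CPUs, e.g. Core i7, it is much smaller.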