C++11: An example of the difference between an ordinary string literal and a UTF-8 string literal?

A string literal that does not begin with an encoding prefix is an ordinary string literal, and is initialized with the given characters.

A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8.

I do not understand the difference between an ordinary string literal and a UTF-8 string literal.

Can someone give an example of a situation where they differ? (That is, one that causes different compiler output.)

(I mean from the point of view of the standard, not any specific implementation.)

Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set.

1 answer

The C and C++ languages allow a huge amount of latitude in their implementations. C was written long before UTF-8 became "the way to encode text in single bytes": different systems had different text encodings.

So the byte values for a string in C and C++ are really up to the compiler. 'A' is whatever the compiler's chosen encoding assigns to the character A, which may not agree with UTF-8.

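C++ has added the requirement that compilers support real UTF-8 string literals. The value of u8"A"[0] is fixed by the C++ standard through the UTF-8 standard, regardless of the preferred encoding of the platform the compiler targets.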

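Now, much as most platforms that C++ targets use 2's complement integers, most compilers use character encodings that are largely compatible with UTF-8. So for strings like "hello world", u8"hello world" will almost certainly be identical.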
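The comparison can be made concrete with a small sketch. The helper names hex_bytes, same_bytes and check_u8_is_fixed are hypothetical and introduced here only for illustration; the helpers are templated on the element type so they accept both char and, under C++20, char8_t. Only the bytes of the u8 literals are guaranteed; whether same_bytes("hello world", u8"hello world") returns true depends on the implementation's execution character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Render the bytes of a narrow or UTF-8 string literal as lowercase hex,
// excluding the terminating null character.
template <typename CharT, std::size_t N>
std::string hex_bytes(const CharT (&lit)[N]) {
    static_assert(sizeof(CharT) == 1, "expects a single-byte code unit type");
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i + 1 < N; ++i) {
        unsigned char b = static_cast<unsigned char>(lit[i]);
        if (!out.empty()) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// True when both literals have the same length and the same byte values.
template <typename A, std::size_t N, typename B, std::size_t M>
bool same_bytes(const A (&a)[N], const B (&b)[M]) {
    return N == M && hex_bytes(a) == hex_bytes(b);
}

// The bytes of a u8 literal are fixed by UTF-8, whatever the platform prefers.
inline void check_u8_is_fixed() {
    assert(hex_bytes(u8"A") == "41");
    assert(hex_bytes(u8"\u00e9") == "c3 a9");  // U+00E9, e with acute accent
}
```

On an implementation whose execution character set is UTF-8 or another ASCII-compatible encoding, same_bytes("hello world", u8"hello world") is expected to be true. On an EBCDIC implementation it would presumably be false, since 'A' is 0xC1 there rather than 0x41.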

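For a concrete example, from man gcc:

-fexec-charset=charset

Set the execution character set, used for string and character constants. The default is UTF-8. charset can be any encoding supported by the system's iconv library routine.

-finput-charset=charset

Set the input character set, used for translation from the character set of the input file to the source character set used by GCC. If the locale does not specify, or GCC cannot get this information from the locale, the default is UTF-8. This can be overridden by either the locale or this command-line option. Currently the command-line option takes precedence if there is a conflict. charset can be any encoding supported by the system's iconv library routine.

These options are an example of being able to change the execution and input character sets of C/C++.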
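As a rough illustration of what -fexec-charset changes, the sketch below looks at the character U+00E9, written as a universal-character-name so that the result does not depend on -finput-charset. The names unit_count, byte_at, ordinary_units and check_ordinary_literal are hypothetical, and the GCC results noted in the comments are what the documentation quoted above leads one to expect, not something the standard promises.

```cpp
#include <cassert>
#include <cstddef>

// Number of code units in a literal, excluding the terminating null.
template <typename CharT, std::size_t N>
constexpr std::size_t unit_count(const CharT (&)[N]) { return N - 1; }

// Value of the i-th code unit, viewed as an unsigned byte.
template <typename CharT, std::size_t N>
constexpr unsigned byte_at(const CharT (&lit)[N], std::size_t i) {
    return static_cast<unsigned char>(lit[i]);
}

// Guaranteed by the standard: a u8 literal holds the UTF-8 encoding,
// so U+00E9 is always the two bytes C3 A9.
static_assert(unit_count(u8"\u00e9") == 2, "UTF-8 uses two bytes for U+00E9");
static_assert(byte_at(u8"\u00e9", 0) == 0xC3, "first UTF-8 byte");
static_assert(byte_at(u8"\u00e9", 1) == 0xA9, "second UTF-8 byte");

// Implementation-defined: the ordinary literal uses the execution character set.
// Expected with GCC (assumption based on the documented defaults):
//   g++ -std=c++11 file.cpp                           -> 2 code units (UTF-8)
//   g++ -std=c++11 -fexec-charset=ISO-8859-1 file.cpp -> 1 code unit (E9)
constexpr std::size_t ordinary_units = unit_count("\u00e9");

// Run-time sanity check: whatever the execution character set is,
// the ordinary literal holds at least one code unit.
inline void check_ordinary_literal() { assert(ordinary_units >= 1); }
```

The static_assert lines hold under any conforming compiler and any flag setting, while ordinary_units is the value that moves when the execution character set is changed.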
