C++11: An example of the difference between an ordinary string literal and a UTF-8 string literal?

Question

A string literal that does not begin with an encoding-prefix is an ordinary string literal, and is initialized with the given characters.
A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8.

I do not understand the difference between an ordinary string literal and a UTF-8 string literal.

Can someone give an example of a situation in which they differ? (That is, one that causes different compiler output.)

(I mean from the POV of the standard, not any specific implementation.)

Each source character set member in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set.

+5

c++ c++11 utf-8 string-literals character-encoding

Andrew Tomazos Feb 04 '13 at 2:42

1 answer

Yakk · Accepted Answer · 2013-02-04T03:06:19+0000

The C and C++ languages allow a huge amount of latitude in their implementations. C was written long before UTF-8 became "the way to encode text in single bytes": different systems had different text encodings.

So the byte values of a string in C and C++ are really up to the compiler. 'A' is whatever encoding the compiler has chosen for the character A, which may not agree with UTF-8.

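C++ has added the requirement that compilers support real UTF-8 string literals. The value of u8"A"[0] is fixed by the C++ standard through the UTF-8 standard, regardless of the preferred encoding of the platform the compiler targets.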

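Now, much as most platforms that C++ targets use 2's complement integers, most compilers use character encodings that are largely compatible with UTF-8. So for a string like "hello world", u8"hello world" will almost certainly be identical.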
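
A minimal sketch of that point, for illustration only. The helper names (same_bytes, hello_matches) are made up for this example and do not come from the standard or any library. The static_assert states what the standard fixes; the comparison in hello_matches is only expected to hold on implementations whose execution character set agrees with UTF-8 for these characters. The casts to unsigned char are there so the same code compiles both where u8 literals are arrays of char (C++11 through C++17) and where they are arrays of char8_t (C++20 and later).

```cpp
#include <cassert>
#include <cstddef>

// Guaranteed by the standard: the first code unit of u8"A" is 0x41.
static_assert(static_cast<unsigned char>(u8"A"[0]) == 0x41,
              "u8\"A\" must start with the UTF-8 code unit 0x41");

// Hypothetical helper: true when both literals have the same length
// and the same byte values (terminating NUL included).
template <typename A, typename B, std::size_t N, std::size_t M>
bool same_bytes(const A (&a)[N], const B (&b)[M]) {
    if (N != M) return false;
    for (std::size_t i = 0; i < N; ++i) {
        if (static_cast<unsigned char>(a[i]) != static_cast<unsigned char>(b[i])) {
            return false;
        }
    }
    return true;
}

// Implementation-defined result: true only if the execution character set
// encodes these characters the same way UTF-8 does.
bool hello_matches() {
    return same_bytes("hello world", u8"hello world");
}

// Assumption check for ASCII-compatible platforms; it could fail on a
// platform with a non-ASCII execution character set such as EBCDIC.
void check_ascii_compatible_platform() {
    assert(hello_matches());
}
```

On an EBCDIC-based implementation, hello_matches() would be expected to return false while the static_assert still passes.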

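For a concrete example, from man gcc:

-fexec-charset=charset
Set the execution character set, used for string and character constants. The default is UTF-8. charset can be any encoding supported by the system's iconv library routine.
-finput-charset=charset
Set the input character set, used for translation from the character set of the input file to the source character set used by GCC. If the locale does not specify, or GCC cannot get this information from the locale, the default is UTF-8. This can be overridden by either the locale or this command-line option. Currently the command-line option takes precedence if there's a conflict. charset can be any encoding supported by the system's iconv library routine.

These options are an example of being able to change the execution and input character sets of C/C++.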
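
To make the effect of the execution character set visible, here is a small sketch using a non-ASCII character. The function names (hex_bytes, ordinary_bytes, utf8_bytes) are invented for this example. The character U+00E9 is written as a universal-character-name so that the encoding of the source file does not matter. The u8 literal must contain the UTF-8 bytes C3 A9; the ordinary literal contains whatever the execution character set prescribes, so building with something like -fexec-charset=ISO-8859-1 on GCC should give a single byte E9 instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: formats the bytes of a literal (without the
// terminating NUL) as lowercase hex separated by spaces.
template <typename C, std::size_t N>
std::string hex_bytes(const C (&lit)[N]) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i + 1 < N; ++i) {
        const unsigned char b = static_cast<unsigned char>(lit[i]);
        if (!out.empty()) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// Ordinary literal: the bytes depend on the execution character set.
std::string ordinary_bytes() { return hex_bytes("\u00E9"); }

// UTF-8 literal: the bytes are fixed by the standard, whatever the
// execution character set is.
std::string utf8_bytes() { return hex_bytes(u8"\u00E9"); }
```

Comparing ordinary_bytes() with utf8_bytes() under different -fexec-charset settings is one way to observe the "different compiler output" the question asks about.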
