C++ and Java Encodings

I'm trying to make a Java application and a VS C++ application exchange data and send messages to each other using sockets. The only problem I still have is that I am completely lost in my encodings.

By default, Java uses UTF-8, which is a Unicode encoding. My VS project is set to use the Unicode character set. Yet for some reason, when I debug my code, my strings always appear to be encoded as CP1252 in memory. Moreover, if I use CP1252 in Java, it works fine for English letters, but whenever I write Russian letters, I get the byte 3f for each letter. If, on the other hand, I use UTF-8 in Java, each English letter is 1 byte long, but each Russian letter is 2 bytes long. Isn't that a multibyte encoding?

Some C++ docs say that std::string (char) uses the UTF-8 code page and std::wstring (wchar_t) uses UTF-16. When I debug my application, I see CP1252 encoding for both of them, although the wstring has empty bytes between each letter.

Could you explain how encodings behave in both Java and C++, and how I can make my two applications talk to each other?

+3
3 answers

UTF-8 uses a variable number of bytes per character. Common characters take up less space because they are encoded in fewer bytes; more unusual characters take up more space because they need more bytes. Since most of this was invented in the USA, guess which characters are shorter and which are longer?
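
To make the variable length concrete, here is a minimal Java sketch that counts the UTF-8 bytes for a few characters. The class name `Utf8Lengths` and the helper `utf8Length` are made up for illustration; the characters are written as Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes the given text occupies once encoded as UTF-8.
    // (Hypothetical helper, not part of any library.)
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));       // Latin letter A: 1 byte
        System.out.println(utf8Length("\u0416"));  // Cyrillic letter Zhe: 2 bytes
        System.out.println(utf8Length("\u20AC"));  // Euro sign: 3 bytes
    }
}
```

This matches what the question observed: English letters take 1 byte and Russian letters take 2 bytes in UTF-8.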

+2

No, Java does not store strings as UTF-8. UTF-8 is a variable-length (multibyte) encoding.

Internally, Java strings are UTF-16 (see the question "Java String? UTF-8? UTF-16?").
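
One visible consequence of the UTF-16 representation can be sketched as follows: `String.length()` counts UTF-16 code units, not characters, so a character outside the Basic Multilingual Plane counts as 2. The class and method names here are hypothetical, chosen only for this example.

```java
class Utf16Units {
    // length() reports UTF-16 code units.
    static int codeUnits(String s) {
        return s.length();
    }

    // codePointCount() reports actual Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String zhe = "\u0416";          // Cyrillic letter Zhe, inside the BMP
        String face = "\uD83D\uDE00";   // U+1F600, stored as a surrogate pair
        System.out.println(codeUnits(zhe) + " unit(s), " + codePoints(zhe) + " code point(s)");
        System.out.println(codeUnits(face) + " unit(s), " + codePoints(face) + " code point(s)");
    }
}
```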

0

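Unicode is a character set, and UTF-8 and UTF-16 are encodings of Unicode. For English letters (ASCII), UTF-8 produces the same bytes as CP1252, while UTF-16 adds a zero byte to each letter. For Russian letters (Cyrillic), UTF-8, UTF-16 and CP1251 all produce different bytes. Both applications therefore have to agree on a single encoding.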
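
As a rough illustration, the following Java sketch prints the bytes of one Latin and one Cyrillic letter in several encodings. It assumes the JVM ships the `windows-1251` and `windows-1252` charsets (common on standard JDK builds, but not guaranteed on stripped-down runtimes); `EncodingComparison` and `hex` are names invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingComparison {
    // Render the bytes of text in the given charset as space-separated hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: both Windows code pages are available in this JVM.
        Charset cp1251 = Charset.forName("windows-1251");
        Charset cp1252 = Charset.forName("windows-1252");

        String[] samples = { "A", "\u0416" }; // Latin A, Cyrillic Zhe
        for (String s : samples) {
            System.out.println("UTF-8:    " + hex(s, StandardCharsets.UTF_8));
            System.out.println("UTF-16LE: " + hex(s, StandardCharsets.UTF_16LE));
            System.out.println("CP1251:   " + hex(s, cp1251));
            // CP1252 has no Cyrillic, so getBytes substitutes '?' (0x3f).
            System.out.println("CP1252:   " + hex(s, cp1252));
        }
    }
}
```

The `3f` byte from the question is the `?` replacement that `getBytes` emits when a character cannot be represented in the target charset.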

For example, if you agree on UTF-8, the following converts a Java string to an array of bytes using UTF-8:

byte[] b = s.getBytes("UTF-8");

Then:

outputStream.write(b);

sends the data to the socket.
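
The receiving side has to decode with the same encoding. Below is a minimal round-trip sketch. It assumes a 4-byte big-endian length prefix so the receiver knows where a message ends; that framing is my assumption, not something stated in the question. In-memory streams stand in for the socket streams, and `Utf8Framing`, `send`, `receive` and `roundTrip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8Framing {
    // Write a 4-byte big-endian length, then the UTF-8 bytes of the message.
    static void send(OutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    // Read the length prefix, then exactly that many bytes, and decode as UTF-8.
    static String receive(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        byte[] payload = new byte[length];
        data.readFully(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Send and receive through in-memory streams instead of a real socket.
    static String roundTrip(String message) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            send(wire, message);
            return receive(new ByteArrayInputStream(wire.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "Hi \u0416"; // mixed Latin and Cyrillic text
        System.out.println(original.equals(roundTrip(original)));
    }
}
```

With a real socket you would pass `socket.getOutputStream()` and `socket.getInputStream()` instead. The C++ side would then need to read the same big-endian length and treat the payload as UTF-8 bytes.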

0