C# Encoding.UTF8 messing up byte[]

I ran into a very strange problem: I have a byte[], and when I pass it to the Encoding.UTF8.GetString(byte[] bytes) method, the encoding mangles my bytes. Only a few special bytes (which I use as markers in my system) are affected; each one is replaced by a string representation of up to three chars.

[0] 70  byte
[1] 49  byte
[2] 45  byte
[3] 86  byte
[4] 49  byte
[5] 253 byte     <-- Special byte
[6] 70  byte
[7] 49  byte
[8] 45  byte
[9] 86  byte
[10] 50  byte
[11] 253 byte     <-- Special byte
[12] 70  byte
[13] 49  byte
[14] 45  byte
[15] 86  byte
[16] 51  byte

When I pass the above byte[] to the Encoding.UTF8.GetString(bytes) method, I get the following output:

private Encoding _encoding = System.Text.Encoding.GetEncoding("UTF-8", new EncoderReplacementFallback("?"), new DecoderReplacementFallback("?"));       
_encoding.GetString(bytes)  "F1-V1 F1-V2 F1-V3" string

The actual value should not contain the replacement character, as this means that the decoder could not decode these special bytes and replaced them. Is there any way I can get around this, for example, convert to a string and keep each special byte represented as a single char?

These are the special bytes I use as markers:

byte AM = (byte) 254;
byte VM = (byte) 253;
byte SM = (byte) 252;


Sheeraz


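Taken as a whole, this data is not valid UTF-8, so it cannot be decoded as UTF-8 in one piece. Looking at the byte[], it holds 3 UTF-8 chunks (70,49,45,86,49; 70,49,45,86,50; 70,49,45,86,51), which can be decoded as 3 strings. The marker byte between them is not UTF-8, so it cannot be decoded as UTF-8.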
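Decoding the chunks separately could be sketched roughly as follows. This is a minimal illustration, not the poster's code: the `MarkerSplitter` class and `SplitAndDecode` method are hypothetical names. It relies on the fact that the bytes 252, 253 and 254 never occur inside valid UTF-8 text, so splitting on them cannot cut a character in half.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MarkerSplitter
{
    // Hypothetical helper: splits data on a marker byte and decodes
    // each chunk between markers as UTF-8.
    public static List<string> SplitAndDecode(byte[] data, byte marker)
    {
        var result = new List<string>();
        int start = 0;
        for (int i = 0; i <= data.Length; i++)
        {
            // A chunk ends at a marker byte or at the end of the input.
            if (i == data.Length || data[i] == marker)
            {
                result.Add(Encoding.UTF8.GetString(data, start, i - start));
                start = i + 1; // skip the marker itself
            }
        }
        return result;
    }
}
```

Called with the byte[] from the question and the marker 253, this yields the three strings "F1-V1", "F1-V2" and "F1-V3"; the marker is kept out of the strings instead of being forced through the decoder.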

An alternative to a marker byte is to prefix each chunk with its length, for example as a "varint":

05,70,49,45,86,49,05,70,49,45,86,50,05,70,49,45,86,51

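Here 05 is the "varint" length prefix, meaning that the next 5 bytes are the payload; reading the data then looks like this: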

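// pseudo-code: ReadVarint, ReadBytes and Utf8Decode are placeholders
while(!EOF) {
    int len = ReadVarint();          // length prefix of the next chunk
    var blob = ReadBytes(len);       // raw payload bytes
    string s = Utf8Decode(blob);     // e.g. Encoding.UTF8.GetString(blob)
    // ...
}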
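A runnable version of that loop might look like the sketch below. It assumes a base-128 varint (7 payload bits per byte, high bit set when more bytes follow), which is one common convention and not necessarily the one the answer had in mind; `LengthPrefixedReader`, `ReadAll` and `ReadVarint` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LengthPrefixedReader
{
    // Reads a base-128 varint. Returns -1 if the stream ends cleanly
    // before the first byte of the varint.
    static int ReadVarint(Stream stream)
    {
        int value = 0, shift = 0;
        while (true)
        {
            int b = stream.ReadByte();
            if (b < 0)
            {
                if (shift == 0) return -1;
                throw new EndOfStreamException("Truncated varint.");
            }
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
            {
                if (value < 0) throw new InvalidDataException("Varint overflow.");
                return value;
            }
            shift += 7;
            if (shift > 28) throw new InvalidDataException("Varint too long.");
        }
    }

    // Reads length-prefixed chunks until the end of the stream and
    // decodes each chunk as UTF-8.
    public static List<string> ReadAll(Stream stream)
    {
        var result = new List<string>();
        int len;
        while ((len = ReadVarint(stream)) >= 0)
        {
            var blob = new byte[len];
            int read = 0;
            while (read < len)
            {
                int n = stream.Read(blob, read, len - read);
                if (n == 0) throw new EndOfStreamException("Truncated chunk.");
                read += n;
            }
            result.Add(Encoding.UTF8.GetString(blob));
        }
        return result;
    }
}
```

Feeding it a MemoryStream over the bytes 05,70,49,45,86,49,05,70,49,45,86,50,05,70,49,45,86,51 gives "F1-V1", "F1-V2" and "F1-V3". Because the length travels outside the payload, no byte value has to be reserved as a marker.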

Data that is not valid UTF-8 cannot be decoded as UTF-8.
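The effect described in the question can be reproduced with a short sketch. `InvalidByteDemo`, `RoundTrip` and `IsValidUtf8` are hypothetical names; the only library behaviour assumed is that the default `Encoding.UTF8` substitutes U+FFFD for invalid input, and that `new UTF8Encoding(false, true)` throws instead.

```csharp
using System;
using System.Text;

public static class InvalidByteDemo
{
    // Decodes the bytes as UTF-8 and encodes the resulting string again.
    // Invalid input bytes become U+FFFD, which encodes as EF BF BD.
    public static byte[] RoundTrip(byte[] data)
    {
        string s = Encoding.UTF8.GetString(data);
        return Encoding.UTF8.GetBytes(s);
    }

    // Returns false if the bytes are not well-formed UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        // Second argument: throw on invalid bytes instead of replacing them.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

For the input 70, 253, 70 the round trip returns five bytes (70, 239, 191, 189, 70): the single marker byte has turned into the three-byte encoding of U+FFFD, and the original value 253 is lost.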

