What encoding should I use to read a string from a file?

Question

I am parsing a file (which I do not create) that contains a string. The string is always preceded by 2 bytes that give the length of the string that follows.

For instance:

05 00 53 70 6F 72 74

is:

Sport

Using C#'s BinaryReader, I read the string with:

string s = new string(binaryReader.ReadChars(size));

Sometimes a strange character appears that seems to push the stream position further than it should. For instance:

0D 00 63 6F 6F 6B 20 E2 80 94 20 62 6F 6F 6B

should be:

cook - book

and although it reads fine, the stream ends up two bytes further along than it should! (This then messes up the rest of the parsing.)

I guess this has something to do with 0xE2 in the middle, but I'm not quite sure why and how to deal with it.

Any suggestions would be highly appreciated!

+3

string c# binary-data encoding character-encoding

Bridgey May 11, '11 at 21:01


3

05 00 53 70 6F 72 74

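All of these bytes are below 0x7F, so they are 7-bit ASCII. UTF-8 is compatible with ASCII, as are many 8-bit encodings, so this example does not by itself tell you which encoding is used.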

0D 00 63 6F 6F 6B 20 E2 80 94 20 62 6F 6F 6B

Here, as you "guessed", the issue is 0xE2, which is not 7-bit ASCII.

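The length prefix is 0x0D: the string is 11 characters long, but 13 bytes.

0xE2 suggests UTF-8, where a byte above 127 is part of a multi-byte sequence, in this case the one for U+2014 (EM Dash).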
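To make the 11 versus 13 count concrete, here is a minimal sketch that decodes the payload bytes from the question as UTF-8 and compares the two lengths. It assumes a compiler that supports top-level statements (C# 9 or later); the variable names are only illustrative.

```csharp
using System;
using System.Text;

// Payload bytes from the question, without the 2-byte length prefix.
byte[] payload =
{
    0x63, 0x6F, 0x6F, 0x6B, 0x20,   // "cook "
    0xE2, 0x80, 0x94,               // U+2014 EM DASH as three UTF-8 bytes
    0x20, 0x62, 0x6F, 0x6F, 0x6B    // " book"
};

string text = Encoding.UTF8.GetString(payload);

Console.WriteLine(payload.Length);                 // number of bytes: 13
Console.WriteLine(text.Length);                    // number of chars: 11
Console.WriteLine(((int)text[5]).ToString("X4"));  // sixth char as hex: 2014
```

The length prefix 0x0D matches the first number, which suggests the file stores a byte count rather than a character count.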

+1

Jonas Elfström May 11, '11 at 21:13

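Yes, the E2 is the problem. BinaryReader.ReadChars(n) does not read n bytes; it reads n Unicode characters, decoded as UTF-8. See the Unicode documentation for details. The E2 you see is the start of a multi-byte character, not a character on its own. Note that the Unicode code points 000080 to 00009F are control characters.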
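The overshoot can be reproduced with a small in-memory stream. This is only a sketch: it places the "Sport" record from the question directly after the "cook" record (an assumption about the file layout) and assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

byte[] data =
{
    0x0D, 0x00,                                  // length prefix: 13
    0x63, 0x6F, 0x6F, 0x6B, 0x20, 0xE2, 0x80,
    0x94, 0x20, 0x62, 0x6F, 0x6F, 0x6B,          // 13 bytes, 11 characters
    0x05, 0x00, 0x53, 0x70, 0x6F, 0x72, 0x74     // next record: "Sport"
};

using var stream = new MemoryStream(data);
using var reader = new BinaryReader(stream, Encoding.UTF8);

int size = reader.ReadUInt16();                  // little-endian, gives 13
string s = new string(reader.ReadChars(size));   // asks for 13 *characters*

// The 13 characters are the 11 of the string plus 2 decoded from the
// next record's length prefix (05 00), so 15 bytes are consumed.
Console.WriteLine(s.Length);                     // 13
Console.WriteLine(stream.Position);              // 17, not the expected 15
```

The position ends up two bytes into the next record, which matches the symptom described in the question.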

Use BinaryReader.ReadBytes instead, which reads exactly the number of bytes you ask for.

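Once you have the bytes from the BinaryReader, you have to decode them yourself. If you know the text is UTF-8, you can call

Encoding.UTF8.GetString(byte[] rawData)

to get the string.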
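Put together, reading the bytes first and decoding afterwards might look like the sketch below. ReadLengthPrefixedString is a hypothetical helper name, the two records are the examples from the question placed back to back, and UTF-8 is assumed as the file's encoding. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads a 2-byte little-endian byte count,
// then exactly that many bytes, and decodes them as UTF-8.
static string ReadLengthPrefixedString(BinaryReader reader)
{
    int byteCount = reader.ReadUInt16();
    byte[] raw = reader.ReadBytes(byteCount);
    if (raw.Length != byteCount)
        throw new EndOfStreamException("String is shorter than its length prefix.");
    return Encoding.UTF8.GetString(raw);
}

byte[] data =
{
    0x0D, 0x00, 0x63, 0x6F, 0x6F, 0x6B, 0x20, 0xE2, 0x80,
    0x94, 0x20, 0x62, 0x6F, 0x6F, 0x6B,          // "cook", EM DASH, "book"
    0x05, 0x00, 0x53, 0x70, 0x6F, 0x72, 0x74     // "Sport"
};

using var stream = new MemoryStream(data);
using var reader = new BinaryReader(stream);

string first = ReadLengthPrefixedString(reader);
long positionAfterFirst = stream.Position;       // 15: prefix plus 13 bytes
string second = ReadLengthPrefixedString(reader);

Console.WriteLine(first.Length);                 // 11
Console.WriteLine(positionAfterFirst);           // 15
Console.WriteLine(second);                       // Sport
```

Because the byte count drives the read, the stream position stays aligned with the record boundaries regardless of how many bytes each character takes.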

0

Alois Kraus May 11, '11 at 21:19

Ted Hopp · Accepted Answer · 2011-05-11T21:05:33+0000

The string appears to be encoded in UTF-8. The 3-byte sequence E2 80 94 corresponds to the single Unicode character U+2014 (EM DASH).
