C#: XmlTextReader chokes on random Unicode character

In C#, I have an XmlTextReader created directly from an HTTP response (I have no control over the XML content of the response).

HttpWebResponse response = (HttpWebResponse)request.GetResponse();
XmlTextReader reader = new XmlTextReader(response.GetResponseStream());

It works, but sometimes one of the XML element nodes contains a Unicode character (for example, "é") that makes the reader choke. I tried using a StreamReader with an explicitly declared encoding, but now XmlTextReader fails on the very first line: "Data invalid. Line 1, position 1":

StreamReader sReader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.Unicode);
XmlTextReader reader = new XmlTextReader(sReader);

Is there any way to fix this? Also, is there a way to prevent the XmlTextReader from parsing an element (I know its name) that may contain an offending character? I don't care about this particular element, I just don't want it to choke the reader.

EDIT: Quick fix: read the response into a StringBuilder ("sb"):

sb.Replace("é", "e");
StringReader strReader = new StringReader(sb.ToString());
XmlTextReader reader = new XmlTextReader(strReader);
2 answers

The problem is not that this is a Unicode character; it is an invalid (incorrectly encoded) character.

You can't protect XmlTextReader against invalid XML. You need to either:

  • Fix the server side so it uses the correct character encoding
  • Pre-process the text to fix it yourself
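
As a rough sketch of the second option: if the server really sends the text in some other encoding than the one it declares, you can decode the bytes yourself and hand the resulting string to the reader. The helper name CreateReader and the choice of ISO-8859-1 below are assumptions for illustration only; you would have to find out which encoding the server actually uses.

```csharp
using System.IO;
using System.Text;
using System.Xml;

static class XmlPreprocessing
{
    // Hypothetical helper: decodes the raw response bytes with the encoding
    // the server actually used, then parses the already-decoded text.
    // Because the reader receives characters (not bytes), it does not try to
    // decode the payload again.
    public static XmlTextReader CreateReader(Stream responseStream, Encoding actualEncoding)
    {
        using (var streamReader = new StreamReader(responseStream, actualEncoding))
        {
            string xml = streamReader.ReadToEnd();
            return new XmlTextReader(new StringReader(xml));
        }
    }
}
```

It would be called along the lines of XmlPreprocessing.CreateReader(response.GetResponseStream(), Encoding.GetEncoding("ISO-8859-1")), assuming that is the real encoding of the response.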

In UTF-8, every non-ASCII character such as "é" is encoded using 2 or more bytes ("é" itself is the two bytes 0xC3 0xA9). You can use a hex editor to check what the server actually sends.
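
If you would rather check from code than from a hex editor, something like the following might do. It is only a sketch: the class and method names are made up, and it assumes you have already read the response into a byte array. It relies on UTF8Encoding's throwOnInvalidBytes option to reject byte sequences that are not valid UTF-8.

```csharp
using System;
using System.Text;

static class Utf8Check
{
    // Returns true if the bytes form valid UTF-8, false otherwise.
    public static bool IsValidUtf8(byte[] data)
    {
        // Second argument (throwOnInvalidBytes) makes decoding strict.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Formats the bytes as hyphen-separated hex for eyeballing.
    public static string ToHex(byte[] data)
    {
        return BitConverter.ToString(data);
    }
}
```

A lone 0xE9 byte (how "é" looks in ISO-8859-1) is rejected, while the pair 0xC3 0xA9 is accepted.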


" "? - XML , (. XML), .

XML , - XML (, XML , )... , .
