How to remove invalid XML characters from a document using PHP

I am trying to create an XML document that is about 23-30 MB. When I open it in Firefox, I get:

XML Parsing Error: not well-formed
Location: file:///Users/User/Downloads/export(2).xml
Line Number 137725, Column 1343:

After that, I checked the document with XML Nanny and got the following error:

Invalid Character (Unicode: 0xB)

On several (13) lines: 137725, 137738, 137751, 137764, 137777, 137790, 137803, 137816, 146834, 189949, 193444, 193457, 193470.

I have tried several “solutions”, including:

  • Regular expression:

    preg_replace(
      '/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/',
      ' ', $data->Description);

    The problem is that I'm not sure whether this is a valid regex, because I get an internal server error caused by the protection module included in our Apache setup.

  • I tried saving the file as UTF-8 according to the spec, but that was a desperate attempt

  • I tried using iconv with 'UTF-8//IGNORE', but that didn't help

  • [...] 230 [...] max_execution_time of the PHP script [...]

- , , script , .
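On the regular expression in the first bullet: as far as I know, PHP's PCRE only accepts code points above 0xFF such as `\x{D7FF}` when the pattern carries the `u` (UTF-8) modifier, so the pattern as posted would likely fail to compile regardless of the Apache module. Below is a minimal sketch of the same character class with that modifier, following the `Char` production of XML 1.0. The function name `stripInvalidXmlChars` is hypothetical, and the sketch assumes the input is already valid UTF-8.

```php
<?php
// Hypothetical helper: replaces every run of characters that XML 1.0
// does not allow with $replacement. Assumes $text is valid UTF-8.
function stripInvalidXmlChars(string $text, string $replacement = ' '): string
{
    // The trailing "u" makes PCRE treat pattern and subject as UTF-8,
    // which the \x{...} code points above 0xFF require.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';

    $result = preg_replace($pattern, $replacement, $text);
    if ($result === null) {
        // preg_replace returns null on error, e.g. malformed UTF-8 input.
        throw new RuntimeException('preg_replace failed, code ' . preg_last_error());
    }
    return $result;
}

// A vertical tab (0x0B) between two words becomes a single space.
$clean = stripInvalidXmlChars("foo\x0Bbar");
```

Checking for a `null` return matters here: if the data is not valid UTF-8, the call fails instead of returning a cleaned string, and silently using that result would write empty fields into the export.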

+3

The problem is the one reported by XML Nanny:

Invalid Character (Unicode: 0xB) (several lines)

0xB is the vertical tab character, which is not allowed in XML 1.0. Remove it:

$xml = strtr($xml, array("\x0B" => ""));

Firefox should then be able to open the document.
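For a file of 23-30 MB it may be preferable to apply the same removal line by line rather than to one large string. The following is a minimal sketch under that assumption; the function name `removeVerticalTabs` is hypothetical, and in-memory streams stand in for the real export file. It also returns the line numbers where a vertical tab was found, which can be compared with the lines XML Nanny reported.

```php
<?php
// Hypothetical helper: copies stream $in to stream $out, dropping vertical
// tabs (0x0B), and returns the 1-based numbers of the lines that had one.
function removeVerticalTabs($in, $out): array
{
    $hits = [];
    $lineNo = 0;
    while (($line = fgets($in)) !== false) {
        $lineNo++;
        if (strpos($line, "\x0B") !== false) {
            $hits[] = $lineNo;
            $line = str_replace("\x0B", '', $line);
        }
        fwrite($out, $line);
    }
    return $hits;
}

// Usage with in-memory streams standing in for real files.
$in = fopen('php://memory', 'w+');
fwrite($in, "<a>ok</a>\n<b>bad\x0Bvalue</b>\n");
rewind($in);

$out = fopen('php://memory', 'w+');
$hits = removeVerticalTabs($in, $out);

rewind($out);
$cleaned = stream_get_contents($out);
```

With real files, the two streams would come from `fopen()` on the source and destination paths. This only handles 0xB; other control characters that XML 1.0 forbids would need the same treatment.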

+2

[...] base64-encode the XML [...]?

+1
