How to convert an XML file that is not in UTF-8 into UTF-8-compliant XML

I have a huge XML file whose data samples look like this:

    <vendor name="aglaia">
        <vendorOUI oui="000B91" description="Aglaia Gesellschaft für Bildverarbeitung ud Kommunikation m" />
    </vendor>
    <vendor name="ag">
        <vendorOUI oui="0024A9" description="Ag Leader Technology" />
    </vendor>

As you can see, the text "Gesellschaft für Bildverarbeitung" contains a character that is not encoded as valid UTF-8, so XML validation fails with errors like:

Import failed:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.

So the question is: how can I convert this XML file to UTF-8 in a Linux environment? Or is there a way in bash to guarantee, when the XML is first created, that all variables and lines are stored as UTF-8?


Use a character set conversion tool:

iconv -f ISO-8859-1 -t UTF-8 filename.txt

See the GNU iconv page.
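
If it is unclear whether a file needs converting at all, one option is to let iconv validate it first. The following is only a sketch: `to_utf8` is a hypothetical helper name, and ISO-8859-1 as the source encoding is an assumption that has to match your actual data.

```shell
#!/bin/bash
# Sketch: copy a file unchanged if it is already valid UTF-8, otherwise convert it.
# to_utf8 is a hypothetical helper; ISO-8859-1 as the source encoding is an assumption.
to_utf8() {
    src=$1
    dst=$2
    # iconv exits non-zero when the input contains bytes that are not valid UTF-8.
    if iconv -f UTF-8 -t UTF-8 "$src" > /dev/null 2>&1; then
        cp "$src" "$dst"
    else
        iconv -f ISO-8859-1 -t UTF-8 "$src" > "$dst"
    fi
}

# Demo: "für" with the ü stored as the single Latin-1 byte 0xFC (octal 374).
workdir=$(mktemp -d)
printf 'f\374r\n' > "$workdir/latin1.txt"
to_utf8 "$workdir/latin1.txt" "$workdir/utf8.txt"
```

After the call, `utf8.txt` holds the same text with the ü stored as the two-byte UTF-8 sequence, which is what the XML parser in the question expects.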

...and in the file at http://standards.ieee.org/develop/regauth/oui/oui.txt, "Aglaia" (as in your example) is listed as:

00-0B-91   (hex)            Aglaia Gesellschaft für Bildverarbeitung und Kommunikation m
000B91     (base 16)        Aglaia Gesellschaft für Bildverarbeitung und Kommunikation m
                            Tiniusstr. 12-15
                            Berlin  D-13089
                            GERMANY

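So it looks like "ü" is the character that gets mangled.

When I download "oui.txt" with wget, the "ü" is present in the file. If it is missing for you, something is going wrong in your download. Consider one of the following: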

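  • wget --header='Accept-Charset: utf-8' http://standards.ieee.org/develop/regauth/oui/oui.txt
  • curl -o oui.txt http://standards.ieee.org/develop/regauth/oui/oui.txt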

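If neither of these works, open the link in your browser and use "Save as". In that case, comment out the wget line of the conversion script.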
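
To see how the "ü" actually arrived in a downloaded file, a byte-level check along these lines may help. It is a sketch only: `umlaut_encoding` is a hypothetical helper, and it only distinguishes the UTF-8 and Latin-1 encodings of this one character.

```shell
#!/bin/bash
# umlaut_encoding is a hypothetical helper that reports how "ü" is stored in a file.
umlaut_encoding() {
    utf8_u=$(printf '\303\274')    # ü as UTF-8 (bytes C3 BC)
    latin1_u=$(printf '\374')      # ü as ISO-8859-1 (byte FC)
    # LC_ALL=C makes grep compare raw bytes instead of decoding characters.
    if LC_ALL=C grep -qF "$utf8_u" "$1"; then
        echo "utf-8"
    elif LC_ALL=C grep -qF "$latin1_u" "$1"; then
        echo "latin1"
    else
        echo "missing"
    fi
}

# Demo on two small sample files instead of the real download.
samples=$(mktemp -d)
printf 'Gesellschaft f\303\274r\n' > "$samples/utf8.txt"
printf 'Gesellschaft f\374r\n' > "$samples/latin1.txt"
umlaut_encoding "$samples/utf8.txt"      # prints: utf-8
umlaut_encoding "$samples/latin1.txt"    # prints: latin1
```

If the check reports "utf-8" for your copy of oui.txt, running iconv from a Latin encoding on it would double-encode the character, so the conversion step should be skipped in that case.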

I had success with the following script (adjust the BEGIN and END blocks to produce a valid XML file):

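#!/bin/bash

# Download the IEEE OUI list (comment this line out if you saved the file manually).
wget http://standards.ieee.org/develop/regauth/oui/oui.txt
# Convert from ISO-8859-15 to UTF-8; adjust -f if your copy uses another encoding.
iconv -f iso-8859-15 -t utf-8 oui.txt > converted

awk 'BEGIN {
         print "HTML-header"    # placeholder: replace with the XML declaration and root start tag
     }

     # Matches lines such as: 000B91     (base 16)        Aglaia Gesellschaft ...
     /base 16/ {
         # Everything from the fourth field onward is the vendor description.
         desc = substr($0, index($0, $4))
         printf("<vendor name=\"%s\">\n", $4)
         printf("<vendorOUI oui=\"%s\" description=\"%s\"/>\n", $1, desc)
         print "</vendor>"
     }
     END {
         print "HTML-footer"    # placeholder: replace with the root end tag
     }
    ' converted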
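
One thing the script does not handle: if a vendor description contains `&`, `<`, `>` or `"`, the generated attribute value is no longer well-formed XML. A possible way to deal with that is to escape those characters before the text goes into printf. The sketch below uses a hypothetical filter named `xml_escape_lines`; the same gsub calls could be applied to `desc` and `$4` inside the awk program instead.

```shell
#!/bin/bash
# xml_escape_lines is a hypothetical filter: it escapes XML special characters
# in every line read from standard input.
xml_escape_lines() {
    awk '{
        # The ampersand must be handled first so the entities added below stay intact.
        # In a gsub replacement a bare & means "the matched text", hence the backslashes.
        gsub(/&/, "\\&amp;")
        gsub(/</, "\\&lt;")
        gsub(/>/, "\\&gt;")
        gsub(/"/, "\\&quot;")
        print
    }'
}

# Demo with a made-up description string.
printf 'AT&T "Labs" <x>\n' | xml_escape_lines
# prints: AT&amp;T &quot;Labs&quot; &lt;x&gt;
```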

Hope this helps!
