Fix encoding of incoherently encoded text file

I have a long text file that apparently uses different encodings (ISO or UTF-8) in successive blocks of text. This is the result of appending text with >> file.bib and of copying and pasting from various sources (web pages).

In principle the blocks can be told apart, since they are BibTeX records:

 @article{key, author={lastname, firstname}, ...}

I would like to convert it to a consistently UTF-8 encoded file, as the mixed encoding seems to break my BibTeX viewer (KBibTeX). I know that I can use iconv to convert the encoding of entire files, but I would like to know if there is a way to fix my file without damaging some of the records.


Try decoding each line as UTF-8 and fall back to ISO-8859-1 when that fails:

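#!/usr/bin/perl
use strict;
use warnings;
use Encode;

while (my $bytes = <>) {
    # Try strict UTF-8 first; FB_CROAK makes decode() die on invalid input.
    my $line = eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC) };
    # Not valid UTF-8: assume the line is ISO-8859-1.
    $line = Encode::decode('iso-8859-1', $bytes) unless defined $line;
    # Now $line is Unicode. Do something with it, e.g. write it out as UTF-8.
    print Encode::encode('UTF-8', $line);
}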



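You can do this in vim by detecting and converting one block at a time:

  • Select the block in visual line mode (shift + v).

  • Run :! enca -L lang - to detect its encoding ("lang" is the language of the text, e.g. "enca -L cs" for Czech). enca's output, the detected encoding, replaces the selection.

  • Press u to undo (this restores the text that the enca output replaced).

  • Select the block again and run :! iconv -f defined_encoding -t UTF-8, where defined_encoding is the encoding enca reported.

Note that in visual mode vim automatically expands : to :'<,'> so the command filters only the selected lines.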
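
If you would rather not do this by hand, here is a minimal shell sketch of the same detect-then-convert idea using only iconv. It swaps enca's detection for a simpler check: a line that iconv accepts as UTF-8 is kept as-is, and anything else is assumed to be ISO-8859-1 (that fallback is my assumption, based on the question). The function name fix_encoding is made up for illustration.

```shell
#!/bin/sh
# Sketch: convert a stream with mixed UTF-8 / ISO-8859-1 lines to UTF-8.
# Assumption: every line that is not valid UTF-8 is ISO-8859-1.
# The body runs in a subshell so the locale change does not leak out.
fix_encoding() (
    LC_ALL=C    # make the shell treat input as raw bytes
    while IFS= read -r line || [ -n "$line" ]; do
        if printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            # Already valid UTF-8 (this includes plain ASCII): keep unchanged.
            printf '%s\n' "$line"
        else
            # iconv rejected the line as UTF-8: convert from ISO-8859-1.
            printf '%s\n' "$line" | iconv -f ISO-8859-1 -t UTF-8
        fi
    done
)
```

Usage would be something like fix_encoding < file.bib > file.utf8.bib (file names are placeholders). It starts one iconv process per line, so it is slow on very large files, and an ISO-8859-1 line that happens to also be valid UTF-8 would be left unchanged.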
