Fix encoding of incoherently encoded text file

Question

I have a long text file whose blocks of text apparently use different encodings (ISO-8859-1 or UTF-8). This is the result of appending text with >> file.bib and of copying and pasting from various sources (web pages).

In principle the blocks can be told apart, since they are BibTeX records:

 @article{key, author={lastname, firstname}, ...}

I would like to convert it to a coherent UTF-8 file, as the mixed encodings seem to break my BibTeX viewer (kbibtex). I know that I can use iconv to convert the encoding of entire files, but I would like to know if there is a way to fix my file without damaging some of the records.

linux perl character-encoding iconv bibtex

highsciguy May 21 '12 at 14:44

Alien Life Form · Answer 1 · 2012-05-21T16:28:34+0000

A line-by-line approach in Perl:

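#!/usr/bin/perl
use strict;
use warnings;
use Encode;

while (my $raw = <>) {
    my $line;
    eval {
        # FB_CROAK makes decode die on malformed UTF-8 instead of silently
        # substituting U+FFFD; LEAVE_SRC keeps $raw intact for the fallback.
        $line = Encode::decode('UTF-8', $raw, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    if ($@) {
        $line = Encode::decode('iso-8859-1', $raw);   # not valid UTF-8
    }
    # Now $line is Unicode. Do something to it, e.g. write it out as UTF-8:
    print Encode::encode('UTF-8', $line);
}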
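
The same heuristic (try UTF-8 first, fall back to ISO-8859-1) can also be sketched with iconv in a shell loop. This is only an illustration, assuming every line is either valid UTF-8 or ISO-8859-1; the function name to_utf8_lines is hypothetical, and starting iconv once per line will be slow on large files.

```shell
#!/bin/sh
# Hypothetical helper: read stdin line by line and write UTF-8 to stdout.
# Assumption: each line is either valid UTF-8 or ISO-8859-1.
to_utf8_lines() (
    # Treat the input as raw bytes inside this subshell.
    LC_ALL=C; export LC_ALL
    while IFS= read -r line || [ -n "$line" ]; do
        # iconv exits non-zero when the input is not valid UTF-8.
        if printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$line"
        else
            printf '%s\n' "$line" | iconv -f ISO-8859-1 -t UTF-8
        fi
    done
)
```

It could be run as to_utf8_lines < file.bib > fixed.bib, keeping the original file until the result has been checked. Note that an ISO-8859-1 line that happens to form valid UTF-8 byte sequences would be left unconverted; this is rare for ordinary text but possible.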

exa · Answer 2 · 2012-05-21T20:00:02+0000

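With vim you can fix the file interactively, block by block:

Select a block in visual line mode (shift + v).
Run :! enca -L lang to guess its encoding (here "lang" is a language code, for example "enca -L cs" for Czech; enca must be installed).
Press u to undo, since the filter replaces the selected text with enca's output.
Reselect the block and run :! iconv -f defined_encoding -t UTF-8

When you press : with a selection active, vim expands it to :'<,'> so the command is applied to the selected lines only.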
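
To decide which blocks need this treatment, it may help to list the lines that are not valid UTF-8 first. The following is a rough sketch using only iconv; the function name non_utf8_lines is hypothetical, and it assumes that any line failing UTF-8 validation belongs to a block in some other encoding.

```shell
#!/bin/sh
# Hypothetical helper: print the numbers of the lines in file "$1"
# that are not valid UTF-8.
non_utf8_lines() (
    # Treat the input as raw bytes inside this subshell.
    LC_ALL=C; export LC_ALL
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        # iconv exits non-zero when the line is not valid UTF-8.
        printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 || echo "$n"
    done < "$1"
)
```

Calling non_utf8_lines file.bib prints one line number per offending line; in vim, typing : followed by the number jumps to that line, where the surrounding record can then be selected and converted.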
