How to convert file names from unicode to ascii

I have a bunch of music files on an NTFS partition mounted on Linux whose file names contain Unicode characters. I'm having trouble writing a script to rename the files so that all file names use only ASCII characters. I think the 'iconv' command should work, but I'm having trouble escaping the characters for the 'mv' command.

EDIT: It doesn't matter if there is no exact transliteration for the Unicode characters. I'd be happy to just replace them with a "?" character.


I don't think iconv has any way to replace characters. Perhaps this Python script will help:

#!/usr/bin/env python3
import sys

def unistrip(s):
    """Return s with every non-ASCII character replaced by '?'."""
    if isinstance(s, bytes):
        # Raw bytes: assume the file name is UTF-8 encoded
        s = s.decode('utf-8')
    # One '?' per non-ASCII character, not per byte
    return ''.join(c if ord(c) <= 0x7f else '?' for c in s)

if __name__ == '__main__':
    print(unistrip(sys.argv[1]))

Example:

$ ./unistrip.py "yikes_𝄞_oh_look_a_file_火"
yikes_?_oh_look_a_file_?

And to rename the file with it:

$ mv "yikes_𝄞_oh_look_a_file_火" "$(./unistrip.py "yikes_𝄞_oh_look_a_file_火")"
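If shell quoting keeps getting in the way, the renaming can also be done from Python itself, so the file names never pass through the shell at all. This is a minimal sketch, not part of the script above: ascii_name and rename_to_ascii are hypothetical helpers, the sketch only handles one directory (no recursion), and it skips a file rather than overwriting when its ASCII name already exists. Note that Windows itself does not allow '?' in file names, so if the partition is also used from Windows, a replacement such as '_' may be safer; the replacement character is therefore a parameter.

```python
import os
import sys

def ascii_name(name, repl='?'):
    """Replace every non-ASCII character in name with repl."""
    return ''.join(c if ord(c) < 0x80 else repl for c in name)

def rename_to_ascii(directory, repl='?', dry_run=False):
    """Rename the entries of one directory (non-recursive) to ASCII-only names.

    Returns the (old, new) pairs that were renamed, or would be with dry_run.
    A target name that already exists is skipped rather than overwritten.
    """
    renamed = []
    for old in sorted(os.listdir(directory)):
        new = ascii_name(old, repl)
        if new == old:
            continue  # already pure ASCII, nothing to do
        dst = os.path.join(directory, new)
        if os.path.lexists(dst):
            print(f'skipping {old!r}: {new!r} already exists', file=sys.stderr)
            continue
        if not dry_run:
            os.rename(os.path.join(directory, old), dst)
        renamed.append((old, new))
    return renamed

if __name__ == '__main__':
    # Usage: rename_ascii.py [-n] DIRECTORY   (-n = only show what would change)
    args = [a for a in sys.argv[1:] if a != '-n']
    if args:
        for old, new in rename_to_ascii(args[0], dry_run='-n' in sys.argv[1:]):
            print(f'{old!r} -> {new!r}')
```

Running it with -n first (for example "python3 rename_ascii.py -n /path/to/music", where the script name is arbitrary) shows what would change; running it again without -n performs the renames. In dry-run mode, two files that map to the same ASCII name are both listed, since neither target exists yet.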



If 'mv' gives you trouble with the file name, you can refer to the file by its inode number instead.

First, list the files together with their inode numbers:

$ ls -il

This prints something like:

13377799 -rw-r--r--  1 draco  draco      11809 Apr 25 01:39 some_filename.ext
9340462  -rw-r--r--  1 draco  draco      81648 Apr 23 02:27 some_strange_filename.ext
9340480  -rw-r--r--  1 draco  draco       4717 Apr 23 03:54 yikes_𝄞_oh_look_a_file_

Then pass that inode number to find, which runs Thanatos's Python script on the matching file:

$ find . -inum 9340480 -exec ./unistrip.py {} \;
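The same inode trick can be done in Python, so the odd name never has to be typed or quoted: os.scandir reports each entry's inode number, and the matching file can be renamed directly. This is a minimal sketch under stated assumptions: rename_by_inode is a hypothetical helper, it only searches a single directory, and it replaces non-ASCII characters with '?' just like the script above.

```python
import os

def rename_by_inode(directory, inode, repl='?'):
    """Find the entry with the given inode number in directory and give it an
    ASCII-only name. Returns the new path, or None if no entry matches."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # DirEntry.inode() describes the entry itself (symlinks are not
            # followed), which is also what 'ls -i' shows
            if entry.inode() == inode:
                new = ''.join(c if ord(c) < 0x80 else repl for c in entry.name)
                dst = os.path.join(directory, new)
                if new != entry.name:
                    if os.path.lexists(dst):
                        raise FileExistsError(dst)  # never overwrite
                    os.rename(entry.path, dst)
                return dst
    return None
```

For the listing above, rename_by_inode('.', 9340480) would rename the "yikes" file in place.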



convmv is a good Perl script for converting file name encodings, but it cannot handle characters that do not exist in the target encoding.
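The limitation is easy to see from Python: encoding a name into a charset that lacks some of its characters raises an error in the default strict mode, while the 'replace' error handler substitutes '?' for each character that cannot be represented, which is exactly the conversion asked for in the question. This only illustrates Python's codec behaviour; it is not how convmv works internally, and to_ascii is a hypothetical helper name.

```python
def to_ascii(name):
    """Encode name as ASCII, substituting '?' for each character ASCII lacks."""
    return name.encode('ascii', errors='replace').decode('ascii')

name = 'yikes_\U0001D11E_oh_look_a_file_\u706b'

try:
    name.encode('ascii')  # strict mode: fails on the first non-ASCII character
except UnicodeEncodeError as err:
    print('strict conversion fails:', err.reason)

print(to_ascii(name))  # yikes_?_oh_look_a_file_?
```

Because Python strings are sequences of code points, each non-ASCII character becomes a single '?', even when it takes several bytes on disk.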

You can change any non-ASCII character to '?' using the rename utility distributed with Perl:

rename 's/[^ -~]/?/g' *

Unfortunately, this replaces each multibyte character with several "?" characters, one per byte. Depending on the Unicode encoding in use and the characters involved, changing the regexp may help, for example:

rename 's/[^ -~]{2}/?/g' *

for two-byte characters.
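The repeated "?" comes from the Perl expression operating on the raw bytes of each name, while in UTF-8 a single character takes one to four bytes. The Python sketch below mimics the byte-wise substitution with a bytes regex and compares it with a per-character substitution; it assumes the file names are UTF-8 encoded, which is the usual case on Linux but depends on the locale the volume was mounted with.

```python
import re

name = 'yikes_\U0001D11E_oh_look_a_file_\u706b'
raw = name.encode('utf-8')  # the clef is 4 bytes in UTF-8, the CJK character 3

# Byte-wise, like rename 's/[^ -~]/?/g': one '?' per non-printable-ASCII byte
per_byte = re.sub(rb'[^ -~]', b'?', raw).decode('ascii')

# Character-wise: one '?' per non-ASCII character
per_char = re.sub(r'[^ -~]', '?', name)

print(per_byte)  # yikes_????_oh_look_a_file_???
print(per_char)  # yikes_?_oh_look_a_file_?

# Why a fixed {2} cannot suit every character: UTF-8 length varies
for ch in '\u00e9\u706b\U0001D11E':
    print(repr(ch), len(ch.encode('utf-8')), 'bytes')
```

So the {2} variant only gives one "?" per character when every non-ASCII character in the name happens to be two bytes long, such as accented Latin letters.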

