Case Insensitive Regular Expression Replacement from Dictionary

Question

Sorry, but I could not find a working solution in anything Google gave me (a couple of "recipes" on some site came pretty close, but they were old), and I have not found anything that gives me the result I'm looking for.

I am renaming files, so I have a function that spits out a file name; for this question I just use test strings. All dots (and underscores, and so on) are removed first, since these are the things all these professors most often do differently, and it would be impossible to handle it all (or watch what happens) without removing them. Five examples:

test_string_1 = 'legal.studies.131.race.relations.in.the.United.States.'

'legal.studies' → 'Legal Studies'

test_string_2 = 'mediastudies the triumph of bluray over hddvd'

'mediastudies' → 'Media Studies', 'bluray' → 'Blu-ray', 'hddvd' → 'HD DVD'

test_string_3 = 'computer Science Microsoft vs unix'

'computer Science' → 'Computer Science', 'unix' → 'UNIX'

test_string_4 = 'Perception - metamers dts'

'Perception' is already fine as it is (but who cares); the bigger point is that the audio information should be kept in the name, so 'dts' → 'DTS'

test_string_5 = 'Perception - Cue Integration - flashing dot example aac20 xvid'

'aac20' → 'AAC2.0', 'xvid' → 'XviD'

I am currently running this through something like:

new_string = re.sub(r'(?i)Legal(\s|-|)Studies', 'Legal Studies', re.sub(r'(?i)Sociology', 'Sociology', re.sub(r'(?i)Media(\s|-|)Studies', 'Media Studies', re.sub(r'(?i)UNIX', 'UNIX', re.sub(r'(?i)Blu(\s|-|)ray', 'Blu-ray', re.sub(r'(?i)HD(\s|-|)DVD', 'HD DVD', re.sub(r'(?i)xvid(\s|-|)', 'XviD', re.sub(r'(?i)aac(\s|-|)2(\s|-|\.|)0', 'AAC2.0', re.sub(r'(?i)dts', 'DTS', re.sub(r'\.', r' ', original_string.title()))))))))))

I crammed it all into one line because I rarely change or update it, and (given how my brain / ADD works) it is easier to keep it as small and out of the way as possible while I work on other things, now that I no longer touch this part.
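The `(\s|-|)` group used throughout the one-liner matches a whitespace character, a hyphen, or nothing at all. The same thing can be written as an optional character class, which is shorter and does not create a capturing group. A minimal sketch; the names `verbose_pattern`, `compact_pattern` and `samples` are made up for illustration:

```python
import re

# Both patterns accept "Blu-ray", "Blu ray" and "Bluray" in any letter case.
verbose_pattern = re.compile(r'Blu(\s|-|)ray', re.IGNORECASE)
compact_pattern = re.compile(r'Blu[\s-]?ray', re.IGNORECASE)

samples = ['bluray', 'BLU-RAY', 'blu ray', 'blu_ray']
for sample in samples:
    # fullmatch returns None when the whole string does not match the pattern.
    print(sample,
          bool(verbose_pattern.fullmatch(sample)),
          bool(compact_pattern.fullmatch(sample)))
```

Neither form accepts an underscore as the separator; if underscores have not been stripped beforehand, the class would have to become `[\s_-]?`.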

So with my examples:

new_test_string_1 = 'Legal Studies 131 Race Relations In The United States'
new_test_string_2 = 'Media Studies The Triumph Of Blu-ray Over HD DVD'
new_test_string_3 = 'Computer Science Microsoft Vs UNIX'
new_test_string_4 = 'Perception - Metamers DTS'
new_test_string_5 = 'Perception - Cue Integration - Flashing Dot Example AAC2.0 XviD'


python regex recursion

Robin Hood 22 . '12 4:25

2 Answers

>>> import re
>>> def string_fix(text, substitutions):
...     # Replace dots with spaces, then apply each pattern case-insensitively.
...     text_no_dots = text.replace('.', ' ').strip()
...     for key, substitution in substitutions.items():
...         text_no_dots = re.sub(key, substitution, text_no_dots, flags=re.IGNORECASE)
...     return text_no_dots

>>> teststring = 'legal.studies.131.race.relations.in.the.U.S.'
>>> d = {
...     r'Legal(\s|-|)Studies': 'Legal Studies',
...     r'Sociology': 'Sociology',
...     r'Media(\s|-|)Studies': 'Media Studies',
... }
>>> string_fix(teststring, d)
'Legal Studies 131 race relations in the U S'

>>> teststring = 'legal.studies.131.race.relations.in.the.U.S.'
>>> def repl(match):
...     # Split the matched text into words and title-case them,
...     # e.g. 'legal studies' -> 'Legal Studies'.
...     return ' '.join(re.findall(r'\w+', match.group())).title()

>>> re.sub(r'Legal(\s|-|)Studies|Sociology|Media(\s|-|)Studies', repl, teststring.replace('.', ' ').strip(), flags=re.IGNORECASE)
'Legal Studies 131 race relations in the U S'
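
The single-pass idea above relies on `.title()` to produce the replacement, which cannot yield spellings such as 'Blu-ray' or 'HD DVD'. One way to keep a single `re.sub` call and still look the replacement up in a dictionary is to wrap every pattern in its own capturing group and use `match.lastindex` to tell which one matched. This is only a sketch: `SUBSTITUTIONS`, `COMBINED`, `REPLACEMENTS` and `fix_in_one_pass` are hypothetical names, and it assumes the individual patterns contain no capturing groups of their own.

```python
import re

# Hypothetical table: regex fragment -> canonical spelling.
# The fragments must not contain capturing groups, because
# match.lastindex is used to find out which fragment matched.
SUBSTITUTIONS = {
    r'legal[\s-]?studies': 'Legal Studies',
    r'media[\s-]?studies': 'Media Studies',
    r'blu[\s-]?ray': 'Blu-ray',
    r'hd[\s-]?dvd': 'HD DVD',
    r'unix': 'UNIX',
}

# One big alternation with exactly one capturing group per fragment.
COMBINED = re.compile(
    '|'.join('({})'.format(fragment) for fragment in SUBSTITUTIONS),
    re.IGNORECASE,
)
# Same order as the groups in COMBINED.
REPLACEMENTS = list(SUBSTITUTIONS.values())


def fix_in_one_pass(text):
    # lastindex is the 1-based number of the group that matched.
    return COMBINED.sub(lambda m: REPLACEMENTS[m.lastindex - 1], text)


print(fix_in_one_pass('mediastudies the triumph of bluray over hddvd'))
```

Compared with looping over the dictionary, this scans the string once, and text inserted by one replacement is never rescanned by another pattern.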

jamylak 22 . '12 6:07

Jack · Accepted Answer · 2012-04-22T21:57:12+0000

import re

def string_fix(filename, substitutions):
    # Dots act as word separators in these file names, so turn them into spaces first.
    filename = filename.replace('.', ' ')
    # Apply every pattern in turn, ignoring letter case.
    for pattern, replacement in substitutions.items():
        filename = re.sub(pattern, replacement, filename, flags=re.IGNORECASE)
    return filename

# Maps each pattern to its canonical spelling.
substitutions = {
    r'Legal[\s\-_]?Studies': 'Legal Studies',
    r'Media[\s\-_]?Studies': 'Media Studies',
    r'dts': 'DTS',
    r'hd[\s\-_]?dvd': 'HD DVD',
    r'blu[\s\-_]?ray': 'Blu-ray',
    r'unix': 'UNIX',
    r'aac[\s\-_]?2\.?0': 'AAC2.0',
    r'xvid': 'XviD',
    r'computer[\s\-_]?science': 'Computer Science',
}

string_1 = 'legal.studies.131.race.relations.in.the.United.States.'
string_2 = 'mediastudies the triumph of bluray over hddvd'
string_3 = 'computer Science Microsoft vs unix'
string_4 = 'Perception - metamers dts'
string_5 = 'Perception - Cue Integration - flashing dot example aac20 xvid'

print(string_fix(string_1, substitutions))
print(string_fix(string_2, substitutions))
print(string_fix(string_3, substitutions))
print(string_fix(string_4, substitutions))
print(string_fix(string_5, substitutions))
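
The code above fixes the special names but leaves the remaining words in their original case, whereas the results listed in the question are title-cased. One possible variant calls `str.title()` first, as the question's own one-liner does, and only then applies the table. This is a sketch under assumptions: `clean_filename`, `CANONICAL_NAMES` and `COMPILED` are illustrative names, and the `\b` word boundaries are an addition that is not part of the answer above.

```python
import re

# Hypothetical table mirroring the patterns used in the question.
CANONICAL_NAMES = {
    r'legal[\s-]?studies': 'Legal Studies',
    r'media[\s-]?studies': 'Media Studies',
    r'computer[\s-]?science': 'Computer Science',
    r'blu[\s-]?ray': 'Blu-ray',
    r'hd[\s-]?dvd': 'HD DVD',
    r'aac[\s-]?2[\s.-]?0': 'AAC2.0',
    r'unix': 'UNIX',
    r'xvid': 'XviD',
    r'dts': 'DTS',
}

# Compile once. The \b anchors (an assumption of this sketch) keep short
# patterns such as "dts" from matching inside longer words.
COMPILED = [
    (re.compile(r'\b(?:' + pattern + r')\b', re.IGNORECASE), name)
    for pattern, name in CANONICAL_NAMES.items()
]


def clean_filename(original):
    # Title-case first, then turn dots into spaces and trim the ends.
    text = original.title().replace('.', ' ').strip()
    # Restore the canonical spelling of each known name.
    for regex, name in COMPILED:
        text = regex.sub(name, text)
    return text


print(clean_filename('legal.studies.131.race.relations.in.the.United.States.'))
print(clean_filename('Perception - Cue Integration - flashing dot example aac20 xvid'))
```

Because the canonical spellings are applied after title-casing, names such as 'UNIX' and 'XviD' are not flattened to 'Unix' and 'Xvid'.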
