Python: regex for data collection

Question

I want to ask for your help.

Most of my data looks like this:

     a
  b : c 901
   d : e sda
 v
     w : x ads
  any
   abc : def 12132
   ghi : jkl dasf
  mno : pqr fas
   stu : vwx utu

Description: the file starts with a line containing a single word (possibly with leading and trailing spaces), followed by a line of attributes of the form key : value (spaces around the colon are allowed). After that comes either another attribute line or another single-word line. I cannot come up with a regular expression that captures it in this form:

{
  "a": [["b", "c 901"], ["d", "e sda"]],
  "v": [["w", "x ads"]],
  "any": [["abc", "def 12132"], ["ghi", "jkl dasf"]],
  # etc.
}

Here is what I tried:

import re

# Attempted pattern: a single-word line, then zero or more "key : value" lines.
regex = ""
regex += "^(?:(?:\\s*)(.*?)(?:\\s*))$"
regex += "(?:(?:^(?:\\s*)(.*?)(?:\\s*):(?:\\s*)(.*?)(?:\\s*))$)*$"
pattern = re.compile(regex, re.S | re.M)

Thanks!

P.S. To clarify, a single block looks like this:

a
  b : c 901
  d : e sda

python regex

ghostmansd · 14 Feb '13 10:19

freakish · Answer 1 · 2013-02-14T10:25:55+0000

Why a regex? How about this:

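result = {}
last = None  # most recent single-word (header) line

for raw_line in data:  # data: any iterable of lines, e.g. an open file
    parts = raw_line.strip().split(":")
    if len(parts) == 1:
        # A line without a colon starts a new group.
        last = parts[0]
        if last not in result:
            result[last] = []
    elif len(parts) == 2:
        # A "key : value" line is attached to the current group.
        pair = [parts[0].strip(), parts[1].strip()]
        result[last].append(pair)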

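For anyone who wants to run the loop as is, here is a minimal self-contained sketch of the same idea. It is not the exact code above: the function name parse_blocks and the variable text are made up for illustration, and as an extra assumption it skips blank lines and splits only on the first colon, so a value may itself contain ":".

```python
def parse_blocks(lines):
    """Group 'key : value' lines under the most recent single-word line."""
    result = {}
    current = None  # header word that the following attribute lines belong to
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if ":" in line:
            if current is None:
                raise ValueError("attribute line before any header: %r" % raw)
            key, value = line.split(":", 1)  # split on the first colon only
            result[current].append([key.strip(), value.strip()])
        else:
            current = line
            result.setdefault(current, [])
    return result


# Sample input copied from the question.
text = """\
     a
  b : c 901
   d : e sda
 v
     w : x ads
  any
   abc : def 12132
   ghi : jkl dasf
  mno : pqr fas
   stu : vwx utu
"""

parsed = parse_blocks(text.splitlines())
```

Passing text.splitlines() keeps the function independent of where the lines come from; an open file object should work the same way.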

Anirudha · Answer 2 · 2013-02-14T10:34:30+0000

Try this:

 (?:[\n\r]+|^)\s*(\w+)\s*[\n\r]+(\s*\w+\s*:\s*.*?)(?=[\n\r]+\s*\w+\s*[\n\r]+|$)

Use this regex with the singleline (dotall) option.

Groups 1 and 2 capture the header word and its attribute lines, respectively.
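A rough sketch of how this pattern might be applied from Python, assuming the whole input is available as one string. The names BLOCK_RE, parse_with_regex and sample are hypothetical, and the second capture group still has to be split into pairs by hand, since the regex only isolates each block.

```python
import re

# Pattern from this answer; re.DOTALL lets "." match across line breaks.
BLOCK_RE = re.compile(
    r"(?:[\n\r]+|^)\s*(\w+)\s*[\n\r]+"    # group 1: the single-word header
    r"(\s*\w+\s*:\s*.*?)"                 # group 2: all attribute lines of the block
    r"(?=[\n\r]+\s*\w+\s*[\n\r]+|$)",     # stop before the next header or at the end
    re.DOTALL,
)


def parse_with_regex(text):
    """Build {header: [[key, value], ...]} from the regex matches."""
    result = {}
    for match in BLOCK_RE.finditer(text):
        header, body = match.group(1), match.group(2)
        pairs = []
        for line in body.splitlines():
            key, _, value = line.partition(":")  # split on the first colon
            pairs.append([key.strip(), value.strip()])
        result.setdefault(header, []).extend(pairs)
    return result


# Sample input copied from the question.
sample = (
    "     a\n"
    "  b : c 901\n"
    "   d : e sda\n"
    " v\n"
    "     w : x ads\n"
    "  any\n"
    "   abc : def 12132\n"
    "   ghi : jkl dasf\n"
    "  mno : pqr fas\n"
    "   stu : vwx utu\n"
)

regex_result = parse_with_regex(sample)
```

Note that re.MULTILINE is deliberately not set here, so $ in the lookahead only matches at the very end of the input.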

root · Answer 3 · 2013-02-14T10:54:59+0000

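# a more golf-like solution (lists instead of lazy map objects, so it also runs on Python 3)
from itertools import groupby

# Header lines split into 1 item, attribute lines into 2; group consecutive rows by that length.
rows = [[part.strip() for part in line.split(':')] for line in data]
groups = groupby(rows, len)
# Pair each header group with the attribute group that follows it.
dict((next(header)[0], list(next(groups)[1])) for _, header in groups)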

Output:

{'a': [['b', 'c 901'], ['d', 'e sda']],
 'any': [['abc', 'def 12132'],
  ['ghi', 'jkl dasf'],
  ['mno', 'pqr fas'],
  ['stu', 'vwx utu']],
 'v': [['w', 'x ads']]}
