Parsing strings using Python

I cannot find any documents for the parse () method for strings. Is there a good recommendation? I want to analyze the following:

frame 0 rows {3 2 3 3 3 3 2 3 2 3 3 3 2 3 2 3 4 3 3 4 3 2 2 3 3 3 2 2 2 2 3 3 3 2 3 2 3 3 3 3 4 3 4 3 3 3 3 4 3 2 3 3 3 3 2 2 2 4 4 3 3 3 3 3 4 4 4 3 2 4 3 4 3 3 3 4 3 3 4 3 3 4 4 3 3 3 4 4 3 4 3 3 3 3 3 4} columns {2 3 2 3 3 3 4 3 3 2 3 2 2 2 3 2 3 3 2 2 2 3 3 3 3 2 3 3 3 2 3 3 2 2 2 3 3 4 3 3 3 3 3 3 3 3 2 3 3 3 3 4 3 2 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 4 3 3 4 3 4 4 4 3 4 4 4 4 4 4 3 3 4 4 3 4 4 4 4 3 3 3 4 4 3 4 4 3 3 4 3 5 5 5 5 4 5 4 4 4}

into two int lists.

+3
source share
5 answers

The python strings () parameter will not help you here (it has very obscure usage). In this case, I would do it in an obvious way: with regular expressions! If 's' is your line above,

import re
lists = [
    [int(i) for i in match.split()]
    for match in re.findall(r'{(.*?)}', s)
]

print lists
+5
source
>>> a="frame 0 rows {3 2 3 3 3 3 2 3 2 3 3 3 2 3 2 3 4 3 3 4 3 2 2 3 3 3 2 2 2 2 3 3 3 2 3 2 3 3 3 3 4 3 4 3 3 3 3 4 3 2 3 3 3 3 2 2 2 4 4 3 3 3 3 3 4 4 4 3 2 4 3 4 3 3 3 4 3 3 4 3 3 4 4 3 3 3 4 4 3 4 3 3 3 3 3 4} columns {2 3 2 3 3 3 4 3 3 2 3 2 2 2 3 2 3 3 2 2 2 3 3 3 3 2 3 3 3 2 3 3 2 2 2 3 3 4 3 3 3 3 3 3 3 3 2 3 3 3 3 4 3 2 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 4 3 3 4 3 4 4 4 3 4 4 4 4 4 4 3 3 4 4 3 4 4 4 4 3 3 3 4 4 3 4 4 3 3 4 3 5 5 5 5 4 5 4 4 4}"

>>> import ast
>>> import re
>>> for match in re.finditer("\{([\d ]+)\}",a):
    integers=match.groups()[0]
    l=ast.literal_eval(integers.replace(" ",","))
    print l


(3, 2, 3, 3, 3, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2, 3, 4, 3, 3, 4, 3, 2, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 2, 3, 2, 3, 3, 3, 3, 4, 3, 4, 3, 3, 3, 3, 4, 3, 2, 3, 3, 3, 3, 2, 2, 2, 4, 4, 3, 3, 3, 3, 3, 4, 4, 4, 3, 2, 4, 3, 4, 3, 3, 3, 4, 3, 3, 4, 3, 3, 4, 4, 3, 3, 3, 4, 4, 3, 4, 3, 3, 3, 3, 3, 4)
(2, 3, 2, 3, 3, 3, 4, 3, 3, 2, 3, 2, 2, 2, 3, 2, 3, 3, 2, 2, 2, 3, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 2, 2, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 4, 3, 2, 3, 2, 3, 3, 3, 3, 3, 2, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 3, 3, 4, 4, 3, 4, 4, 4, 4, 3, 3, 3, 4, 4, 3, 4, 4, 3, 3, 4, 3, 5, 5, 5, 5, 4, 5, 4, 4, 4)

I have never heard of a parsing method to actually parse a string the way you ask. However, parsing this line is not so difficult. Here's how to do it.

+1
source

pyparsing , , :

from pyparsing import *

s = "frame 0 rows {3 2 3 3 3 3 2 3 2 3 3 3 2 3 2 3 4 3 3 4 3 2 2 3 3 3 2 2 2 2 3 3 3 2 3 2 3 3 3 3 4 3 4 3 3 3 3 4 3 2 3 3 3 3 2 2 2 4 4 3 3 3 3 3 4 4 4 3 2 4 3 4 3 3 3 4 3 3 4 3 3 4 4 3 3 3 4 4 3 4 3 3 3 3 3 4} columns {2 3 2 3 3 3 4 3 3 2 3 2 2 2 3 2 3 3 2 2 2 3 3 3 3 2 3 3 3 2 3 3 2 2 2 3 3 4 3 3 3 3 3 3 3 3 2 3 3 3 3 4 3 2 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 4 3 3 4 3 4 4 4 3 4 4 4 4 4 4 3 3 4 4 3 4 4 4 4 3 3 3 4 4 3 4 4 3 3 4 3 5 5 5 5 4 5 4 4 4}"

LBRACE,RBRACE = map(Suppress,"{}")
integer = Word(nums).setParseAction(lambda t:int(t[0]))

line = ("frame" + integer("frame") + 
        "rows" + LBRACE + ZeroOrMore(integer)("rows") + RBRACE + 
        "columns" + LBRACE + ZeroOrMore(integer)("columns") + RBRACE )

data = line.parseString(s)
print data.frame
print data.rows[:10]
print data.columns[:10]

:

0
[3, 2, 3, 3, 3, 3, 2, 3, 2, 3]
[2, 3, 2, 3, 3, 3, 4, 3, 3, 2]
+1
m_string = "frame 0 rows {3 2 3 3 3 3 2 3 2 3 3 3 2 3 2 3 4 3 3 4 3 2 2 3 3 3 2 2 2 2 3 3 3 2 3 2 3 3 3 3 4 3 4 3 3 3 3 4 3 2 3 3 3 3 2 2 2 4 4 3 3 3 3 3 4 4 4 3 2 4 3 4 3 3 3 4 3 3 4 3 3 4 4 3 3 3 4 4 3 4 3 3 3 3 3 4} columns {2 3 2 3 3 3 4 3 3 2 3 2 2 2 3 2 3 3 2 2 2 3 3 3 3 2 3 3 3 2 3 3 2 2 2 3 3 4 3 3 3 3 3 3 3 3 2 3 3 3 3 4 3 2 3 2 3 3 3 3 3 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 4 3 3 3 3 3 4 3 3 4 3 4 4 4 3 4 4 4 4 4 4 3 3 4 4 3 4 4 4 4 3 3 3 4 4 3 4 4 3 3 4 3 5 5 5 5 4 5 4 4 4}"

import re

print [[ int(i) for i in  x.split(" ")] for x in [ match for match in re.findall("\{([\d ]+)\}", m_string)] ]

:

[[3, 2, 3, 3, 3, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2, 3, 4, 3, 3, 4, 3, 2, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 2, 3, 2, 3, 3, 3, 3, 4, 3, 4, 3, 3, 3, 3, 4, 3, 2, 3, 3, 3, 3, 2, 2, 2, 4, 4, 3, 3, 3, 3, 3, 4, 4, 4, 3, 2, 4, 3, 4, 3, 3, 3, 4, 3, 3, 4, 3, 3, 4, 4, 3, 3, 3, 4, 4, 3, 4, 3, 3, 3, 3, 3, 4], [2, 3, 2, 3, 3, 3, 4, 3, 3, 2, 3, 2, 2, 2, 3, 2, 3, 3, 2, 2, 2, 3, 3, 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 2, 2, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 4, 3, 2, 3, 2, 3, 3, 3, 3, 3, 2, 2, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 4, 3, 3, 4, 3, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 3, 3, 4, 4, 3, 4, 4, 4, 4, 3, 3, 3, 4, 4, 3, 4, 4, 3, 3, 4, 3, 5, 5, 5, 5, 4, 5, 4, 4, 4]]

0

All Articles