XML for pandas dataframe

Question

I have an XML file with thousands of lines, for example:

<Word x1="206" y1="120" x2="214" y2="144" font="Times-Roman" style="font-size:22pt">WORD</Word>

I want to convert it (all attributes) to a pandas DataFrame. To do this, I could iterate over the file with Beautiful Soup and either insert the values row by row or build lists that are then inserted as columns. However, I would like to know whether there is a more Pythonic way to accomplish this. Thank you in advance.

Code example:

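# Assumes `soup` is a BeautifulSoup object already parsed from the XML file.
from pandas import DataFrame

x1list = []
x2list = []

for word in soup.page.findAll('word'):
    x1list.append(int(word['x1']))
    x2list.append(int(word['x2']))

df = DataFrame({'x1': x1list, 'x2': x2list})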

python xml pandas dataframe

root Jun 08 '12 at 11:28

1 answer

eumiro · Accepted Answer · 2012-06-08T12:09:32+0000

Try building the rows in a list comprehension and passing them to DataFrame.from_records:

DataFrame.from_records([(int(word['x1']), int(word['x2']))
                        for word in soup.page.findAll('word')],
                       columns=('x1', 'x2'))
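
The snippet above covers only x1 and x2, while the question asks for all attributes. One possible way to get them is to turn each element's attribute mapping into one row. This is only a minimal sketch under some assumptions. It swaps in the standard library's xml.etree.ElementTree for Beautiful Soup. The <Page> wrapper, the second <Word> element, and the names SAMPLE_XML and words_to_frame are invented for illustration. It also assumes every <Word> carries the four coordinate attributes.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample input: the <Page> wrapper and the second <Word>
# are made up for illustration; only the first <Word> is from the question.
SAMPLE_XML = """<Page>
<Word x1="206" y1="120" x2="214" y2="144" font="Times-Roman" style="font-size:22pt">WORD</Word>
<Word x1="220" y1="120" x2="260" y2="144" font="Times-Roman" style="font-size:22pt">NEXT</Word>
</Page>"""


def words_to_frame(xml_text):
    """Return a DataFrame with one row per <Word> and one column per attribute."""
    root = ET.fromstring(xml_text)
    # One dict per element: every attribute, plus the element text.
    records = [dict(el.attrib, text=el.text) for el in root.iter("Word")]
    df = pd.DataFrame(records)
    # Attribute values are parsed as strings; convert the coordinates to integers.
    return df.astype({name: int for name in ("x1", "y1", "x2", "y2")})


df = words_to_frame(SAMPLE_XML)
print(df)
```

For a file on disk, ET.parse(path).getroot() could replace ET.fromstring. The same idea should also work with Beautiful Soup, since each tag exposes its attributes as a dictionary through .attrs.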
