How to read encoded string data frame from csv in python

Question

Suppose I read an HTML page and get a list of names, such as "Amiel, Henri-Frederic".

To get the list of names, I decode and parse the HTML with the following code:

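import urllib

# t is presumably an HTML parser instance created earlier (not shown here)
f = urllib.urlopen("http://xxx.htm")
html = f.read()
html = html.decode('utf8')  # bytes -> unicode
t.feed(html)
t.close()
lista = t.data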

At this point, the lista variable contains a list of names, such as:

[u'Abatantuono, Diego ', ..., u'Amiel, Henri-Frédéric']

Now I would like to:

1. put these names in a DataFrame;
2. save the DataFrame to a csv file;
3. read the csv file back into a DataFrame in Python.

For simplicity, we’ll only consider the name above to complete steps 1 through 3. I would use the following code:

import pandas as pd

name = u'Amiel, Henri-Fr\xe9d\xe9ric'
name = name.encode('utf8')
array = [name]
df = pd.DataFrame({'Names': array})
df.to_csv('names')
uni = pd.read_csv('names')
uni  # trying to display the DataFrame read from the csv file

At this point, I get the following error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 67: invalid continuation byte

If I replace the last line of the code above with:

print uni

the DataFrame is displayed, but I don't think this is the right way to handle the problem.

python pandas utf-8

fabrizio_ff 25 . '13 7:56

root · Accepted Answer · 2013-03-25T08:02:36+0000

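Pass the encoding argument to both to_csv and read_csv. There is no need to encode the unicode string yourself; pandas encodes it on write and decodes it on read.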

import pandas as pd

name = u'Amiel, Henri-Fr\xe9d\xe9ric'
array = [name]
df = pd.DataFrame({'Names':array})
df.to_csv('names', encoding='utf-8')
uni = pd.read_csv('names', index_col = [0], encoding='utf-8')
print uni  # for me it works with or without print

                   Names
0  Amiel, Henri-Frédéric
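
For readers on Python 3, the same idea can be sketched with the standard library alone. This is only an illustration, not the code from the question: it uses the csv module instead of pandas, and the file name names.csv in a temporary directory is an assumption. It shows one way a message like "can't decode byte 0xe9 ... invalid continuation byte" can arise (Latin-1 bytes decoded as UTF-8), and that keeping the name as text and naming the encoding only when opening the file round-trips cleanly.

```python
import csv
import os
import tempfile

name = u'Amiel, Henri-Fr\xe9d\xe9ric'

# The same text yields different bytes depending on the encoding:
# e-acute is the two bytes 0xC3 0xA9 in UTF-8, but the single byte 0xE9 in Latin-1.
utf8_bytes = name.encode('utf-8')
latin1_bytes = name.encode('latin-1')

# Decoding Latin-1 bytes as UTF-8 fails: 0xE9 announces a multi-byte
# sequence, but the next byte ('d') is not a continuation byte.
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print(exc)

# Round trip: keep the name as text and state the encoding at the file boundary.
# The path below is a hypothetical location chosen for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'names.csv')

with open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Names'])
    writer.writerow([name])

with open(path, 'r', encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))

print(rows[1][0] == name)
```

The design choice mirrors the answer above: the string is never encoded by hand, and the same encoding is given on both the write and the read side.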
