How to read a DataFrame of encoded strings from a CSV file in Python

Suppose I read an HTML page and get a list of names, such as "Amiel, Henri-Frédéric".

To get the list of names, I decode and parse the HTML using the following code:

import urllib

f = urllib.urlopen("http://xxx.htm")
html = f.read()
html = html.decode('utf8')
# t is my HTML parser object (its definition is not shown here)
t.feed(html)
t.close()
lista = t.data

At this point, the lista variable contains a list of names, such as:

[u'Abatantuono, Diego ', ..., u'Amiel, Henri-Frédéric']

Now I would like to:

  • put these names inside a DataFrame;
  • save the DataFrame to a CSV file;
  • read the CSV file back into Python as a DataFrame.

For simplicity, consider only the last name above for these three steps. I use the following code:

import pandas as pd

name = u'Amiel, Henri-Fr\xe9d\xe9ric'
name = name.encode('utf8')
array = [name]
df = pd.DataFrame({'Names': array})
df.to_csv('names')
uni = pd.read_csv('names')
uni  # trying to display the DataFrame read back from the CSV file

At this point, I get the following error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 67: invalid continuation byte      

If I replace the last line of the code above with:

print uni

I can read a DataFrame, but I don't think this is the right way to handle this problem.
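
As a side note on the error message itself, here is a minimal sketch of what "invalid continuation byte" means in general. It is written in Python 3 syntax, which is an assumption, since the code above is Python 2. The byte 0xe9 is "é" in Latin-1, but in UTF-8 it announces a multi-byte sequence, so it cannot be followed directly by a plain ASCII letter. The sketch only illustrates the message; it does not show where in the pandas round trip the mismatch arises.

```python
name = u'Amiel, Henri-Fr\xe9d\xe9ric'

utf8_bytes = name.encode('utf-8')      # each é becomes the two bytes 0xc3 0xa9
latin1_bytes = name.encode('latin-1')  # each é becomes the single byte 0xe9

# Decoding with the codec that produced the bytes round-trips cleanly.
assert utf8_bytes.decode('utf-8') == name
assert latin1_bytes.decode('latin-1') == name

# Decoding Latin-1 bytes as UTF-8 fails at the first 0xe9.
try:
    latin1_bytes.decode('utf-8')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(decode_error)
```

In other words, a message of this kind usually points to bytes written with one encoding being decoded with another.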

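You can pass the encoding parameter to both to_csv and read_csv. Also, don't encode the unicode string yourself; pandas takes care of that when writing and reading the file.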

import pandas as pd

name = u'Amiel, Henri-Fr\xe9d\xe9ric'
array = [name]
df = pd.DataFrame({'Names':array})
df.to_csv('names', encoding='utf-8')
uni = pd.read_csv('names', index_col = [0], encoding='utf-8')
print uni  # for me it works with or without print

                   Names
0  Amiel, Henri-Frédéric
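
The code above keeps the name as a unicode string and passes the same encoding to both to_csv and read_csv. The same principle can be sketched without pandas, using only the standard library's csv module. This is a swapped-in illustration, not the method used above; it assumes Python 3, and the scratch file path is hypothetical.

```python
import csv
import io
import os
import tempfile

name = u'Amiel, Henri-Fr\xe9d\xe9ric'

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'names.csv')

# Write: hand over text (not bytes) and name the encoding at the file boundary.
with io.open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Names'])
    writer.writerow([name])

# Read: use the same encoding that was used for writing.
with io.open(path, 'r', encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))

# True if the name survived the round trip unchanged.
print(rows[1][0] == name)
```

The design choice is the same in both versions: text stays unicode inside the program, and the encoding is named only where data is written to or read from disk.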