Reading csv file fragments in Python using pandas

Question

I have a question regarding reading bits and pieces of a csv file. When I simply read the file using

pd.read_csv(path,sep=';',na_values=[''],thousands='.',decimal=',',parse_dates=[0])

I get:

          EUR     1Y     2Y     3Y
0  2013-09-25  0,198  0,307  0,485
1  2013-09-26  0,204  0,318  0,497
2  2013-09-27  0,204  0,306  0,487
3  2013-09-28  0,204  0,306  0,487
4         USD     1Y     2Y     3Y
5  2013-09-25  0,462  0,571  0,749
6  2013-09-26  0,468  0,582  0,761
7  2013-09-27  0,468   0,57  0,751
8  2013-09-28  0,468   0,57  0,751

As you can see, the data is ordered by date, and the data sets come in chunks one after another (in this case, the USD data comes immediately after the EUR data). The currency label rows complicate things, and everything ends up in one single data frame.

I would like to have two separate data frames, like this:

          EUR     1Y     2Y     3Y
0  2013-09-25  0,198  0,307  0,485
1  2013-09-26  0,204  0,318  0,497
2  2013-09-27  0,204  0,306  0,487
3  2013-09-28  0,204  0,306  0,487

          USD     1Y     2Y     3Y
0  2013-09-25  0,462  0,571  0,749
1  2013-09-26  0,468  0,582  0,761
2  2013-09-27  0,468   0,57  0,751
3  2013-09-28  0,468   0,57  0,751

That is, I would like to separate the currency data sets from one another.

Any suggestions?

python pandas csv

gussilago Feb 13 '14 at 12:12

2 answers

You can use the nrows and skiprows parameters of read_csv.

For the first data frame, read only the first 4 rows:

eur = pd.read_csv(path,sep=';',na_values=[''],thousands='.',decimal=',',parse_dates=[0], nrows=4)

For the second data frame, skip the first 5 lines:

usd = pd.read_csv(path,sep=';',na_values=[''],thousands='.',decimal=',',parse_dates=[0], skiprows=5)
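
A minimal, self-contained sketch of the nrows/skiprows approach follows. It assumes the file contents from the question are held in a string: the raw variable and io.StringIO are stand-ins for the real path, and the thousands option is left out for brevity. The fixed counts 4 and 5 only work because each block here has exactly four data rows.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the file shown in the question.
raw = """EUR;1Y;2Y;3Y
2013-09-25;0,198;0,307;0,485
2013-09-26;0,204;0,318;0,497
2013-09-27;0,204;0,306;0,487
2013-09-28;0,204;0,306;0,487
USD;1Y;2Y;3Y
2013-09-25;0,462;0,571;0,749
2013-09-26;0,468;0,582;0,761
2013-09-27;0,468;0,57;0,751
2013-09-28;0,468;0,57;0,751
"""

# Options shared by both reads.
common = dict(sep=';', decimal=',', parse_dates=[0])

# The first line is the header; nrows=4 reads the 4 EUR data rows after it.
eur = pd.read_csv(io.StringIO(raw), nrows=4, **common)

# Skip the EUR header and its 4 rows, so the USD line becomes the header.
usd = pd.read_csv(io.StringIO(raw), skiprows=5, **common)

print(eur)
print(usd)
```

Because each read now sees only one block, the value columns parse as floats and the first column parses as dates.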

Edchum Feb 13 '14 at 12:17

unutbu · Accepted Answer · 2014-02-13T14:45:42+0000

Here is an alternative approach to the problem. It reads the csv into a single DataFrame and then uses a bit of data wrangling to create a currency column:

           currency     1Y     2Y     3Y
date                                    
2013-09-25      EUR  0,198  0,307  0,485
2013-09-26      EUR  0,204  0,318  0,497
2013-09-27      EUR  0,204  0,306  0,487
2013-09-28      EUR  0,204  0,306  0,487
2013-09-25      USD  0,462  0,571  0,749
2013-09-26      USD  0,468  0,582  0,761
2013-09-27      USD  0,468   0,57  0,751
2013-09-28      USD  0,468   0,57  0,751

You could then "split" the DataFrame into smaller DataFrames according to the currency using groupby:

groups = df.groupby(['currency'])
for key, grp in groups:
    print(grp)

import numpy as np
import pandas as pd

# Read everything into one DataFrame; the currency label rows end up in 'date'
df = pd.read_table('data',sep=';',na_values=[''],thousands='.',decimal=',',
                   names=['date', '1Y', '2Y', '3Y'])
mask = df['date'].str.contains(r'^\s*\D')             # 1
df['currency'] = (df['date']
                  .where(mask, np.nan)                # 2
                  .ffill())                           # 3
df = df.loc[~mask]                                    # 4

print(df)

groups = df.groupby(['currency'])
for key, grp in groups:
    print(grp)

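1. Use str.contains to find the values in df['date'] that begin with a non-digit. These values are presumed to be currency labels. The mask is True on these rows.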

In [120]: mask
Out[120]: 
0     True
1    False
2    False
3    False
4    False
5     True
6    False
7    False
8    False
9    False
Name: date, dtype: bool
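
To see what the '^\s*\D' pattern accepts, here is a small sketch on a made-up Series (the sample values and the stricter alternative pattern are assumptions for illustration, not part of the answer). Note that whitespace is itself a non-digit, so a date with leading whitespace would also be flagged. If that matters for your data, a pattern such as '^\s*[^\d\s]' is one possible way to require a non-digit, non-space character after any leading whitespace.

```python
import pandas as pd

# Hypothetical first-column values: currency labels mixed with dates.
col = pd.Series(['EUR', '2013-09-25', '  USD', '2013-09-26', ' 2013-09-27'])

# Pattern from the answer: True when the string starts with a non-digit.
# A leading space counts as a non-digit, so ' 2013-09-27' matches as well.
mask = col.str.contains(r'^\s*\D')
print(mask.tolist())    # [True, False, True, False, True]

# Assumed stricter variant: skip leading whitespace, then require a
# character that is neither a digit nor whitespace.
strict = col.str.contains(r'^\s*[^\d\s]')
print(strict.tolist())  # [True, False, True, False, False]
```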

2. df['date'].where(mask, np.nan) returns a Series equal to df['date'] where the mask is True, and np.nan otherwise.

3. Forward-fill the NaNs with the currency values:

In [123]: df['date'].where(mask, np.nan).ffill()
Out[123]: 
0    EUR
1    EUR
2    EUR
3    EUR
4    EUR
5    USD
6    USD
7    USD
8    USD
9    USD
Name: date, dtype: object

4. Select only the rows where the mask is False, which removes the header rows.
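
Putting the steps together, the following is a sketch of one way to end up with a separate DataFrame per currency, with numeric values and real dates. It is not part of the original answer: the raw string stands in for the file, the columns are read as strings on purpose and converted afterwards, and the frames dictionary is a name chosen here for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory copy of the file shown in the question.
raw = """EUR;1Y;2Y;3Y
2013-09-25;0,198;0,307;0,485
2013-09-26;0,204;0,318;0,497
2013-09-27;0,204;0,306;0,487
2013-09-28;0,204;0,306;0,487
USD;1Y;2Y;3Y
2013-09-25;0,462;0,571;0,749
2013-09-26;0,468;0,582;0,761
2013-09-27;0,468;0,57;0,751
2013-09-28;0,468;0,57;0,751
"""

value_cols = ['1Y', '2Y', '3Y']

# Read every line as data (no header row) and keep everything as strings.
df = pd.read_csv(io.StringIO(raw), sep=';', header=None,
                 names=['date'] + value_cols, dtype=str)

# Label rows start with a non-digit; propagate each label downwards.
mask = df['date'].str.contains(r'^\s*\D')
df['currency'] = df['date'].where(mask).ffill()
df = df.loc[~mask].copy()

# With the label rows gone, convert dates and decimal-comma numbers.
df['date'] = pd.to_datetime(df['date'])
for col in value_cols:
    df[col] = df[col].str.replace(',', '.', regex=False).astype(float)

# One DataFrame per currency, keyed by the currency label.
frames = {cur: grp.drop(columns='currency').reset_index(drop=True)
          for cur, grp in df.groupby('currency')}

print(frames['EUR'])
print(frames['USD'])
```

Here frames['EUR'] and frames['USD'] are the two separate data frames the question asks for, and the same code handles any number of currency blocks of any length.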
