Reading csv file fragments in Python using pandas

I have a question regarding reading bits and pieces of a csv file. When you just read the file using

pd.read_csv(path,sep=';',na_values=[''],thousands='.',decimal=',',date_parser=[0])

I get:

     EUR     1Y     2Y     3Y
0  2013-09-25  0,198  0,307  0,485
1  2013-09-26  0,204  0,318  0,497
2  2013-09-27  0,204  0,306  0,487
3  2013-09-28  0,204  0,306  0,487
4         USD     1Y     2Y     3Y
5  2013-09-25  0,462  0,571  0,749
6  2013-09-26  0,468  0,582  0,761
7  2013-09-27  0,468   0,57  0,751
8  2013-09-28  0,468   0,57  0,751

As you can see, the data is ordered by date, and each data set is in pieces one after another (in this case, the USD data comes immediately after the EUR data). A currency label is gaining a few things, and the data becomes a single data frame.

I would like to have two separate data frames since

     EUR     1Y     2Y     3Y
0  2013-09-25  0,198  0,307  0,485
1  2013-09-26  0,204  0,318  0,497
2  2013-09-27  0,204  0,306  0,487
3  2013-09-28  0,204  0,306  0,487

     USD     1Y     2Y     3Y
0  2013-09-25  0,462  0,571  0,749
1  2013-09-26  0,468  0,582  0,761
2  2013-09-27  0,468   0,57  0,751
3  2013-09-28  0,468   0,57  0,751

That is, I would like to separate each set of currency data from each other.

Any suggestions?

+3
source share
2 answers

Here is an alternative approach to the problem. It reads csv into one DataFrame and then uses the data processing bit to create a currency column:

           currency     1Y     2Y     3Y
date                                    
2013-09-25      EUR  0,198  0,307  0,485
2013-09-26      EUR  0,204  0,318  0,497
2013-09-27      EUR  0,204  0,306  0,487
2013-09-28      EUR  0,204  0,306  0,487
2013-09-25      USD  0,462  0,571  0,749
2013-09-26      USD  0,468  0,582  0,761
2013-09-27      USD  0,468   0,57  0,751
2013-09-28      USD  0,468   0,57  0,751

"" DataFrame DataFrames , groupby:

groups = df.groupby(['currency'])
for key, grp in groups:
    print(grp)

import numpy as np
import pandas as pd

df = pd.read_table('data',sep=';',na_values=[''],thousands='.',decimal=',',
                   names=['date', '1Y', '2Y', '3Y'])
mask = df['date'].str.contains('^\s*\D')              # 1
df['currency'] = (df['date']
                  .where(mask, np.nan)                # 2
                  .fillna(method='ffill'))            # 3
df = df.loc[~mask]                                    # 4

print(df)    

groups = df.groupby(['currency'])
for key, grp in groups:
    print(grp)

  • str.contains, df['date'], . , . mask True .

    In [120]: mask
    Out[120]: 
    0     True
    1    False
    2    False
    3    False
    4    False
    5     True
    6    False
    7    False
    8    False
    9    False
    Name: date, dtype: bool
    
  • df['date'].where(mask, np.nan) , df['date'], True np.nan .
  • - nans

    In [123]: df['date'].where(mask, np.nan).fillna(method='ffill')
    Out[123]: 
    0    EUR
    1    EUR
    2    EUR
    3    EUR
    4    EUR
    5    USD
    6    USD
    7    USD
    8    USD
    9    USD
    Name: date, dtype: object
    
  • , False, .
+3

nrows skiprows read_csv

, 4 :

eur = pd.read_csv(path,sep=';',na_values=[''],thousands='.',decimal=',',date_parser=[0], nrows=4)

5 :

usd = pd.read_csv(path,sep=';',na_values=[''],thousands='.',decimal=',',date_parser=[0], skiprows=5)

+1
source

All Articles