Convert an Excel or CSV file to a multi-level pandas DataFrame

I have been provided with a reasonably large Excel file (5k lines), also available as a CSV, which I would like to turn into a multi-level pandas DataFrame. The file is structured as follows:

SampleID    OtherInfo    Measurements    Error    Notes
sample1     stuff                                 more stuff
                         36              6
                         26              7
                         37              8
sample2     newstuff                              lots of stuff
                         25              6
                         27              7

where the number of measurements per sample is variable (and sometimes zero). There are no completely empty rows between entries, and the "Measurements" and "Error" columns are empty in rows that contain other (string) data; this may complicate parsing (?). Is there an easy way to automate this conversion? My initial idea is to parse the file with Python first and then fill the DataFrame slot by slot in a loop, but I don't know exactly how to implement that, or whether it is even the best way to proceed.

Thanks in advance!


This looks like a fixed-width file, so you can read it with read_fwf().

In [143]: import pandas

In [144]: from io import StringIO

In [145]: data = """\
SampleID    OtherInfo    Measurements    Error    Notes                   
sample1     stuff                                 more stuff              
                         36              6
                         26              7
                         37              8
sample2     newstuff                              lots of stuff           
                         25              6
                         27              7
"""

In [146]: df = pandas.read_fwf(StringIO(data), widths=[12, 13, 14, 9, 15])
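If the data arrives as a genuinely comma-separated file rather than fixed-width text, the same starting frame can probably be obtained with read_csv() instead. The sketch below assumes a comma delimiter and empty fields where the layout above shows blanks; the name `csv_text` and the sample content are illustrative only, not taken from the real file. For the Excel version, `pandas.read_excel` would be the analogous call.

```python
from io import StringIO

import pandas as pd

# Assumed CSV rendering of the sample layout: blank cells become empty fields.
csv_text = """\
SampleID,OtherInfo,Measurements,Error,Notes
sample1,stuff,,,more stuff
,,36,6,
,,26,7,
,,37,8,
sample2,newstuff,,,lots of stuff
,,25,6,
,,27,7,
"""

# Empty fields are read as NaN, which gives the same shape of frame
# that the shift / forward-fill / dropna steps expect.
df_csv = pd.read_csv(StringIO(csv_text))
print(df_csv)
```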

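Once the data is in a DataFrame, shift the measurement columns up one row, forward-fill the sample columns, and drop the incomplete rows. You can then use set_index() to build the multi-level index.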

In [147]: df[['Measurements', 'Error']] = df[['Measurements', 'Error']].shift(-1)

In [148]: df[['SampleID', 'OtherInfo', 'Notes']] = df[['SampleID', 'OtherInfo', 'Notes']].ffill()

In [150]: df = df.dropna()

In [151]: df
Out[151]:
  SampleID OtherInfo  Measurements  Error          Notes
0  sample1     stuff            36      6     more stuff
1  sample1     stuff            26      7     more stuff
2  sample1     stuff            37      8     more stuff
4  sample2  newstuff            25      6  lots of stuff
5  sample2  newstuff            27      7  lots of stuff
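
The set_index() step is mentioned but not shown, so here is a minimal sketch of one way it could look. The tidy frame from Out[151] is rebuilt by hand to keep the example self-contained, and the inner level name `MeasurementNo` is a hypothetical choice, not something from the original data.

```python
import pandas as pd

# The tidy frame from Out[151], rebuilt by hand for a self-contained example.
df = pd.DataFrame({
    "SampleID": ["sample1"] * 3 + ["sample2"] * 2,
    "OtherInfo": ["stuff"] * 3 + ["newstuff"] * 2,
    "Measurements": [36, 26, 37, 25, 27],
    "Error": [6, 7, 8, 6, 7],
    "Notes": ["more stuff"] * 3 + ["lots of stuff"] * 2,
})

# Number the measurements within each sample (0, 1, 2, ...);
# "MeasurementNo" is an assumed name for the inner index level.
df["MeasurementNo"] = df.groupby("SampleID").cumcount()

# Outer level: the sample; inner level: the measurement number within it.
multi = df.set_index(["SampleID", "MeasurementNo"])

print(multi.loc["sample1"])                       # all rows for one sample
print(multi.loc[("sample2", 1), "Measurements"])  # a single measurement
```

Whether OtherInfo and Notes belong in the index or stay as ordinary columns depends on how you want to select the data later; keeping them as columns, as here, is just one option.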

Alternatively, you can parse the file by hand with the csv module.

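import csv

fill_keys = ['SampleID', 'OtherInfo', 'Notes']
data = []
last = {}  # most recent non-empty value seen for each fill key
# '<csv_file_name>' is a placeholder for the path to your CSV file
with open('<csv_file_name>', newline='') as f:
    reader = csv.reader(f)
    keys = next(reader)  # header row
    for row in reader:
        r = dict(zip(keys, row))
        # remember the descriptive values from each sample's header row
        for key in fill_keys:
            if r[key]:
                last[key] = r[key]
        # rows without a measurement carry no data of their own
        if not r['Measurements'] or not r['Error']:
            continue
        # copy the sample's descriptive values into the measurement row
        for key in fill_keys:
            if not r[key]:
                r[key] = last.get(key, '')
        data.append(r)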
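
The loop leaves `data` as a list of dicts whose values are all strings. A possible next step, sketched below, is to hand that list to pandas, convert the numeric columns, and set a multi-level index. The rows here are typed in by hand to mimic what the loop would produce for the sample file; the name `frame` and the choice of index columns are assumptions.

```python
import pandas as pd

# Hand-written stand-in for the `data` list built by the csv loop
# (the csv module yields every field as a string).
data = [
    {"SampleID": "sample1", "OtherInfo": "stuff", "Measurements": "36", "Error": "6", "Notes": "more stuff"},
    {"SampleID": "sample1", "OtherInfo": "stuff", "Measurements": "26", "Error": "7", "Notes": "more stuff"},
    {"SampleID": "sample1", "OtherInfo": "stuff", "Measurements": "37", "Error": "8", "Notes": "more stuff"},
    {"SampleID": "sample2", "OtherInfo": "newstuff", "Measurements": "25", "Error": "6", "Notes": "lots of stuff"},
    {"SampleID": "sample2", "OtherInfo": "newstuff", "Measurements": "27", "Error": "7", "Notes": "lots of stuff"},
]

frame = pd.DataFrame(data)

# Convert the measurement columns from strings to numbers.
numeric_cols = ["Measurements", "Error"]
frame[numeric_cols] = frame[numeric_cols].apply(pd.to_numeric)

# Use the descriptive columns as a multi-level index (one possible layout).
frame = frame.set_index(["SampleID", "OtherInfo", "Notes"])
print(frame)
```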