Split CSV file and exclude output column using bash, sed or awk

Question


I have a CSV file that contains the following data:

1,275,,,275,17.3,0,"2011-05-09 20:21:45"
2,279,,,279,17.3,0,"2011-05-10 20:21:52"
3,276,,,276,17.3,0,"2011-05-11 20:21:58"
4,272,,,272,17.3,0,"2011-05-12 20:22:04"
5,272,,,272,17.3,0,"2011-05-13 20:22:10"
6,278,,,278,17.3,0,"2011-05-13 20:24:08"
7,270,,,270,17.3,0,"2011-05-13 20:24:14"
8,269,,,269,17.3,0,"2011-05-14 20:24:20"
9,278,,,278,17.3,0,"2011-05-14 20:24:26"

This file contains 4432986 data lines.

I want to split this file into new files, each named after the date in the last column.

Based on the data above, I would therefore get 6 new files, each holding the lines for one day.

The files need to be named in the format YYYY_MM_DD.

I would also like to drop the first column from the output.

Thus, the file 2011_05_13 would contain the following lines, with the first column excluded:

272,,,272,17.3,0,"2011-05-13 20:22:10"
278,,,278,17.3,0,"2011-05-13 20:24:08"
270,,,270,17.3,0,"2011-05-13 20:24:14"

I plan to do this on a Linux box, so anything that uses standard Linux utilities (sed, awk, etc.) would be fine.

+3

linux bash awk sed csv

general exception Apr 18 '12 at 20:43


6

awk:

awk -F, 'BEGIN{OFS=",";} {dt=$8; gsub(/^"| .*"$/,"", dt); gsub(/-/,"_", dt);
$1=""; sub(/^,/, "", $0); print $0 > dt}' input.txt

+2

anubhava Apr 18 '12 at 22:11

There are probably better tools for this (perl/python), but since you asked, here is one way to do it in bash.

 cat bigfile.txt | while IFS= read -r LINE;
  do echo "$LINE" >> "$(echo "$LINE" | cut -d, -f8 | cut -c2-11).txt" ;
 done

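This reads the file line by line in a while loop and appends each line to a file named after its date.

The file name comes from the two cut calls. The first cut takes the eighth field (-f8) using a comma as the delimiter (-d,); the second cut keeps characters 2 through 11, which skips the opening " and stops at the end of the date.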
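As a quick check of the two cut calls, here is a minimal sketch run on a single sample line from the question. The variable names line, day and name are illustrative only, and the final hyphen-to-underscore step is an addition to match the requested YYYY_MM_DD naming, not part of the loop above.

```shell
# One sample line from the question, used only for illustration.
line='5,272,,,272,17.3,0,"2011-05-13 20:22:10"'

# Field 8 (comma-delimited) is the quoted timestamp; characters 2-11 skip
# the opening quote and keep the ten-character date.
day=$(printf '%s\n' "$line" | cut -d, -f8 | cut -c2-11)

# Optional extra step: swap hyphens for underscores (YYYY_MM_DD).
name=$(printf '%s\n' "$day" | sed 's/-/_/g')

printf '%s\n' "$day" "$name"
```

Note that this relies on the timestamp always being the eighth field and on no earlier field containing a comma inside quotes.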

Now, to solve the problem of removing the first column:

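cat bigfile.txt | sed 's/^[^,]*,//'

This regex removes everything up to and including the first comma. sed has no non-greedy .*? quantifier, so a negated character class ([^,]*) is used to stop at the first comma.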
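A small sketch of the first-column removal on one sample line, assuming a negated character class is used to stop at the first comma (which plain sed supports). The variable names are illustrative only. It also shows a side effect worth keeping in mind: once the first column is gone, the timestamp moves from field 8 to field 7.

```shell
# One sample line from the question, used only for illustration.
line='5,272,,,272,17.3,0,"2011-05-13 20:22:10"'

# Delete the shortest prefix that ends at the first comma.
trimmed=$(printf '%s\n' "$line" | sed 's/^[^,]*,//')

# The date must now be read from field 7 instead of field 8.
day=$(printf '%s\n' "$trimmed" | cut -d, -f7 | cut -c2-11)

printf '%s\n' "$trimmed" "$day"
```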

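So we put a sed that strips the first column in front of the while loop. Because the first column is gone by the time the loop sees the line, the date is now in field 7 rather than field 8, leaving us with:

 cat bigfile.txt | sed 's/^[^,]*,//' | while IFS= read -r LINE;
  do echo "$LINE" >> "$(echo "$LINE" | cut -d, -f7 | cut -c2-11).txt" ;
 done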

+1

Donald miner Apr 18 '12 at 20:57


This monster collects all the unique dates, then greps the source file for each date and stores the matching lines in a file named after that date. Yes, it is a useless use of cat, but it makes the flow of the pipeline easier to follow.

cat records.txt \
| cut -f8 -d, \
| cut -f1 -d ' ' \
| tr -d '"' \
| sort -u \
| while read DATE ; do \
    cat records.txt \
    | cut -f2- -d, \
    | egrep ",\"${DATE} [0-9]{2}:[0-9]{2}:[0-9]{2}\"" \
    > ${DATE}.txt
done

+1

Demosthenex Apr 18 '12 at 20:58


It should be simple.

$ sed 's/^[0-9]*,//' your_gigantic_data.csv

0

allenhwkim Apr 18 '12 at 21:12


This might work for you:

sed 's/^[^,]*,\(.*"\(....\)-\(..\)-\(..\).*\)/echo '\''\1'\'' >>\2_\3_\4.csv/' file | sh

or, with GNU sed:

sed 's/^[^,]*,\(.*"\(....\)-\(..\)-\(..\).*\)/echo '\''\1'\'' >>\2_\3_\4.csv/e' file

0

potong Apr 18 '12 at 10:36


Steve · Accepted Answer · 2012-04-18T22:07:06+0000

Here is one way using awk:

awk -F "," '{ split ($8,array," "); sub ("\"","",array[1]); sub (NR,"",$0); sub (",","",$0); print $0 > array[1] }' file.txt


EDIT:

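Explanation:

-F ","
sets the field separator to a comma.
split ($8,array," ")
splits the eighth field on the space and stores the pieces in array.
sub ("\"","",array[1])
removes the first " from the first array element (the date, still carrying its opening quote); the " has to be escaped with \.
sub (NR,"",$0)
removes the line number from the line (NR is the current line number, $0 is the whole line). This strips the first column only because, in this data, the first column is equal to the line number.
sub (",","",$0)
removes the first comma, which is left over from the first column.
Finally, $0 is written to the file named array[1]: print $0 > array[1].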

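UPDATE:

I just noticed that you want underscores rather than hyphens in the file names, so the hyphens in array[1] have to be replaced. Add: gsub ("-","_",array[1]).

Final code: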

awk -F "," '{ split ($8,array," "); sub ("\"","",array[1]); gsub ("-","_",array[1]); sub (NR,"",$0); sub (",","",$0); print $0 > array[1] }' file.txt
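The command above uses sub (NR,"",$0), which only strips the first column when that column equals the line number. As a hedged alternative (a sketch, not the answer's own method), the first field can be removed by pattern instead. The scratch directory and the three sample rows below are assumptions for illustration; the rows are copied from the question, and their first column deliberately does not match the line number.

```shell
# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample rows copied from the question (illustrative subset).
cat > file.txt <<'EOF'
5,272,,,272,17.3,0,"2011-05-13 20:22:10"
6,278,,,278,17.3,0,"2011-05-13 20:24:08"
8,269,,,269,17.3,0,"2011-05-14 20:24:20"
EOF

awk -F, '{
    split($8, array, " ")        # array[1] holds the date with its opening quote
    sub(/"/, "", array[1])       # drop that opening quote
    gsub(/-/, "_", array[1])     # hyphens to underscores: YYYY_MM_DD
    sub(/^[^,]*,/, "")           # remove the first field and its comma from $0
    print $0 > array[1]          # write the line to the file named by the date
}' file.txt

cat 2011_05_13
```

With several million input lines and many distinct dates, some awk implementations may hit a limit on simultaneously open files; calling close() on a file once its date has passed is one way around that if the input is sorted by date.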
