I have a file in the following format:
GGRPW,33332211,kr,P,SUCCESS,systemrenewal,REN,RAMS,SAA,0080527763,on:X,10.0,N,20120419,migr
GBRPW,1232221,uw,P,SUCCESS,systemrenewal,REN,RAMS,ASD,20075578623,on:X,1.0,N,20120419,migr
GLSH,21122111,uw,P,SUCCESS,systemrenewal,REN,RAMS,ASA,0264993503,on:X,10.0,N,20120419,migr
I need to pull out the duplicates and count them (a duplicate is defined by fields 1, 2, 5 and 14 taken together). Then I have to insert the first record of each group of duplicates, with all of its fields, into the database and store the count (dups) in another column. To do this, I cut out the 4 fields mentioned, sort them, and find the duplicates with uniq -d; for the counts I used -c. Once the duplicates and their counts are sorted out, I need the output in the form below:
3,GLSH,21122111,uw,P,SUCCESS,systemrenewal,REN,RAMS,ASA,0264993503,on:X,10.0,N,20120419,migr
Here 3 is the number of duplicates for f1,2,5,14, and the remaining fields can come from any of the duplicate lines.
So the duplicates must be removed from the source file and shown in the above format. The lines that remain in the source file will be unique, and they go in as they are ...
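To make the requirement concrete, here is a minimal single-pass awk sketch. It assumes the whole file fits in memory, that no field contains an embedded comma, and that "first record" means the first one in file order. The file names sample.txt, dups.txt and uniq.txt are made up for illustration, and the last two sample lines are invented just to give the GLSH key three occurrences.

```shell
# Work in a scratch directory so nothing real is overwritten.
cd "$(mktemp -d)"

# Hypothetical sample input: three lines from the post plus two invented
# lines that share fields 1, 2, 5 and 14 with the GLSH record.
cat > sample.txt <<'EOF'
GGRPW,33332211,kr,P,SUCCESS,systemrenewal,REN,RAMS,SAA,0080527763,on:X,10.0,N,20120419,migr
GBRPW,1232221,uw,P,SUCCESS,systemrenewal,REN,RAMS,ASD,20075578623,on:X,1.0,N,20120419,migr
GLSH,21122111,uw,P,SUCCESS,systemrenewal,REN,RAMS,ASA,0264993503,on:X,10.0,N,20120419,migr
GLSH,21122111,kr,P,SUCCESS,systemrenewal,REN,RAMS,SAA,0080527763,on:X,10.0,N,20120419,migr
GLSH,21122111,uw,P,SUCCESS,systemrenewal,REN,RAMS,ASD,20075578623,on:X,1.0,N,20120419,migr
EOF

awk -F',' -v dupfile=dups.txt -v uniqfile=uniq.txt '
{
    key = $1 FS $2 FS $5 FS $14      # duplicate key: fields 1, 2, 5 and 14
    if (!(key in count)) {
        order[++n] = key             # remember keys in first-seen order
        first[key] = $0              # keep the first full record per key
    }
    count[key]++
}
END {
    for (i = 1; i <= n; i++) {
        key = order[i]
        if (count[key] > 1)
            print count[key] "," first[key] > dupfile   # count,first record
        else
            print first[key] > uniqfile                 # unique line, unchanged
    }
}' sample.txt

cat dups.txt
```

With this sample, dups.txt holds the single line starting with 3,GLSH and uniq.txt holds the two unique records as they were. Keeping everything in arrays avoids the sort and the second trip to the source file, at the cost of memory proportional to the number of distinct keys.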
What I've done...
awk '{printf("%5d,%s\n", NR,$0)}' renewstatus_2012-04-19.txt > n_renewstatus_2012-04-19.txt
cut -d',' -f2,3,6,15 n_renewstatus_2012-04-19.txt |sort | uniq -d -c
But to get the full lines for the duplicates, I have to go back to the original file again ...
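One possible way to make that trip back, staying close to the cut | sort | uniq idea, is to save the duplicated keys with their counts and let awk look them up while reading the original file. This is only a sketch: sample.txt, dupkeys.txt and dups_first.txt are hypothetical names, the last two sample lines are invented, and the keys are assumed to contain no spaces. It filters plain uniq -c output on the count instead of combining -d with -c, since POSIX lists those options as alternatives.

```shell
cd "$(mktemp -d)"

# Hypothetical sample input (last two lines invented for illustration).
cat > sample.txt <<'EOF'
GGRPW,33332211,kr,P,SUCCESS,systemrenewal,REN,RAMS,SAA,0080527763,on:X,10.0,N,20120419,migr
GBRPW,1232221,uw,P,SUCCESS,systemrenewal,REN,RAMS,ASD,20075578623,on:X,1.0,N,20120419,migr
GLSH,21122111,uw,P,SUCCESS,systemrenewal,REN,RAMS,ASA,0264993503,on:X,10.0,N,20120419,migr
GLSH,21122111,kr,P,SUCCESS,systemrenewal,REN,RAMS,SAA,0080527763,on:X,10.0,N,20120419,migr
GLSH,21122111,uw,P,SUCCESS,systemrenewal,REN,RAMS,ASD,20075578623,on:X,1.0,N,20120419,migr
EOF

# Step 1: count each key and keep only keys seen more than once.
# Each output line has the form key,count (five comma-separated fields).
cut -d',' -f1,2,5,14 sample.txt | sort | uniq -c |
    awk '$1 > 1 { print $2 "," $1 }' > dupkeys.txt

# Step 2: load the key list, then print the first matching record of
# each duplicated key from the original file, prefixed with its count.
awk -F',' '
FILENAME == ARGV[1] {                 # first file: the key list
    cnt[$1 FS $2 FS $3 FS $4] = $5
    next
}
{
    key = $1 FS $2 FS $5 FS $14       # same key, built from the full record
    if ((key in cnt) && !(key in seen)) {
        print cnt[key] "," $0
        seen[key] = 1
    }
}' dupkeys.txt sample.txt > dups_first.txt

cat dups_first.txt
```

Because the original file is read directly in step 2, the line-number prefix is not needed here, and the cut field list stays at 1,2,5,14 rather than shifting by one.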
I don't want to get confused here .. I need a different point of view, but my brain keeps clinging to my own approach .. I need a cigar .. Any thoughts ... ??