This should certainly be a trivial task with awk one way or another, but it left me scratching my head this morning. I have a file with a format similar to this:
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> AIQLTGK 8 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
pep> VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
I would like to print one line for each unique peptide value in column 2, keeping the first occurrence, so that the above input becomes:
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
This is what I have tried so far, but it obviously does not do what I need:
awk '{print $2}' file | sort | uniq
awk '{print $0, "\t", $1}' file |sort | uniq -u -f 4
One last thing: the solution needs to treat peptides that are substrings of other peptides as separate values (for example, VSSILED and VSSILEDKILSR). Thanks :)
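For reference, one common awk idiom that seems to fit this requirement is to keep an associative array keyed on column 2 and print a line only the first time its key appears. The sketch below is a minimal illustration, not a definitive answer: the file names peptides.txt and unique.txt and the array name seen are assumptions made up for the example, and it assumes whitespace-separated fields as in the sample above.

```shell
# Sample input copied from the question; peptides.txt is a hypothetical file name.
cat > peptides.txt <<'EOF'
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> AIQLTGK 8 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
pep> VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
EOF

# seen[$2]++ evaluates to 0 (false) the first time a peptide appears, so the
# negation is true and awk prints the line; later repeats are skipped.
# The array is keyed on the whole field, so a peptide that is merely a
# substring of another one (e.g. VSSILED vs VSSILEDKILSR) is a separate key.
awk '!seen[$2]++' peptides.txt > unique.txt

cat unique.txt
```

Because the lookup is an exact match on the entire second field rather than a pattern match, no sorting is needed and the original line order is preserved.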