Awk to select values from intervals defined by pairs of columns

Question

I am trying to write an awk command that selects rows whose value in column 2 lies within the union of the intervals defined by the start/end column pairs on that row. The application is finding single nucleotide polymorphisms (SNPs) that are not within 50 nucleotides of exon boundaries. The file is as follows:

ID  X   start   end start   end start   end start   end  
Fal1825_c6  802 2   62  62  239 239 362 362 934  
Fal1821_c2  152 1   19  22  159 159 263 264 398  
Fal18279_c7 41  1   177 177 598                 
Fal18376_c3 367 1   251 251 421                 
Fal18748_c2 601 1   152 152 489 489 499 499 677  
Fal18748_c2 500 1   152 152 489 489 499 499 677  
Fal18792_c3 750 1   234 234 459 459 762 762 83  
Fal19487_c2 89  1   177 177 270 270 409 411 459

I only want to print lines in which the value of the second column falls between ("start" + 50) and ("end" - 50) for at least one "start"/"end" pair on that line (a pair is a "start" column and the "end" column right next to it). That is, between ($3 + 50 and $4 - 50) or ($5 + 50 and $6 - 50) or ($7 + 50 and $8 - 50), and so on, considering every start/end column pair on the line.
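To make the condition concrete, the following sketch prints, for every data row, each pair's shrunken window (start + 50, end - 50) and whether column 2 lies strictly inside it. The function name show_windows is hypothetical, used only for illustration; it skips the header row and reads a file given as an argument or stdin.

```shell
# show_windows is a hypothetical helper name, used only for illustration.
# For every data row (the header is skipped), print each start/end pair's
# shrunken window (start+50, end-50) and whether column 2 lies strictly inside it.
show_windows() {
  awk 'NR > 1 {
    for (i = 3; i < NF; i += 2) {     # pairs ($3,$4), ($5,$6), ...
      lo = $i + 50                    # start + 50
      hi = $(i+1) - 50                # end - 50
      inside = ($2 > lo && $2 < hi) ? "yes" : "no"
      printf "%s %s (%d, %d) %s\n", $1, $2, lo, hi, inside
    }
  }' "$@"
}
```

For the row Fal18376_c3 367 1 251 251 421 this prints "Fal18376_c3 367 (51, 201) no" and "Fal18376_c3 367 (301, 371) yes", so that row qualifies through its second pair.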

The result will look like this:

ID  X   start   end start   end start   end start   end  
Fal1825_c6  802 2   62  62  239 239 362 362 934  
Fal18376_c3 367 1   251 251 421             
Fal18748_c2 601 1   152 152 489 489 499 499 677  
Fal19487_c2 89  1   177 177 270 270 409 411 459

My attempt was:

awk '{a=3; b=4; while ($a > 0) do {if ($2 > ($a + 50) && $2 < ($b + 50)){print $0} else {a+2, b+2} }'
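The attempt above does not run as written: "while (...) do {...}" is not valid awk syntax (awk has "while (cond) body" and "do body while (cond)"), the braces are unbalanced, "a+2, b+2" does not update a and b, and the upper bound should be $b - 50 rather than $b + 50. A minimal sketch that keeps the same a/b loop idea but in valid awk might look like this; the function name select_rows_while is hypothetical.

```shell
# select_rows_while is a hypothetical name; the body keeps the question's
# a/b loop but in valid awk syntax. Pass a file name or pipe data on stdin.
select_rows_while() {
  awk '{
    a = 3; b = 4                          # first start/end column pair
    while (b <= NF) {                     # stop after the last pair on the line
      if ($2 > $a + 50 && $2 < $b - 50) { # end - 50, not end + 50
        print
        break                             # print a matching line only once
      }
      a += 2; b += 2                      # actually advance to the next pair
    }
  }' "$@"
}
```

The "break" matters because a line whose column 2 fell inside two windows would otherwise be printed twice.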

Thanks.

awk

Cris Apr 20 '12 at 13:48

1 answer

yazu · Accepted Answer · 2012-04-20T13:57:43+0000

Try:

awk '{
for (i = 3; i <= NF; i += 2)
  if ($2 > $i+50 && $2 < $(i+1)-50) { print; next } 
}' FILE
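As written, the answer above prints the matching data rows but not the header line shown in the expected output: in the header, column 2 is the word X, so awk compares it as a string against "50" and "-50", and "X" < "-50" is false. A sketch that passes the header through and otherwise applies the same test might look like this; the wrapper name keep_snps is hypothetical, and "i < NF" (instead of "i <= NF") simply ensures every start column has a matching end column.

```shell
# keep_snps is a hypothetical wrapper name; pass the input file as an
# argument or pipe the data in on stdin.
keep_snps() {
  awk '
    NR == 1 { print; next }               # pass the header row through unchanged
    {
      # test the pairs ($3,$4), ($5,$6), ...; i < NF ensures each start has an end
      for (i = 3; i < NF; i += 2)
        if ($2 > $i + 50 && $2 < $(i+1) - 50) { print; next }
    }' "$@"
}
```

Run on the sample file from the question, this prints the header followed by the Fal1825_c6, Fal18376_c3, Fal18748_c2 (601) and Fal19487_c2 rows, matching the expected result.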
