Adding a new column based on the column heading

I am using R to convert a single column of data, several million rows long, that looks like this:

fixedStep chrom=chr7 start=10239 step=1
0.064
0.064
0.064
0.055
0.055
0.089
0.076
fixedStep chrom=chr7 start=10262 step=1
0.076
0.076
0.089
0.076
0.076
0.076
0.076
0.089
0.089
0.076
0.089
0.076
0.089
0.089
fixedStep chrom=chr7 start=10398 step=1
0.076
0.089
0.089
0.089
0.089
0.076

into this:

10239 0.064
10240 0.064
10241 0.064
10242 0.055
10243 0.055
10244 0.089
10245 0.076
10262 0.076
10263 0.076
10264 0.089
10265 0.076
10266 0.076
10267 0.076
10268 0.076
10269 0.089
10270 0.089
10271 0.076
10272 0.089
10273 0.076
10274 0.089
10275 0.089
10398 0.076
10399 0.089
10400 0.089
10401 0.089
10402 0.089
10403 0.076

That is, I want to add a new column of numbers (either before or after the data; in the example above, before it). The numbers in the new column begin at the start= value and increase by 1 (step=1) until the next header line (e.g. fixedStep chrom=chr7 start=10262 step=1) is reached. At that point, the numbering restarts at the new start= value and again increases by 1 (step=1) until the following header line, and so on.

Since this is a large file, I cannot load it into the R workspace. A solution that uses UNIX/Linux tools for this step would also be welcome.

+3

With Unix tools (awk):

#!/usr/bin/awk -f
/^fixedStep/ {
  i=int(substr($0,match($0,"start=")+6))
  d=int(substr($0,match($0,"step=")+5))
}
!/^f/ { print i, $0; i+=d }

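Explanation: for lines beginning with "fixedStep", find the position of "start=", skip 6 characters (the length of "start="), take the rest of the line, and convert it to an integer stored in i (awk's numeric conversion turns "12345 step=1" into 12345, ignoring the trailing text). The same is done for "step=", which is stored in d.

For lines that do not begin with "f", print i followed by the line, then increase i by d.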
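To try the idea without the full file, something like the following should work. It is only a sketch: the sample lines are taken from the question, the shell variable names sample and result are made up for the demo, and the `next` statement and the `NF` guard (which skips blank lines) are small additions that are not in the script above.

```shell
# A few lines copied from the question; "sample" is a hypothetical name.
sample='fixedStep chrom=chr7 start=10239 step=1
0.064
0.055

fixedStep chrom=chr7 start=10262 step=1
0.076
0.089'

# Header lines set the current position i and the increment d.
# Every other non-blank line is printed with its position, then i advances.
result=$(printf '%s\n' "$sample" | awk '
/^fixedStep/ {
  i = int(substr($0, match($0, "start=") + 6))
  d = int(substr($0, match($0, "step=") + 5))
  next
}
NF { print i, $0; i += d }   # NF is 0 on blank lines, so they are skipped
')

printf '%s\n' "$result"
```

For the sample above this should print four lines, from `10239 0.064` to `10263 0.089`. On a real file, replace the `printf ... |` part with the file name as the last argument to awk and redirect the output to a new file.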

+4

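You can use readLines to read the file one line at a time, so the whole file never has to be in memory at once. The function below is a proof of concept.

Note that this approach (growing a data frame one row at a time) is slow. With this much data, you are probably better off with something other than R. The awk solution, for example, should be much faster.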

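ReadFile <- function(file){
  # file must be an open connection, so that each readLines() call
  # continues where the previous one stopped

  DF <- data.frame(ID=numeric(0),value=numeric(0))

  while(TRUE){

    z <- readLines(file,1)
    # || short-circuits, so z=="" is never tested on an empty vector
    if(length(z)==0 || z=="") {break}

    # Start stays NULL unless the line is a header containing "start="
    Start <- if(grepl("start=",z))
      as.numeric(gsub(".+start=(\\d+).+","\\1",z))

    if(is.null(Start)){
        # data line: store it under the current position, then advance
        # (a step of 1 is assumed)
        DF <- rbind(DF,
            data.frame(ID=ID,value=as.numeric(z))
        )
        ID <- ID + 1
    } else {
      # header line: restart the numbering at the new start value
      ID <- Start
    }
  }
  return(DF)
}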

Example:

ZZ <- textConnection("fixedStep chrom=chr7 start=10239 step=1
0.064
0.076
fixedStep chrom=chr7 start=10262 step=1
0.076
0.089
fixedStep chrom=chr7 start=10398 step=1
0.045
0.089
")

> ReadFile(ZZ)
     ID value
1 10239 0.064
2 10240 0.076
3 10262 0.076
4 10263 0.089
5 10398 0.045
6 10399 0.089
+2
