Divide one column into two columns in R with a loop

Question

I have the same problem as in the question "strsplit one column with exact information in two columns".

That question has already been answered, but my data look like this:

      SNP Geno AlleleA AlleleB AlleleC AlleleD AlleleE
1 marker1   G1      AA      AA      AA      AA      AA
2 marker2   G1      TT      TT      TT      TT      TT
3 marker3   G1      TT      TT      TT      TT      TT
4 marker1   G2      CC      CC      CC      CC      CC
5 marker2   G2      AA      AA      AA      AA      AA
6 marker3   G2      TT      TT      TT      TT      TT
7 marker1   G3      GG      GG      GG      GG      GG
8 marker2   G3      AA      AA      AA      AA      AA
9 marker3   G3      TT      TT      TT      TT      TT

Output of dput:

structure(list(SNP = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L), .Label = c("marker1", "marker2", "marker3"), class = "factor"), 
    Geno = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("G1", 
    "G2", "G3"), class = "factor"), AlleleA = structure(c(1L, 
    4L, 4L, 2L, 1L, 4L, 3L, 1L, 4L), .Label = c("AA", "CC", "GG", 
    "TT"), class = "factor"), AlleleB = structure(c(1L, 4L, 4L, 
    2L, 1L, 4L, 3L, 1L, 4L), class = "factor", .Label = c("AA", 
    "CC", "GG", "TT")), AlleleC = structure(c(1L, 4L, 4L, 2L, 
    1L, 4L, 3L, 1L, 4L), class = "factor", .Label = c("AA", "CC", 
    "GG", "TT")), AlleleD = structure(c(1L, 4L, 4L, 2L, 1L, 4L, 
    3L, 1L, 4L), class = "factor", .Label = c("AA", "CC", "GG", 
    "TT")), AlleleE = structure(c(1L, 4L, 4L, 2L, 1L, 4L, 3L, 
    1L, 4L), class = "factor", .Label = c("AA", "CC", "GG", "TT"
    ))), .Names = c("SNP", "Geno", "AlleleA", "AlleleB", "AlleleC", 
"AlleleD", "AlleleE"), row.names = c(NA, -9L), class = "data.frame")

In that question, the asker has only one column to split into two. My problem is that I have 5000 columns (AlleleA, AlleleB, etc.) that I want to split, each into two columns.

I tried a loop like this, but it doesn't work:

for(i in colnames(dat)){
  dat1 <- data.frame(do.call(rbind, strsplit(as.vector(sprintf("dat$%s",i)), split = "")))
}

I look forward to your guidance, thanks.

+2

split r

user46543 Dec 05 '14 at 9:27


3 answers

You could try:

library(qdap)
res <- colsplit2df(dat, splitcols=2:ncol(dat),sep='')
colnames(res)[-1] <- make.names(rep(colnames(dat)[-1],each=2), unique=TRUE)
res[1:3,1:5]
#      SNP Geno Geno.1 AlleleA AlleleA.1
#1 marker1    G      1       A         A
#2 marker2    G      1       T         T
#3 marker3    G      1       T         T

To split only the Allele columns:

colsplit2df(dat, splitcols=grep('Allele', names(dat)),sep='')

(Tyler Rinker)

You can also set the column names of the data.frame with setNames first:

setNames(dat, gsub("([A-Z]{1}[a-z]+[A-Z])", "\\1.1&\\1.2", names(dat))) %>%
    colsplit2df(splitcols=3:ncol(dat), sep='')

+3

akrun Dec 05 '14 at 9:42

As @beginneR says, you can use tidyr::separate. Here is an example taken from: http://blog.rstudio.org/2014/07/22/introducing-tidyr/

head(tidier, 8)

#>   id       trt     key    time
#> 1  1 treatment work.T1 0.08514
#> 2  2   control work.T1 0.22544
#> 3  3 treatment work.T1 0.27453
#> 4  4   control work.T1 0.27231
#> 5  1 treatment home.T1 0.61583
#> 6  2   control home.T1 0.42967
#> 7  3 treatment home.T1 0.65166
#> 8  4   control home.T1 0.56774

tidy <- tidier %>%
  separate(key, into = c("location", "time"), sep = "\\.") 
tidy %>% head(8)
#>   id       trt location time    time
#> 1  1 treatment     work   T1 0.08514
#> 2  2   control     work   T1 0.22544
#> 3  3 treatment     work   T1 0.27453
#> 4  4   control     work   T1 0.27231
#> 5  1 treatment     home   T1 0.61583
#> 6  2   control     home   T1 0.42967
#> 7  3 treatment     home   T1 0.65166
#> 8  4   control     home   T1 0.56774
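
The blog example above splits on a delimiter, whereas the genotype strings in the question have none. A minimal sketch of how separate might be applied to this data, assuming every genotype is exactly two characters and a recent tidyr that re-exports all_of(): a numeric sep splits by character position instead of by pattern. The small `dat` below is a stand-in for the question's data, and the `_1`/`_2` suffixes are just an illustrative naming choice.

```r
library(tidyr)

# Small stand-in for the question's data (assumption: two-character genotypes).
dat <- data.frame(
  SNP = c("marker1", "marker2", "marker3"),
  Geno = c("G1", "G1", "G1"),
  AlleleA = c("AA", "TT", "TT"),
  AlleleB = c("CC", "AA", "TT"),
  stringsAsFactors = FALSE
)

# Names of the columns to split.
allele_cols <- grep("^Allele", names(dat), value = TRUE)

res <- dat
for (col in allele_cols) {
  # sep = 1 splits after the first character; all_of() selects the column
  # whose name is stored in the string `col`.
  res <- separate(res, all_of(col),
                  into = paste0(col, c("_1", "_2")),
                  sep = 1)
}
res
```

With 5000 columns this loop calls separate once per column, so it may be noticeably slower than the vectorised package functions in the other answers.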

+2

Davide passaretti Dec 05 '14 at 9:38


A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2014-12-05T09:36:20+0000

You can use cSplit from my splitstackshape package with the argument stripWhite = FALSE.

For example:

library(splitstackshape)
cSplit(mydf, grep("Allele", names(mydf)), "", stripWhite = FALSE)
#        SNP Geno AlleleA_1 AlleleA_2 AlleleB_1 AlleleB_2 AlleleC_1
# 1: marker1   G1         A         A         A         A         A
# 2: marker2   G1         T         T         T         T         T
# 3: marker3   G1         T         T         T         T         T
# 4: marker1   G2         C         C         C         C         C
# 5: marker2   G2         A         A         A         A         A
# 6: marker3   G2         T         T         T         T         T
# 7: marker1   G3         G         G         G         G         G
# 8: marker2   G3         A         A         A         A         A
# 9: marker3   G3         T         T         T         T         T
#    AlleleC_2 AlleleD_1 AlleleD_2 AlleleE_1 AlleleE_2
# 1:         A         A         A         A         A
# 2:         T         T         T         T         T
# 3:         T         T         T         T         T
# 4:         C         C         C         C         C
# 5:         A         A         A         A         A
# 6:         T         T         T         T         T
# 7:         G         G         G         G         G
# 8:         A         A         A         A         A
# 9:         T         T         T         T         T
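
For comparison, here is a base R sketch that needs no extra package. The loop in the question fails because sprintf("dat$%s", i) only builds the literal string "dat$AlleleA", so strsplit splits the characters of that string rather than the column's values, and dat1 is overwritten on every pass. Fetching the column with dat[[col]] and adding the two halves to a result data frame avoids both problems. The small `dat` and the `_1`/`_2` suffixes are illustrative assumptions, and the sketch assumes every genotype is exactly two characters.

```r
# Small stand-in for the question's data; columns are factors, as in the dput.
dat <- data.frame(
  SNP = c("marker1", "marker2", "marker3"),
  Geno = c("G1", "G1", "G1"),
  AlleleA = c("AA", "TT", "TT"),
  AlleleB = c("CC", "AA", "TT"),
  stringsAsFactors = TRUE
)

# Only the Allele columns are split; SNP and Geno are left alone.
allele_cols <- grep("^Allele", names(dat), value = TRUE)

# Start from the untouched columns and append two new columns per allele column.
res <- dat[setdiff(names(dat), allele_cols)]
for (col in allele_cols) {
  x <- as.character(dat[[col]])               # fetch the column by name
  res[[paste0(col, "_1")]] <- substr(x, 1, 1) # first character
  res[[paste0(col, "_2")]] <- substr(x, 2, 2) # second character
}
res
```

substr is vectorised over rows, so the loop runs once per column rather than once per cell.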
