Using strsplit to split one column into two columns

I have data similar to:

    SNP Geno Allele
marker1   G1    AA
marker2   G1    TT
marker3   G1    TT
marker1   G2    CC
marker2   G2    AA
marker3   G2    TT
marker1   G3    GG
marker2   G3    AA
marker3   G3    TT

And I want it to look like this:

    SNP Geno Allele1 Allele2
marker1   G1       A       A
marker2   G1       T       T
marker3   G1       T       T
marker1   G2       C       C
marker2   G2       A       A
marker3   G2       T       T
marker1   G3       G       G
marker2   G3       A       A
marker3   G3       T       T

I use this:

strsplit(Allele, split extended = TRUE)

But that does not work. Are additional commands needed?

+5
3 answers

Another approach, from start to finish:

Make reproducible data:

dat <- read.table(header = TRUE,  text = "SNP Geno    Allele
marker1 G1  AA
marker2 G1  TT
marker3 G1  TT
marker1 G2  CC
marker2 G2  AA
marker3 G2  TT
marker1 G3  GG
marker2 G3  AA
marker3 G3  TT")

UPDATED: Extract the Allele column, split each string into single characters, then turn those characters into two columns of a data frame.

Explicitly:

dat1 <- data.frame(t(matrix(
                     unlist(strsplit(as.vector(dat$Allele), split = "")), 
                     ncol = length(dat$Allele), nrow = 2)))

OR, following @joran's suggestion:

dat1 <- data.frame(do.call(rbind, strsplit(as.vector(dat$Allele), split = "")))

THEN

Add column names to new columns:

names(dat1) <- c("Allele1", "Allele2")

Attach the two new columns to the SNP and Geno columns of the original data frame, as @user1317221 suggests:

dat3 <- cbind(dat$SNP, dat$Geno, dat1)
dat3
  dat$SNP dat$Geno Allele1 Allele2
1 marker1       G1       A       A
2 marker2       G1       T       T
3 marker3       G1       T       T
4 marker1       G2       C       C
5 marker2       G2       A       A
6 marker3       G2       T       T
7 marker1       G3       G       G
8 marker2       G3       A       A
9 marker3       G3       T       T
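
The dat$SNP and dat$Geno column names in that output come from passing bare vectors to cbind. A minimal sketch of one way around that, assuming the same example data as above: bind a data-frame subset instead, so the original names survive. The names parts and dat_split are only for illustration.

```r
# Rebuild the example data (same values as in the question).
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "SNP Geno Allele
marker1 G1 AA
marker2 G1 TT
marker3 G1 TT
marker1 G2 CC
marker2 G2 AA
marker3 G2 TT
marker1 G3 GG
marker2 G3 AA
marker3 G3 TT")

# strsplit() returns a list with one character vector per row;
# rbind-ing the list elements gives a two-column character matrix.
parts <- do.call(rbind, strsplit(as.character(dat$Allele), split = ""))
colnames(parts) <- c("Allele1", "Allele2")

# Bind to a data-frame subset (not bare vectors) so SNP and Geno keep their names.
dat_split <- cbind(dat[c("SNP", "Geno")], parts)
print(dat_split)
```

This relies on every Allele string having the same length; otherwise rbind would recycle the shorter pieces.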
+11

Try:

Allele<-dat$Allele    
Allele1<-substr(Allele, start = 1, stop = 1)
Allele2<-substr(Allele, start = 2, stop = 2)

You can then combine them however you want, for example by putting them in a data frame.

EDIT:

Following @Ben's suggestion (thanks, Ben), using with():

Allele1 <- with(dat, substr(Allele, start = 1, stop = 1))
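
To make the "put them in a data frame" step concrete, here is a minimal sketch. It assumes a small subset of the question's data; the names allele and result are made up for the example.

```r
# Three rows taken from the question's data.
dat <- data.frame(
  SNP = c("marker1", "marker2", "marker1"),
  Geno = c("G1", "G1", "G2"),
  Allele = c("AA", "TT", "CC"),
  stringsAsFactors = FALSE
)

# substr() is vectorised, so each call handles the whole column at once.
# as.character() guards against Allele having been read in as a factor.
allele <- as.character(dat$Allele)

result <- data.frame(
  dat[c("SNP", "Geno")],
  Allele1 = substr(allele, 1, 1),
  Allele2 = substr(allele, 2, 2),
  stringsAsFactors = FALSE
)
print(result)
```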

+4

This looks like a job for read.fwf. Unlike read.table and co., read.fwf has no text argument, so you need to wrap the column in textConnection:

# dat$Allele <- as.character(dat$Allele) # Necessary if it is a factor
cbind(dat[-3], 
      read.fwf(textConnection(dat$Allele), 
               widths = c(1, 1), col.names=c("Allele1", "Allele2")))
#       SNP Geno Allele1 Allele2
# 1 marker1   G1       A       A
# 2 marker2   G1       T       T
# 3 marker3   G1       T       T
# 4 marker1   G2       C       C
# 5 marker2   G2       A       A
# 6 marker3   G2       T       T
# 7 marker1   G3       G       G
# 8 marker2   G3       A       A
# 9 marker3   G3       T       T
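
One caveat with read.fwf that may be worth knowing: it hands the split fields to read.table, which guesses column types. A column that contains only the letter T (or only F) would then be read as logical. A small sketch, using a made-up vector alleles where every genotype is TT, shows the effect and how passing colClasses through to read.table avoids it:

```r
alleles <- c("TT", "TT", "TT")  # hypothetical column where every allele is T

# With default type conversion, a column of "T" values is read as logical.
guessed <- read.fwf(textConnection(alleles), widths = c(1, 1),
                    col.names = c("Allele1", "Allele2"))

# colClasses is passed on to read.table and keeps the letters as text.
kept <- read.fwf(textConnection(alleles), widths = c(1, 1),
                 col.names = c("Allele1", "Allele2"),
                 colClasses = "character")

str(guessed)
str(kept)
```

The example data in the question mixes several letters in each column, so it is not affected, but real genotype data might be.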

Alternatively, use transform (again assuming your data frame is called dat):

transform(dat, Allele1 = substr(Allele, 1, 1), 
          Allele2 = substr(Allele, 2, 2))[-3]

The result:

      SNP Geno Allele1 Allele2
1 marker1   G1       A       A
2 marker2   G1       T       T
3 marker3   G1       T       T
4 marker1   G2       C       C
5 marker2   G2       A       A
6 marker3   G2       T       T
7 marker1   G3       G       G
8 marker2   G3       A       A
9 marker3   G3       T       T

This is essentially the substr approach, just wrapped in transform.
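
Both substr and read.fwf quietly assume that every Allele value is exactly two characters long. A minimal sketch of a check you could run first; the vector alleles and the name ok are hypothetical, chosen to include a short value and a missing one:

```r
alleles <- c("AA", "TT", "A", NA)  # hypothetical values, two of them problematic

# TRUE only for non-missing strings of exactly two characters.
ok <- !is.na(alleles) & nchar(alleles) == 2

# Positions that need attention before splitting.
bad_rows <- which(!ok)
print(bad_rows)
```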


Update:

cSplit from the splitstackshape package can also handle this, provided you set stripWhite = FALSE.

Pass the empty string, "", as the separator:

library(splitstackshape)
cSplit(dat, "Allele", "", stripWhite = FALSE)
#        SNP Geno Allele_1 Allele_2
# 1: marker1   G1        A        A
# 2: marker2   G1        T        T
# 3: marker3   G1        T        T
# 4: marker1   G2        C        C
# 5: marker2   G2        A        A
# 6: marker3   G2        T        T
# 7: marker1   G3        G        G
# 8: marker2   G3        A        A
# 9: marker3   G3        T        T


+2
