Generate data where the cell counts are random but the row sums stay the same

Question

I need to create a bunch of fake data sets in which the sum of the two variables in each row is the same as in my real data, but the count for each variable is random. Here's the setup:

>df
    X.1  X.2
1   145   30
2    55   73

The first row sums to 175 and the second to 128. What I'm looking for is a way to generate a data frame (or a collection of data frames) like this:

>df.2
    X.1  X.2
1   100   75
2    90   38

In df.2 the cell counts have changed, but each row still sums to the same total. The actual data contains hundreds of rows, but only two variables, if that helps. I tried to figure out how to do this with sample(), but had no luck. Any suggestions?

Thanks!

+5

r

user1202761 Aug 20 '12 at 0:24


4 answers

How about r2dtable?

> r2dtable(2, c(175,128), c(190, 113))
[[1]]
     [,1] [,2]
[1,]  108   67
[2,]   82   46

[[2]]
     [,1] [,2]
[1,]  114   61
[2,]   76   52
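Worth keeping in mind: r2dtable fixes the column totals as well as the row totals, which is more than the question asks for. A quick check, as a sketch (the names row_totals, col_totals and tabs are illustrative and not from the answer):

```r
# Row and column totals taken from the example call above
row_totals <- c(175, 128)
col_totals <- c(190, 113)

set.seed(1)  # only so the sketch is reproducible
tabs <- r2dtable(3, row_totals, col_totals)

# Every generated table keeps both sets of margins
rows_ok <- all(sapply(tabs, function(m) all(rowSums(m) == row_totals)))
cols_ok <- all(sapply(tabs, function(m) all(colSums(m) == col_totals)))
rows_ok
cols_ok
```

If only the row totals should be fixed, the rmultinom-based approaches are probably a better fit.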

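Alternatively, like @mnel, use rmultinom, but let its n argument generate all the replicates for a row in one call. The result is then rearranged into an array with one slice per replicate.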

n <- 10
e <- cbind(X1  = c(100,90,30),X2 = c(75,28,120))
aperm(array(sapply(1:nrow(e), function(i) 
        rmultinom(n, rowSums(e)[i], (e/rowSums(e))[i,])),
      dim=c(ncol(e),n,nrow(e))), c(3,1,2))
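To see what the array looks like, here is a small sketch that stores the result (the name sims is an assumption, not from the answer), pulls out one replicate, and checks the row sums:

```r
n <- 10
e <- cbind(X1 = c(100, 90, 30), X2 = c(75, 28, 120))

set.seed(1)  # only so the sketch is reproducible
sims <- aperm(array(sapply(1:nrow(e), function(i)
          rmultinom(n, rowSums(e)[i], (e / rowSums(e))[i, ])),
        dim = c(ncol(e), n, nrow(e))), c(3, 1, 2))

dim(sims)     # rows x columns x replicates
sims[, , 1]   # the first simulated table

# Row sums of every replicate equal the row sums of e
sums_ok <- all(apply(sims, 3, rowSums) == rowSums(e))
sums_ok
```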

+6

Aaron Aug 20 '12 at 1:59

Test data:

test <- data.frame(X.1=c(145,55),X.2=c(30,73))

Using sample:

t(sapply(
        rowSums(test),
        function(x) {
                one <- sample(1:x,1)
                two <- (x - one)
                result <- data.frame(one,two)
                names(result) <- names(test)
                return(result)
                }
         )
)

Result:

     X.1 X.2
[1,] 20  155
[2,] 127 1

...

     X.1 X.2
[1,] 111 64 
[2,] 94  34

....

Alternatively, using jitter to perturb the original values:

t(apply(
        test,
        1,
        function(x) {
                rsum <- sum(x)
                one <- round(jitter(x[1],20,20),0)
                two <- (rsum - one)
                result <- c(one,two)
                names(result) <- names(test)
                return(result)
                }
    )
)

Result:

     X.1 X.2
[1,] 160  15
[2,]  47  81

     X.1 X.2
[1,] 127  48
[2,]  64  64
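One caveat with the jitter version: with amount = 20 the noise is uniform on [-20, 20], so in general a cell below 20 (or within 20 of its row sum) can produce a negative count. A minimal sketch that clamps the jittered value to [0, row sum]; the helper name jitter_rows is hypothetical and not part of the answer:

```r
test <- data.frame(X.1 = c(145, 55), X.2 = c(30, 73))

# jitter_rows is an illustrative helper for a two-column data frame
jitter_rows <- function(df, amount = 20) {
  rsum <- rowSums(df)
  one  <- round(jitter(df[[1]], amount = amount))
  one  <- pmin(pmax(one, 0), rsum)   # keep counts within [0, row sum]
  out  <- cbind(one, rsum - one)
  colnames(out) <- names(df)
  out
}

set.seed(1)  # only so the sketch is reproducible
out <- jitter_rows(test)
out
```

Clamping does pile a little extra probability onto 0 and the row sum, so this is a pragmatic fix rather than a principled distribution.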

+2

thelatemail Aug 20 '12 at 0:59

If you have a total sample size of, say, n = 40 spread over 4 cells in 2 columns, then the call would be:

rmultinom(2, size = 40/4, prob = c(0.5,0.5))
     [,1] [,2]
[1,]    6    3
[2,]    4    7

If you want a function that delivers this kind of result with a given probability for each row, then:

 my_mat_rand <- function(tot, coln, probs){
     rmultinom(coln, size = tot/length(probs), prob = probs) }

> my_mat_rand(tot=40, coln=2, probs  = c(0.5,0.5))
     [,1] [,2]
[1,]   11   10
[2,]    9   10
> my_mat_rand(40, 2, probs  = c(0.5,0.5))
     [,1] [,2]
[1,]    8   13
[2,]   12    7

If you want the probabilities to be "random" as well, use runif to supply the first element of the probs vector and 1 minus that value to supply the second.
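The runif idea can be sketched as follows, applied to the row sums from the question. The helper name random_split and the variable names are assumptions made for illustration:

```r
# Illustrative helper: split a fixed total into two random cells
random_split <- function(total) {
  p <- runif(1)   # random probability for the first cell
  rmultinom(1, size = total, prob = c(p, 1 - p))[, 1]
}

set.seed(1)  # only so the sketch is reproducible
row_totals <- c(175, 128)   # row sums from the question
sim <- t(sapply(row_totals, random_split))
sim
rowSums(sim)   # equals row_totals
```

Each row gets its own random probability here, so the splits vary much more widely than when probs is fixed at c(0.5, 0.5).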

0

42- Jun 15 '17 at 4:57


mnel · Accepted Answer · 2012-08-20T00:44:07+0000

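EDIT 2

Pass the expected cell counts as expected. Each row is then sampled with rmultinom, and each draw is transposed with t so that it forms one row of the result: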

replicates <- 10
expected <- data.frame(X1  = c(100,90,30),X2 = c(75,28,120))
##    X1  X2
## 1 100  75
## 2  90  28
## 3  30 120
data_samples <- lapply(seq(replicates), function(i, expected){
   # create a list of expected cell counts (list element = row of expected)
  .list <- lapply(apply(expected,1,list),unlist)
   # sample from these expected cell counts and recombine into a data.frame
   as.data.frame(do.call(rbind,lapply(.list, function(.x) t(rmultinom(n = 1, prob = .x,  size = sum(.x) )))))
   }, expected = expected)

The result is a list of data.frames:

data_samples[[1]]
##    X1  X2
## 1 104  71
## 2  84  34
## 3  19 131


data_samples[[5]]
##   X1  X2
## 1 88  87
## 2 92  26
## 3 27 123
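To confirm that every replicate keeps the row sums of expected, something like the following should work. This sketch uses a simplified generator with apply rather than the exact code above, so treat it as an illustration of the same idea:

```r
expected <- data.frame(X1 = c(100, 90, 30), X2 = c(75, 28, 120))
replicates <- 10

set.seed(1)  # only so the sketch is reproducible
data_samples <- lapply(seq_len(replicates), function(i) {
  # one multinomial draw per row: size = row sum, prob proportional to the row
  draws <- apply(expected, 1, function(x) rmultinom(1, size = sum(x), prob = x))
  out <- as.data.frame(t(draws))
  names(out) <- names(expected)
  out
})

# Every replicate has the same row sums as expected
sums_ok <- all(sapply(data_samples,
                      function(d) all(rowSums(d) == rowSums(expected))))
sums_ok
```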
