R: Choosing a specific number of combinations from all possible combinations
Suppose we have the following dataset:

    set.seed(144)
    dat <- matrix(rnorm(100), ncol = 5)

The following code creates all possible combinations of columns and removes the first one (the row in which no column is selected):

    (cols <- do.call(expand.grid, rep(list(c(FALSE, TRUE)), ncol(dat)))[-1, ])
    #     Var1  Var2  Var3  Var4  Var5
    # 2   TRUE FALSE FALSE FALSE FALSE
    # 3  FALSE  TRUE FALSE FALSE FALSE
    # 4   TRUE  TRUE FALSE FALSE FALSE
    # ...
    # 31 FALSE  TRUE  TRUE  TRUE  TRUE
    # 32  TRUE  TRUE  TRUE  TRUE  TRUE
My question is: how can I compute only the single, binary, and triple combinations?

Choosing the rows that include no more than 3 TRUE values with the following works for this vector:

    cols[rowSums(cols) < 4L, ]

However, it gives the following error for larger vectors, because expand.grid fails on long vectors:

    Error in rep.int(seq_len(nx), rep.int(rep.fac, nx)) :
      invalid 'times' value
    In addition: Warning message:
    In rep.fac * nx : NAs produced by integer overflow

Any suggestion that would allow me to compute only the single, binary, and triple combinations?
You can use this solution:

    col.i <- do.call(c, lapply(1:3, combn, x = 5, simplify = FALSE))
    # [[1]]
    # [1] 1
    #
    # [[2]]
    # [1] 2
    #
    # <...skipped...>
    #
    # [[24]]
    # [1] 2 4 5
    #
    # [[25]]
    # [1] 3 4 5
Here, col.i is a list, every element of which contains column indices.

How it works: combn generates all combinations of the numbers from 1 to 5 (requested by x=5) taken m at a time (simplify=FALSE ensures that the result has a list structure). lapply invokes an implicit cycle to iterate m from 1 to 3, and returns a list of lists. do.call(c, ...) converts the list of lists into a plain list.
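To make the three steps easier to follow, here they are unrolled, as a rough sketch. The name by.size is only for illustration and is not part of the solution above. The choose() lines show why this scales: the number of combinations of size 1 to 3 grows polynomially, while expand.grid needs 2^n rows.

```r
# Step 1: one list of combinations per subset size m = 1, 2, 3
# (by.size is an illustrative name, not part of the original answer)
by.size <- lapply(1:3, function(m) combn(5, m, simplify = FALSE))
lengths(by.size)       # 5 10 10, i.e. choose(5, 1:3)

# Step 2: flatten the list of lists into one plain list
col.i <- do.call(c, by.size)
length(col.i)          # 25

# The same count without enumerating anything
sum(choose(5, 1:3))    # 25
sum(choose(70, 1:3))   # 57225
2^70                   # rows expand.grid() would need for 70 columns
```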
You can use col.i to get certain columns from dat using e.g. dat[, col.i[[1]], drop=FALSE] (1 is the index of the column combination, so you could use any number from 1 to 25; drop=FALSE makes sure that when you pick just one column from dat, the result is not simplified to a vector, which might cause unexpected program behavior). Another option is to use lapply, e.g.

    lapply(col.i, function(cols) dat[, cols, drop = FALSE])

which will return a list of matrices, each containing a certain subset of the columns of dat.
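As a usage sketch, you could also label each combination so that you can look subsets up by name. The names subsets and the "2+4+5" label format are my own choices for illustration, nothing more.

```r
set.seed(144)
dat <- matrix(rnorm(100), ncol = 5)
col.i <- do.call(c, lapply(1:3, combn, x = 5, simplify = FALSE))

# Label each combination by its column indices, e.g. "2+4+5"
# (the label format is an arbitrary choice for this example)
names(col.i) <- sapply(col.i, paste, collapse = "+")

# drop = FALSE keeps single-column selections as 20 x 1 matrices
subsets <- lapply(col.i, function(cols) dat[, cols, drop = FALSE])

dim(subsets[["1"]])       # 20 1
dim(subsets[["2+4+5"]])   # 20 3
is.matrix(dat[, 1])       # FALSE: without drop = FALSE this is a plain vector
```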
In case you want the column indices as a boolean matrix, you can use:

    col.b <- t(sapply(col.i, function(z) 1:5 %in% z))
    #        [,1]  [,2]  [,3]  [,4]  [,5]
    #  [1,]  TRUE FALSE FALSE FALSE FALSE
    #  [2,] FALSE  TRUE FALSE FALSE FALSE
    #  [3,] FALSE FALSE  TRUE FALSE FALSE
    # ...
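If you want to convince yourself that col.b holds the same rows as the filtered expand.grid result from the question, a quick check for the 5-column case might look like this. The helper key is hypothetical and only exists for this comparison; the row order of the two results differs, so the rows are compared as sorted strings.

```r
col.i <- do.call(c, lapply(1:3, combn, x = 5, simplify = FALSE))
col.b <- t(sapply(col.i, function(z) 1:5 %in% z))

dim(col.b)                 # 25 5
all(rowSums(col.b) <= 3)   # TRUE

# The filtered expand.grid() result from the question
grid <- do.call(expand.grid, rep(list(c(FALSE, TRUE)), 5))[-1, ]
grid <- as.matrix(grid[rowSums(grid) < 4L, ])

# key() is a hypothetical helper: each row becomes a string such as "01011"
key <- function(m) {
  sort(unname(apply(m, 1, function(r) paste(as.integer(r), collapse = ""))))
}
identical(key(col.b), key(grid))   # TRUE
```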
[Update]

A more efficient implementation:

    library("gRbase")
    coli <- function(x = 5, m = 3) {
      # all combinations of 1..x of sizes 1..m, as a plain list
      col.i <- do.call(c, lapply(1:m, combnPrim, x = x, simplify = FALSE))
      # positions of the TRUE cells in the row-major vector
      z <- lapply(seq_along(col.i), function(i) x * (i - 1) + col.i[[i]])
      v.b <- rep(FALSE, x * length(col.i))
      v.b[unlist(z)] <- TRUE
      matrix(v.b, ncol = x, byrow = TRUE)
    }

    coli(70, 5) # takes 30 sec on my desktop
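The index arithmetic may look cryptic at first: the matrix is filled by row, so row i occupies positions x*(i-1)+1 to x*i of the vector, and x*(i-1)+col.i[[i]] are the positions of the TRUE cells in that row. Below is a small sketch that checks this against the sapply version. Note that coli.base is a hypothetical name, and I swapped in base R's combn for combnPrim so that it runs without gRbase; it will be slower for large inputs.

```r
# coli.base is a hypothetical name; base R's combn() is swapped in
# for gRbase's combnPrim so that no extra package is needed
coli.base <- function(x = 5, m = 3) {
  col.i <- do.call(c, lapply(1:m, combn, x = x, simplify = FALSE))
  # Row i of a byrow matrix occupies positions x*(i-1)+1 .. x*i of the vector
  z <- lapply(seq_along(col.i), function(i) x * (i - 1) + col.i[[i]])
  v.b <- rep(FALSE, x * length(col.i))
  v.b[unlist(z)] <- TRUE
  matrix(v.b, ncol = x, byrow = TRUE)
}

b <- coli.base(5, 3)
dim(b)     # 25 5
b[25, ]    # FALSE FALSE TRUE TRUE TRUE, i.e. columns 3, 4, 5

# Should match the sapply-based construction
col.i <- do.call(c, lapply(1:3, combn, x = 5, simplify = FALSE))
identical(b, t(sapply(col.i, function(z) 1:5 %in% z)))   # TRUE
```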