regex - Subset all 3-digit numbers and collapse them with a separator in a data frame (R)
I'm formatting a data set so that each entry has the adegenet format for codominant markers, such as:
loci1 ###/### 208/210 200/204 198/208
where # represents a digit (the number is the allele size in base pairs). The data has homozygous entries (all 3-digit integers with no separator) that have the form:
loci1 ### 208 198
I intend to paste each 3-digit string to itself with sep = '/' to produce the first format. I've tried to use grep to subset these homozygous entries by finding the non-###/### entries, negating the match by using table matching, such as:
    a <- grep('\\b\\d{3}?[/]\\d{3}', score$loci1, value = TRUE) # subset the ###/### entries
    score[!(a %in% 1:nrow(score$loci1)), ] # works on vectors...
After the subset I would paste. The problem arises when I apply this to the data frame: grep seems to treat the data frame as a list (which in part it is) and returns the columns that have a match.
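The column-level result can be reproduced on a small case. grep coerces a non-character input with as.character(), which turns each column of a data frame into one deparsed string, so matches are reported per column. A minimal sketch, assuming a made-up toy data frame (df and its values are hypothetical, not the data from the question):

```r
# Hypothetical toy data, not the data set from the question
df <- data.frame(a = c("208/210", "208"),
                 b = c("198", "204"),
                 stringsAsFactors = FALSE)

# grep() sees one string per column, so it reports column positions
col_hits <- grep("/", df, fixed = TRUE)

# Applying grepl() column by column yields one logical per cell instead
cell_hits <- sapply(df, function(col) grepl("/", col, fixed = TRUE))

print(col_hits)
print(cell_hits)
```

Here only column a contains a separator, so col_hits points at that column, while cell_hits marks the individual entry.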
So, in short: how can I go from ### to ###/### in a data frame?
A self-contained example of the data:
    score2 <- NULL
    set.seed(9)
    loci1 <- NULL
    loci2 <- NULL
    loci3 <- NULL
    # Build five random heterozygous genotypes per locus
    for (i in 1:5) loci1 <- append(loci1, paste(sample(seq(from = 230, to = 330, by = 3), 2, replace = FALSE), collapse = '/'))
    for (i in 1:5) loci2 <- append(loci2, paste(sample(seq(from = 230, to = 330, by = 3), 2, replace = FALSE), collapse = '/'))
    for (i in 1:5) loci3 <- append(loci3, paste(sample(seq(from = 230, to = 330, by = 3), 2, replace = FALSE), collapse = '/'))
    score2 <- data.frame(loci1, loci2, loci3, stringsAsFactors = FALSE)
    # Turn a few entries into homozygous form by keeping only the first allele
    score2[2, 3] <- strsplit(score2[2, 3], split = '/')[[1]][1]
    score2[5, 2] <- strsplit(score2[3, 3], split = '/')[[1]][1]
    score2[1, 1] <- strsplit(score2[1, 1], split = '/')[[1]][1]
    # Add some missing values
    score2[c(1, 4), c(2, 3)] <- NA
    score2
You can replace each 3-digit item with the item, a separator, and a copy of the item:

    sub("^(...)$", "\\1/\\1", loci1)
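As a quick check of what this substitution does, here is a small sketch on a made-up vector (x is hypothetical). The pattern is anchored at both ends, so entries that already contain a separator do not match and are left alone, and NA stays NA:

```r
# Hypothetical input: homozygous, heterozygous, missing, homozygous
x <- c("208", "208/210", NA, "198")

# \\1 refers back to the three characters captured by (...)
doubled <- sub("^(...)$", "\\1/\\1", x)

print(doubled)
```

Note that . matches any character, not only digits, so a three-letter entry would be doubled as well.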
Use lapply with an anonymous function to apply it to every column:
    data.frame( lapply(score2, function(x) sub("^(...)$", "\\1/\\1", x) ) )

        loci1   loci2   loci3
    1 251/251    <NA>    <NA>
    2 251/329 320/257 260/260
    3 275/242 278/329 281/320
    4 269/266    <NA>    <NA>
    5 296/326 281/281 326/314
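If the aim is to update an existing data frame rather than build a new one, a common idiom is to assign the lapply result into the data frame with empty brackets, which keeps its row and column names. This is a sketch of that variant, not part of the original answer, and it uses a hypothetical toy data frame (toy) instead of score2:

```r
# Hypothetical toy data with one homozygous entry and one missing value
toy <- data.frame(loci1 = c("251", "251/329"),
                  loci2 = c(NA, "320/257"),
                  stringsAsFactors = FALSE)

# toy[] <- ... replaces the columns in place and keeps the data frame structure
toy[] <- lapply(toy, function(x) sub("^(...)$", "\\1/\\1", x))

print(toy)
```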
(I'm not sure what the "paste-part" was supposed to refer to, but I think this is the intent of the question.)
If the numeric values have a varying number of digits, use the pattern argument "^([0-9]{1,9})$".
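A minimal sketch of that variable-width pattern, assuming made-up allele sizes of two to four digits (sizes is hypothetical):

```r
# Hypothetical entries with differing digit counts, one already separated
sizes <- c("98", "1204", "208/210", NA)

# [0-9]{1,9} accepts one to nine digits, so every lone number is doubled
widened <- sub("^([0-9]{1,9})$", "\\1/\\1", sizes)

print(widened)
```

Unlike the ... pattern, this one only matches digits, so non-numeric entries pass through unchanged.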