r - Lapply in a dataframe over different variables using filters
I'm trying to calculate several new variables in a dataframe. Take initial values as an example.
Say I have:
dataset <- data.frame(time=rep(c(1990:1992),2), geo=c(rep("at",3),rep("de",3)), var1=c(1:6), var2=c(7:12))

  time geo var1 var2
1 1990  at    1    7
2 1991  at    2    8
3 1992  at    3    9
4 1990  de    4   10
5 1991  de    5   11
6 1992  de    6   12
and I want:
  time geo var1 var2 var1_1990 var1_1991 var2_1990 var2_1991
1 1990  at    1    7         1         2         7         8
2 1991  at    2    8         1         2         7         8
3 1992  at    3    9         1         2         7         8
4 1990  de    4   10         4         5        10        11
5 1991  de    5   11         4         5        10        11
6 1992  de    6   12         4         5        10        11
So both the time and the variable change across the new variables. Here is my attempt:
initialyears <- c(1990,1991)
initialvars <- c("var1", "var2")
# ideally, I want code where I only have to change these 2 vectors
# and where it's possible to change their dimensions
for (i in initialyears){
  lapply(initialvars, function(x){
    rep(dataset[dataset$time==i,x], each=length(unique(dataset$time)))
  })
}
which runs without error but yields nothing. I would like to assign the variable names as in the example (e.g. "var1_1990") and make the new variables part of the dataframe. I would also like to avoid the for loop, but I don't know how to wrap two lapply's around this function. Should I rather have the function use two arguments? Is the problem that the apply function does not carry its results into the environment? I've been stuck here for a while and would be grateful for any help!
P.S.: I have a solution that goes combination by combination without apply and the like, but I'm trying to get away from copy and paste:
dataset$var1_1990 <- c(rep(dataset$var1[which(dataset$time==1990)], each=length(unique(dataset$time))))
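The attempt above yields nothing because the value returned by lapply() is never assigned anywhere, and a for loop does not print or collect the values computed in its body. A minimal sketch of how the copy-and-paste line could be generalized follows. It assumes each geo has exactly one row per initial year, and it swaps the rep(..., each=) idiom for a match() lookup on geo so that the result does not depend on row order. The names combos and ref are illustrative, not from the original post.

```r
dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("at", 3), rep("de", 3)),
                      var1 = 1:6,
                      var2 = 7:12)

initialyears <- c(1990, 1991)
initialvars  <- c("var1", "var2")

# One row per (variable, year) combination, i.e. one row per new column.
combos <- expand.grid(var = initialvars, year = initialyears,
                      stringsAsFactors = FALSE)

for (k in seq_len(nrow(combos))) {
  v <- combos$var[k]
  y <- combos$year[k]
  # Rows belonging to the reference year (assumed: one row per geo).
  ref <- dataset[dataset$time == y, ]
  # Look up each row's geo among the reference rows and copy that value
  # into a new column named e.g. "var1_1990".
  dataset[[paste(v, y, sep = "_")]] <- ref[[v]][match(dataset$geo, ref$geo)]
}
```

Only the two vectors at the top need to change to add more years or variables; the assignment through dataset[[...]] is what makes the new columns part of the dataframe.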
This can be done with subset(), reshape(), and merge():
merge(dataset,reshape(subset(dataset,time%in%c(1990,1991)),dir='w',idvar='geo',sep='_'));
##   geo time var1 var2 var1_1990 var2_1990 var1_1991 var2_1991
## 1  at 1990    1    7         1         7         2         8
## 2  at 1991    2    8         1         7         2         8
## 3  at 1992    3    9         1         7         2         8
## 4  de 1990    4   10         4        10         5        11
## 5  de 1991    5   11         4        10         5        11
## 6  de 1992    6   12         4        10         5        11
The column order isn't quite what you have in your question, but you can fix that with an after-the-fact index operation, if necessary.
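As a rough sketch of that index operation, the merged result can be reindexed by a vector of column names in the desired order. The names wide, result and wanted are illustrative, and direction='wide' is simply the unabbreviated form of dir='w'.

```r
dataset <- data.frame(time = rep(1990:1992, 2),
                      geo  = c(rep("at", 3), rep("de", 3)),
                      var1 = 1:6,
                      var2 = 7:12)

# Same steps as above: keep the initial years, spread them into wide
# columns per geo, then merge back onto the full data by the shared
# column (geo).
wide <- reshape(subset(dataset, time %in% c(1990, 1991)),
                direction = "wide", idvar = "geo", sep = "_")
result <- merge(dataset, wide)

# Reorder the columns to match the layout asked for in the question.
wanted <- c("time", "geo", "var1", "var2",
            "var1_1990", "var1_1991", "var2_1990", "var2_1991")
result <- result[, wanted]
```

Indexing by name rather than by position keeps the reordering valid even if merge() or reshape() emits the columns in a different order.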