I have some data:
set.seed(1)
n <- 100
df <- data.frame(
  x = sample(1:30, n, replace = TRUE),
  y = sample(1:30, n, replace = TRUE),
  z = sample(1:30, n, replace = TRUE)
)
and a vector of expressions, which may vary:
rules <- c("df$x[i] < df$y[i-2] - df$x[i]",
           "df$y[i] >= mean(df$x)",
           "df$y[i] == 20",
           "df$z[i-30] >= df$x[5]",
           "df$y[i-5] == 16",
           "df$x[10] > sd(as.matrix(df[(i-5):i,]))")
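Next, I have a function that searches sequentially for the first row that triggers the first expression, then the second expression, and so on: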
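seq_rules <- function(df, rules, show = TRUE) {
  ln <- length(rules)
  # one row per rule: the row index where it fired and a 0/1 "fired" flag
  res <- matrix(0, nrow = ln, ncol = 2, dimnames = list(NULL, c("row", "res")))
  n <- 1  # index of the rule currently being searched for
  for (i in 30:nrow(df)) {
    if (eval(str2expression(rules[n]))) {
      res[n, "row"] <- i
      res[n, "res"] <- 1
      if (show) print(cbind.data.frame(df[i, ], rule = rules[n], row = i))
      n <- n + 1  # move on to the next rule, starting from the next row
    }
    if (n > ln) break
  }
  res
}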
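I want to speed up my code. How would you write this so that it runs as fast as possible? I would also like your solution to produce the same results as mine for different seeds.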
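To check that a rewrite agrees with the original across seeds, something like the following might work. It is only a sketch: make_data, agree_on_seeds, f_ref and f_new are names made up here, and the two toy functions merely stand in for the real implementations.

```r
# Hypothetical helper: rebuild the example data for a given seed.
make_data <- function(seed, n = 100) {
  set.seed(seed)
  data.frame(
    x = sample(1:30, n, replace = TRUE),
    y = sample(1:30, n, replace = TRUE),
    z = sample(1:30, n, replace = TRUE)
  )
}

# Hypothetical helper: for each seed, TRUE when both functions
# return identical results on the data generated from that seed.
agree_on_seeds <- function(f_ref, f_new, seeds) {
  vapply(seeds, function(s) {
    d <- make_data(s)
    identical(f_ref(d), f_new(d))
  }, logical(1))
}

# Toy stand-ins: index of the first row where y equals 20 (NA if none).
f_ref <- function(d) which(d$y == 20)[1]
f_new <- function(d) match(20L, d$y)

all(agree_on_seeds(f_ref, f_new, 1:20))
```

With the real code, f_ref could be a wrapper such as function(d) seq_rules(d, rules, show = FALSE), and f_new a wrapper around the candidate replacement.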
=========================================
If the rules are represented as pre-evaluated functions
Frules <- lapply(rules,\(x) eval(str2expression(paste("function(i) {", x ,"}"))))
then, since there is no eval(str2expression(...)) call inside the loop, I gain a little speed.
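One thing to keep in mind with this approach: when Frules is built at the top level, the generated functions look up df in the environment where they were created, so they read the global df rather than whatever data is passed to the search function. A minimal sketch of a variant that takes the data as an argument is below; make_rule and rule_funs are hypothetical names, and the parameter has to be called df because that is the name the rule strings use.

```r
# Hypothetical helper: turn a rule string into function(df, i), so the rule
# reads the data it is handed instead of a global `df`.
make_rule <- function(rule) {
  f <- function(df, i) NULL
  body(f) <- str2lang(rule)  # parse once, up front (needs R >= 3.6.0)
  f
}

rules <- c("df$y[i] >= mean(df$x)",
           "df$y[i] == 20")
rule_funs <- lapply(rules, make_rule)

# Small deterministic data set for illustration only.
d <- data.frame(x = 1:40, y = 40:1)

rule_funs[[1]](d, 20)  # y[20] is 21, mean(x) is 20.5 -> TRUE
rule_funs[[1]](d, 21)  # y[21] is 20 -> FALSE
rule_funs[[2]](d, 21)  # y[21] is 20 -> TRUE
```

Inside the loop the call would then become rules[[n]](df, i). Whether this costs or saves time compared with the single-argument closures is something to measure rather than assume.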
New function:
Fseq_rules <- function(df, rules) {
  # rules: a list of function(i) closures, e.g. Frules
  ln <- length(rules)
  res <- matrix(0, nrow = ln, ncol = 2, dimnames = list(NULL, c("row", "res")))
  n <- 1
  for (i in 30:nrow(df)) {
    if (rules[[n]](i)) {
      res[n, "row"] <- i
      res[n, "res"] <- 1
      n <- n + 1
    }
    if (n > ln) break
  }
  res
}
microbenchmark::microbenchmark(Fseq_rules(df, Frules),
                               seq_rules(df, rules, show = F), times = 100)
Unit: milliseconds
                           expr      min       lq     mean   median       uq      max neval
         Fseq_rules(df, Frules) 1.083315 1.118951 1.283135 1.156011 1.247808 5.601309   100
 seq_rules(df, rules, show = F) 2.495045 2.545790 2.779712 2.607938 2.861662 6.243315   100
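That is not much faster than the original. If df is replaced with a matrix, you gain a lot of speed, provided the rules are changed accordingly: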