I have this sample data frame:
df <- data.frame(New=c("X2", "k 5, N 8", "N30","k 6, N 3", "K5", "S12", "K5", "k 1, N 18"),
K_10=c(NA, NA, 3, 4,0,2,NA, NA),
K_11=c(NA, NA, NA, 4,0,3,NA, NA),
K_12=c(NA, 2, NA, NA,0,NA,NA,0),
K_13=c(0, 3, 5, NA,0,5,NA,NA),
K_14=c(NA, 3, 1, 2,10,10,NA,NA),
K_15=c(NA, 2, 3, 5,15,10,NA,2),
K_16=c(NA, 10, 1, 6,43,10,NA,56),
K_17=c(NA, 5, 1, 3,1,10,NA,23),
K_18=c(NA, 6, 4, 2,0,10,NA,12),
K_19=c(NA, 3, 8, NA,3,10,NA,90),
K_20=c(NA, 3, 19, 2,6,10,NA,59),
K_21=c(NA, 3, 10, 2,8,10,NA,11),
K_22=c(NA, 3, NA, 2,9,10,NA,10),
K_23=c(NA, 3, NA, 2,90,10,NA,9))
df
        New K_10 K_11 K_12 K_13 K_14 K_15 K_16 K_17 K_18 K_19 K_20 K_21 K_22 K_23
1        X2   NA   NA   NA    0   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
2  k 5, N 8   NA   NA    2    3    3    2   10    5    6    3    3    3    3    3
3       N30    3   NA   NA    5    1    3    1    1    4    8   19   10   NA   NA
4  k 6, N 3    4    4   NA   NA    2    5    6    3    2   NA    2    2    2    2
5        K5    0    0    0    0   10   15   43    1    0    3    6    8    9   90
6       S12    2    3   NA    5   10   10   10   10   10   10   10   10   10   10
7        K5   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA
8 k 1, N 18   NA   NA    0   NA   NA    2   56   23   12   90   59   11   10    9
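I would like to create some additional columns for the data above.
The first column checks whether each row has at least 6 numeric values, but only when the New column has the fixed structure "k, N", i.e. "k 5, N 8", "k 6, N 3" and "k 1, N 18".
The second column counts how many numeric values we have when the criterion of the first additional column is met.
The third column checks whether each row has at most 5 numeric values (1, 2, 3, 4 or 5), again only when the New column has the fixed structure "k, N", i.e. "k 5, N 8", "k 6, N 3" and "k 1, N 18".
The fourth column counts how many numeric values we have when the criterion of the third additional column is met.
So I would like something like this: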
df_N <- data.frame(New=c("X2", "k 5, N 8", "N30","k 6, N 3", "K5", "S12", "K5", "k 1, N 18"),
K_10=c(NA, NA, 3, 4,0,2,NA, NA),
K_11=c(NA, NA, NA, 4,0,3,NA, NA),
K_12=c(NA, 2, NA, NA,0,NA,NA,0),
K_13=c(0, 3, 5, NA,0,5,NA,NA),
K_14=c(NA, 3, 1, 2,10,10,NA,NA),
K_15=c(NA, 2, 3, 5,15,10,NA,2),
K_16=c(NA, 10, 1, 6,43,10,NA,56),
K_17=c(NA, 5, 1, 3,1,10,NA,23),
K_18=c(NA, 6, 4, 2,0,10,NA,12),
K_19=c(NA, 3, 8, NA,3,10,NA,90),
K_20=c(NA, 3, 19, 2,6,10,NA,59),
K_21=c(NA, 3, 10, 2,8,10,NA,11),
K_22=c(NA, 3, NA, 2,9,10,NA,10),
K_23=c(NA, 3, NA, 2,90,10,NA,9),
At_least_6=c("Not Applicable","TRUE", "Not Applicable","TRUE", "Not Applicable", "Not Applicable", "Not Applicable","TRUE"),
Count_at_least_6=c("Not Applicable",8, "Not Applicable",6, "Not Applicable", "Not Applicable", "Not Applicable",5),
At_most_5=c("Not Applicable","FALSE", "Not Applicable","FALSE", "Not Applicable", "Not Applicable", "Not Applicable","FALSE"),
Count_at_most_5=c(0,0,0,0,0,0,0,0) )
df_N
        New K_10 K_11 K_12 K_13 K_14 K_15 K_16 K_17 K_18 K_19 K_20 K_21 K_22 K_23     At_least_6 Count_at_least_6      At_most_5 Count_at_most_5
1        X2   NA   NA   NA    0   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA Not Applicable   Not Applicable Not Applicable               0
2  k 5, N 8   NA   NA    2    3    3    2   10    5    6    3    3    3    3    3           TRUE                8          FALSE               0
3       N30    3   NA   NA    5    1    3    1    1    4    8   19   10   NA   NA Not Applicable   Not Applicable Not Applicable               0
4  k 6, N 3    4    4   NA   NA    2    5    6    3    2   NA    2    2    2    2           TRUE                6          FALSE               0
5        K5    0    0    0    0   10   15   43    1    0    3    6    8    9   90 Not Applicable   Not Applicable Not Applicable               0
6       S12    2    3   NA    5   10   10   10   10   10   10   10   10   10   10 Not Applicable   Not Applicable Not Applicable               0
7        K5   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA Not Applicable   Not Applicable Not Applicable               0
8 k 1, N 18   NA   NA    0   NA   NA    2   56   23   12   90   59   11   10    9           TRUE                5          FALSE               0
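Explanation: in the first row of the "Count_at_least_6" column we have the value 7, because there are 7 values from the sixth value onward. In other words, the first 6 values are
8 2 3 3 2 10
and from the sixth value onward we have
10 5 6 3 3 3 3 3
for a total of 8.
The last column is zero because the criterion of the second-to-last column is not met.
I think the `count()` function in dplyr might do this job, but I am not sure.
My real data contains thousands of rows like these.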
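You can use `sapply` to find out whether a row meets the condition, and then `rowSums` to do the counting. In the code below I use `df[, -1]` because I exclude the first column, New; if your original df has a slightly different structure, you can adjust this. Finally, regarding your explanation: the first row of the "Count_at_least_6" column has 7 values, not 8, because you added a leading "8" that is not in the sample df.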
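As a minimal sketch of the approach described above (not necessarily the exact code originally posted), the following base R snippet builds the four columns. Several things here are assumptions: the vectorized `grepl` call stands in for `sapply`; the helper names `is_kN` and `n_values` are made up for illustration; the regular expression assumes the structure is exactly "k <digits>, N <digits>" with a lowercase k; and rows that match the structure but have fewer than 6 values are given 0 in `Count_at_least_6`, a case the question does not specify.

```r
# Sample data copied from the question.
df <- data.frame(
  New  = c("X2", "k 5, N 8", "N30", "k 6, N 3", "K5", "S12", "K5", "k 1, N 18"),
  K_10 = c(NA, NA, 3, 4, 0, 2, NA, NA),
  K_11 = c(NA, NA, NA, 4, 0, 3, NA, NA),
  K_12 = c(NA, 2, NA, NA, 0, NA, NA, 0),
  K_13 = c(0, 3, 5, NA, 0, 5, NA, NA),
  K_14 = c(NA, 3, 1, 2, 10, 10, NA, NA),
  K_15 = c(NA, 2, 3, 5, 15, 10, NA, 2),
  K_16 = c(NA, 10, 1, 6, 43, 10, NA, 56),
  K_17 = c(NA, 5, 1, 3, 1, 10, NA, 23),
  K_18 = c(NA, 6, 4, 2, 0, 10, NA, 12),
  K_19 = c(NA, 3, 8, NA, 3, 10, NA, 90),
  K_20 = c(NA, 3, 19, 2, 6, 10, NA, 59),
  K_21 = c(NA, 3, 10, 2, 8, 10, NA, 11),
  K_22 = c(NA, 3, NA, 2, 9, 10, NA, 10),
  K_23 = c(NA, 3, NA, 2, 90, 10, NA, 9)
)

# Assumed pattern for the fixed "k, N" structure (case-sensitive).
is_kN <- grepl("^k [0-9]+, N [0-9]+$", df$New)

# Number of non-NA values per row, excluding the first column (New).
n_values <- unname(rowSums(!is.na(df[, -1])))

at_least_6 <- is_kN & n_values >= 6
at_most_5  <- is_kN & n_values <= 5

# Logical checks, reported only for rows with the "k, N" structure.
df$At_least_6 <- ifelse(is_kN, as.character(n_values >= 6), "Not Applicable")
df$At_most_5  <- ifelse(is_kN, as.character(n_values <= 5), "Not Applicable")

# Values counted from the sixth one onward, i.e. n_values - 5.
# Assumption: "k, N" rows with fewer than 6 values get "0".
df$Count_at_least_6 <- ifelse(
  !is_kN, "Not Applicable",
  ifelse(at_least_6, as.character(n_values - 5), "0")
)

# All values counted when the row has at most 5 of them, otherwise 0.
df$Count_at_most_5 <- ifelse(at_most_5, n_values, 0)

df[, c("New", "At_least_6", "Count_at_least_6", "At_most_5", "Count_at_most_5")]
```

With the sample data, rows 2, 4 and 8 are the only ones that match the pattern, and they hold 12, 11 and 10 non-NA values respectively, so `Count_at_least_6` comes out as 7, 6 and 5. Because `rowSums` and `grepl` are vectorized, this should scale to thousands of rows without an explicit loop.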