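I want to create a table that counts the non-NA values in each column of my dataset. I'm using summarize_all(), but I can't get it to return the number of non-NA values.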
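I looked at this StackOverflow thread for some insight, but it didn't help me get the counts: summarize_all with "n()" function. If I pass summarize_all(n_distinct), I get the number of distinct values. If I pass summarize_all(list(n=~n())) or, as the thread suggests, summarize_all(list(n="length")), I get the number of rows.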
My desired output:
ID Female Male Non_Binary
5 5 4 3
What am I doing wrong?
# Sample Code
library(dplyr)  # provides as_tibble(), %>%, summarize_all() and n_distinct()
test<-as_tibble(data.frame(`ID` = c("1","2","3","4","5"),
`Female` = c("Female","Female","Female","Female","Female"),
`Male` = c(NA,"Male","Male","Male","Male"),
`Non_Binary`=c("Non-Binary","Non-Binary","Non-Binary",NA,NA)))
## Attempt 1
summary<-test%>%
summarize_all(list(n=~n()))
# A tibble: 1 × 4
ID_n Female_n Male_n Non_Binary_n
<int> <int> <int> <int>
1 5 5 5 5
## Attempt 2
summary<-test%>%
summarize_all(list(n="length"))
# A tibble: 1 × 4
ID_n Female_n Male_n Non_Binary_n
<int> <int> <int> <int>
1 5 5 5 5
## Attempt 3
summary<-test%>%
summarize_all(n_distinct)
# A tibble: 1 × 4
ID Female Male Non_Binary
<int> <int> <int> <int>
1 5 1 2 2
### Desired Output
ID Female Male Non_Binary
5 5 4 3
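n() and length() are poor choices here: they don't ignore NA, they count every value, missing or not. The classic way to count the values that meet a condition (such as "not NA") is to sum the condition, e.g. sum(!is.na(x)). summarize_all() has also been superseded for several years; across() is now preferred.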
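A minimal sketch of that approach, reusing the question's sample data (built here with tibble() rather than as_tibble(data.frame(...)), which gives the same columns for this data). It assumes dplyr 1.0.0 or later, where across() is available, and also shows the superseded summarize_all() form for comparison with the original attempts.

```r
library(dplyr)

# Same sample data as in the question
test <- tibble(
  ID = c("1", "2", "3", "4", "5"),
  Female = c("Female", "Female", "Female", "Female", "Female"),
  Male = c(NA, "Male", "Male", "Male", "Male"),
  Non_Binary = c("Non-Binary", "Non-Binary", "Non-Binary", NA, NA)
)

# Preferred: across() applies the lambda to every column.
# !is.na(.x) is a logical vector; sum() treats TRUE as 1 and FALSE as 0,
# so each result is the number of non-NA values in that column.
counts <- test %>%
  summarize(across(everything(), ~ sum(!is.na(.x))))

# Superseded scoped-verb equivalent, closer to the original attempts
counts_old <- test %>%
  summarize_all(~ sum(!is.na(.)))

print(counts)
```

Both versions give 5, 5, 4 and 3 for ID, Female, Male and Non_Binary, matching the desired output. Because a single unnamed function is used, the output columns keep their original names instead of getting an _n suffix.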