
Question

Wael

Asked: 2024-11-07 20:52:28 +0800 CST

Trouble using which.min inside a dplyr pipe


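I'm having some trouble using the which.min function inside a dplyr pipe. My current workaround is Solution (*) below, and I'm looking for a more compact and elegant way to do this.

Reproducible example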

library(dplyr)

data=data.frame(s1=c(10,NA,5,NA,NA),s2=c(8,NA,NA,4,20),s3=c(NA,NA,2,NA,10))
data
#>   s1 s2 s3
#> 1 10  8 NA
#> 2 NA NA NA
#> 3  5 NA  2
#> 4 NA  4 NA
#> 5 NA 20 10

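Minimum value:

Here I can extract the minimum value with min(x, na.rm = TRUE), although row 2, where every value is NA, returns Inf with a warning: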

data%>%
  rowwise()%>%
  mutate(Min_s=min(c(s1,s2,s3),na.rm=TRUE))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `Min_s = min(c(s1, s2, s3), na.rm = TRUE)`.
#> ℹ In row 2.
#> Caused by warning in `min()`:
#> ! no non-missing arguments to min; returning Inf
#> # A tibble: 5 × 4
#> # Rowwise: 
#>      s1    s2    s3 Min_s
#>   <dbl> <dbl> <dbl> <dbl>
#> 1    10     8    NA     8
#> 2    NA    NA    NA   Inf
#> 3     5    NA     2     2
#> 4    NA     4    NA     4
#> 5    NA    20    10    10
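As an aside that is not taken from the thread: the row minimum itself does not need rowwise(). The sketch below, assuming the same data as above, uses base R's pmin(), which works element-wise across columns and, with na.rm = TRUE, returns NA instead of Inf (and no warning) for the all-NA row.

```r
library(dplyr)

data <- data.frame(s1 = c(10, NA, 5, NA, NA),
                   s2 = c(8, NA, NA, 4, 20),
                   s3 = c(NA, NA, 2, NA, 10))

# pmin() compares the columns element-wise, so no rowwise() grouping is needed;
# an all-NA row gives NA rather than Inf
res <- data %>%
  mutate(Min_s = pmin(s1, s2, s3, na.rm = TRUE))

res$Min_s
```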

Extracting the variable that holds the minimum:

Here, I can't extract which variable contains the minimum value:

data%>%
  rowwise()%>%
  mutate(which_s=which.min(c(s1,s2,s3)))
#> Error in `mutate()`:
#> ℹ In argument: `which_s = which.min(c(s1, s2, s3))`.
#> ℹ In row 2.
#> Caused by error:
#> ! `which_s` must be size 1, not 0.
#> ℹ Did you mean: `which_s = list(which.min(c(s1, s2, s3)))` ?

# Solution (*)
data%>%
  rowwise()%>%
  mutate(which_s=if(!is.na(s1)|!is.na(s2)|!is.na(s3)) {which.min(c(s1,s2,s3))} else NA )
#> # A tibble: 5 × 4
#> # Rowwise: 
#>      s1    s2    s3 which_s
#>   <dbl> <dbl> <dbl>   <int>
#> 1    10     8    NA       2
#> 2    NA    NA    NA      NA
#> 3     5    NA     2       3
#> 4    NA     4    NA       2
#> 5    NA    20    10       3

Created on 2024-11-07 with reprex v2.1.0
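One vectorised alternative to Solution (*), offered here as an assumption rather than something from the thread, compares each column with the pmin() row minimum inside dplyr::case_when(). case_when() treats NA conditions as FALSE, so the all-NA row falls through to NA, and the first matching column wins, which mirrors the tie rule of which.min(). It returns column names rather than positions and only suits a short, fixed set of columns.

```r
library(dplyr)

data <- data.frame(s1 = c(10, NA, 5, NA, NA),
                   s2 = c(8, NA, NA, 4, 20),
                   s3 = c(NA, NA, 2, NA, 10))

res <- data %>%
  mutate(
    Min_s = pmin(s1, s2, s3, na.rm = TRUE),
    # NA comparisons count as FALSE, so an all-NA row gets no match and stays NA;
    # conditions are checked in order, so the first column holding the minimum wins
    which_s = case_when(
      s1 == Min_s ~ "s1",
      s2 == Min_s ~ "s2",
      s3 == Min_s ~ "s3"
    )
  )

res$which_s
```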

3 Answers


ThomasIsCoding · Answer 1 · 2024-11-07T21:23:12+08:00

Best Answer


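In the second row you get integer(0) in the which_s column (which.min() of a vector that is entirely NA has length 0), and that is the crux of why you can't run it without an error.

Instead, you can first store the result in a list and then unnest it (don't forget to enable the keep_empty argument of unnest(); otherwise the row with the empty result is dropped):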

library(dplyr)
library(tidyr)  # for unnest()

data %>%
    rowwise() %>%
    mutate(which_s = list(which.min(c(s1, s2, s3)))) %>%
    unnest(which_s, keep_empty = TRUE)

which gives

# A tibble: 5 × 4
     s1    s2    s3 which_s
  <dbl> <dbl> <dbl>   <int>
1    10     8    NA       2
2    NA    NA    NA      NA
3     5    NA     2       3
4    NA     4    NA       2
5    NA    20    10       3
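A sketch of another way around the length-0 result, assuming a small helper that is not part of this answer: which_min_na() is a hypothetical name that maps the all-NA case to NA_integer_, so every row returns exactly one value and neither list() nor unnest() is needed.

```r
library(dplyr)

data <- data.frame(s1 = c(10, NA, 5, NA, NA),
                   s2 = c(8, NA, NA, 4, 20),
                   s3 = c(NA, NA, 2, NA, 10))

# Hypothetical helper: which.min() returns integer(0) when every value is NA,
# so return NA_integer_ in that case to keep the result at length 1
which_min_na <- function(...) {
  v <- c(...)
  if (all(is.na(v))) NA_integer_ else which.min(v)
}

res <- data %>%
  rowwise() %>%
  mutate(which_s = which_min_na(s1, s2, s3)) %>%
  ungroup()

res$which_s
```

Behaviour is otherwise the same as the list/unnest version: positions are integers and ties go to the first column.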


jpsmith · Answer 2 · 2024-11-07T21:27:01+08:00

Without rowwise(), you can do this in base R, or in a single mutate() step with purrr::pmap_chr():

Base R:

data$min_base <- unlist(apply(data, 1, \(x) ifelse(all(is.na(x)), NA, names(data)[which.min(x)])))

dplyr/purrr

library(dplyr)

data <- data %>%
  mutate(min_dplyr = purrr::pmap_chr(select(., s1:s3), \(...) {
    ifelse(all(is.na(c(...))), NA, colnames(data)[which.min(c(...))])
  }))

Output:

#   s1 s2 s3 min_base min_dplyr
# 1 10  8 NA       s2        s2
# 2 NA NA NA     <NA>      <NA>
# 3  5 NA  2       s3        s3
# 4 NA  4 NA       s2        s2
# 5 NA 20 10       s3        s3

Note that among these answers, @friede's custom base R function is significantly faster, followed by this base R approach:

# keep only s1:s3; `data` gained the min_base and min_dplyr columns above
bigdata <- data[rep(seq_len(nrow(data)), 1e5), c("s1", "s2", "s3")]

# row.which.min() is the custom function from @friede's answer below
microbenchmark::microbenchmark(
  rowwise = bigdata %>%
    rowwise() %>%
    mutate(which_s = list(which.min(c(s1, s2, s3)))) %>%
    tidyr::unnest(which_s, keep_empty = TRUE),
  base = unlist(apply(bigdata, 1, \(x) ifelse(all(is.na(x)), NA, names(bigdata)[which.min(x)]))),
  pmap = bigdata %>%
    mutate(min_dplyr = purrr::pmap_chr(select(., s1:s3), \(...) {
      ifelse(all(is.na(c(...))), NA, colnames(bigdata)[which.min(c(...))])
    })),
  custom_row.which.min = row.which.min(bigdata, .names = TRUE, tm = "first")
)

#                 expr       min       lq      mean    median        uq       max neval cld
#              rowwise 3730.8131 4512.870 6018.3180 4985.6024 5913.5166 53501.838   100 a  
#                 base 2419.1913 3162.745 4309.7700 3557.7805 4427.4588 32814.209   100  b 
#                 pmap 3837.8870 4593.846 6091.5265 5203.0391 5984.0412 22015.418   100 a  
# custom_row.which.min  108.4075  147.695  221.7602  168.5267  240.6043  1419.106   100   c
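A variant of the base R approach, sketched under the assumption that only s1:s3 should be searched: subsetting those columns before apply() keeps the matrix numeric (apply() converts a data frame to a matrix, so a character column such as an added min_base would turn every value into text), and returning NA_character_ for the all-NA row lets apply() produce a plain character vector without unlist() or ifelse().

```r
data <- data.frame(s1 = c(10, NA, 5, NA, NA),
                   s2 = c(8, NA, NA, 4, 20),
                   s3 = c(NA, NA, 2, NA, 10))

cols <- c("s1", "s2", "s3")

# Numeric matrix of just the candidate columns
m <- as.matrix(data[cols])

# One character(1) result per row; an all-NA row gives NA_character_
min_col <- apply(m, 1, function(x) {
  if (all(is.na(x))) NA_character_ else cols[which.min(x)]
})

min_col
```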

score 2 · Answer 3 · 2024-11-07T23:32:47+08:00


I sometimes miss a good row.which.min() function. This one is far from good and doesn't play nicely with {dplyr} syntax, but it may help here.

v0

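row.which.min = \(.data, .cols, .names = FALSE, tm = "first") {
  if(missing(.cols)) .cols = names(.data)
  x = .data[.cols]
  # rows with at least one non-missing value
  i = rowSums(is.na(x)) < ncol(x)
  # negate so max.col() finds the minimum; NA becomes -Inf so it never wins
  nx = -x[i, , drop = FALSE]
  nx[is.na(nx)] = -Inf
  y = rep(NA_integer_, nrow(.data))
  y[i] = max.col(nx, tm)
  # index into the selected columns, not all columns of .data
  if(!.names) y else names(x)[y]
}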

which gives

> df0 = data.frame(s1=c(10,NA,5,NA,NA),s2=c(8,NA,NA,4,20),s3=c(NA,NA,2,NA,10))
> row.which.min(df0, .names = TRUE)
[1] "s2" NA   "s3" "s2" "s3"
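The core trick in row.which.min(), shown in isolation on two rows of the example data as a plain matrix: max.col() returns the column index of each row's maximum, so negating the values turns it into a row minimum, and replacing NA with -Inf keeps missing values from ever winning. Rows that are entirely NA still have to be masked separately, which is what the i index does in the function above.

```r
# Rows 1 and 3 of the example data as a numeric matrix
m <- rbind(c(10, 8, NA),
           c(5, NA, 2))

# Negate so the row minimum becomes the row maximum; NA becomes -Inf
nm <- -m
nm[is.na(nm)] <- -Inf

# Column index of each row's maximum; "first" breaks ties like which.min()
idx <- max.col(nm, ties.method = "first")
idx
```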
