
Question

Alireza Sadeghi

Asked: 2024-08-03 21:20:33 +0800 CST

How to mutate a conditional column into a "tibble", using functions and "across()"?


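For demonstration purposes, I am using a tidytuesday dataset named animal_outcomes.

My problem: I have several numeric columns in a tibble. I want to mutate a new column that sums all of the columns except the last one; if that sum equals the last column, the new column should be 1, otherwise 0. Let me explain further: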

# Adding the example dataset
data <- tidytuesdayR::tt_load(x = "2020-07-21")
data <- data$animal_outcomes

Now the data looks like this:

> data

# A tibble: 664 × 12
    year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total
   <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323
 2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415
 3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127
 4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339
 5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612
 6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162
 7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509
 8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202
 9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91
10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows

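I want to add a column that checks whether the Total column really is the sum of all the other numeric columns. This is the result I have in mind: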

# A tibble: 664 × 13
    year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total condition # notice this last column
   <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
 1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323         1
 2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415         1
 3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127         1
 4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339         1
 5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612         1
 6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162         1
 7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509         1
 8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202         1
 9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91         1
10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48         1
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows

I tried the following code. It works, but it takes a lot of typing, so it does not scale well if you have many columns:

> data %>% 
    mutate(condition = if_else((ACT + NSW + NT + QLD + SA + TAS + VIC + WA) == Total, 1, 0))

# A tibble: 664 × 13
    year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total condition
   <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
 1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323         1
 2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415         1
 3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127         1
 4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339         1
 5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612         1
 6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162         1
 7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509         1
 8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202         1
 9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91         1
10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48         1
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows

I also tried the following, but it returned an error:

data %>% 
    mutate(condition = if_else((ACT + NSW + NT + QLD + SA + TAS + VIC + WA) == Total, 1, 0))

Also this one (obviously wrong, because it sums the literal numbers 4:11):

data %>% 
    mutate(condition = if_else(sum(4:11) == Total, 1,0))

And this one: I am not sure why sum(ACT:WA) does not return an error! And if it does not return an error, what is it actually summing?

data %>% 
    mutate(condition = if_else(sum(ACT:WA) == Total, 1,0))

# A tibble: 664 × 13
    year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total condition
   <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
 1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323         0
 2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415         0
 3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127         0
 4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339         0
 5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612         0
 6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162         0
 7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509         0
 8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202         0
 9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91         0
10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48         0
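
A likely explanation for the sum(ACT:WA) puzzle, as a minimal sketch: outside rowwise(), ACT and WA are whole columns, and base R's `:` operator uses only the first element of each operand (normally with a warning rather than an error). With the first row of the data printed above, ACT:WA therefore becomes the sequence 610:1, and the same constant is compared with every Total. The variable names below are illustrative and not part of the original code.

```r
# First-row values of ACT and WA, copied from the printed tibble above
first_act <- 610
first_wa  <- 1

# `:` builds the descending integer sequence 610, 609, ..., 1
seq_used <- first_act:first_wa

length(seq_used)  # 610
sum(seq_used)     # 186355

# None of the printed Total values equals 186355, so condition is 0 in every row
```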

3 Answers


langtang · Answer 1 · 2024-08-03T21:30:09+08:00

You can try this:

data %>% rowwise() %>% mutate(check = 1*(Total ==  sum(c_across(ACT:WA))))

Output:

    year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total check
   <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323     1
 2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415     1
 3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127     1
 4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339     1
 5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612     1
 6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162     1
 7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509     1
 8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202     1
 9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91     1
10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48     1
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows
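
To see the non-matching case as well, here is a small sketch of the same rowwise() plus c_across() pattern on a made-up tibble. The object `toy` and its columns `a`, `b` and `Total` are hypothetical, and the second row's Total is deliberately wrong. The trailing ungroup() is an addition that drops the row-wise grouping so that later verbs operate on whole columns again.

```r
library(dplyr)

# Hypothetical data: row 1 adds up (1 + 3 = 4), row 2 does not (2 + 4 != 7)
toy <- tibble(a = c(1, 2), b = c(3, 4), Total = c(4, 7))

result <- toy %>%
  rowwise() %>%                                         # evaluate mutate() one row at a time
  mutate(check = 1 * (Total == sum(c_across(a:b)))) %>% # TRUE/FALSE times 1 gives 1/0
  ungroup()                                             # back to an ordinary tibble

result$check  # 1 0
```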

margusl · Answer 2 · 2024-08-03T23:36:08+08:00

In your last example, if you wrap that column range in pick() to turn it into a data frame and replace sum() with rowSums(), it will work:

library(dplyr)
mutate(data, condition = if_else(rowSums(pick(ACT:WA)) == Total, 1, 0))

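It might just be an unfortunate choice of example data, but the issue here is testing doubles for equality with ==. To avoid Circle 1 of The R Inferno, "Falling into the Floating Point Trap", you might want to use something like this: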

mutate(data, condition = +near(rowSums(pick(ACT:WA)), Total))

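dplyr::near() is a safer choice because it uses a built-in tolerance when comparing the input vectors; the unary + converts the logical vector to integers (TRUE to 1, FALSE to 0).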
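
As a quick illustration of both points, here is a sketch on standalone values rather than the animal_outcomes data; the variable `x` is made up for the example.

```r
library(dplyr)

# 0.1 + 0.2 is not stored as exactly 0.3 in double precision
x <- 0.1 + 0.2

x == 0.3       # FALSE: exact comparison fails
near(x, 0.3)   # TRUE: equal within near()'s default tolerance

# Unary plus turns a logical vector into an integer vector
flags <- +c(TRUE, FALSE)
flags          # 1 0
```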

Result:

mutate(data, condition = +near(rowSums(pick(ACT:WA)), Total))
#> # A tibble: 10 × 13
#>     year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total condition
#>    <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <int>
#>  1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323         1
#>  2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415         1
#>  3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127         1
#>  4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339         1
#>  5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612         1
#>  6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162         1
#>  7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509         1
#>  8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202         1
#>  9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91         1
#> 10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48         1

Example data:

data <- structure(list(year = c(1999, 1999, 1999, 1999, 1999, 1999, 1999, 
1999, 1999, 1999), animal_type = c("Dogs", "Dogs", "Dogs", "Dogs", 
"Cats", "Cats", "Cats", "Cats", "Horses", "Horses"), outcome = c("Reclaimed", 
"Rehomed", "Other", "Euthanized", "Reclaimed", "Rehomed", "Other", 
"Euthanized", "Reclaimed", "Rehomed"), ACT = c(610, 1245, 12, 
360, 111, 1442, 0, 1007, 0, 1), NSW = c(3140, 7525, 745, 9221, 
201, 3913, 447, 8205, 0, 12), NT = c(205, 526, 955, 9, 22, 269, 
0, 847, 1, 3), QLD = c(1392, 5489, 860, 9214, 206, 3901, 386, 
10554, 0, 3), SA = c(2329, 1105, 380, 1701, 157, 1055, 46, 3415, 
2, 10), TAS = c(516, 480, 168, 599, 31, 752, 124, 1056, 1, 0), 
    VIC = c(7130, 4908, 1001, 5217, 884, 3768, 1501, 6113, 87, 
    19), WA = c(1, 137, 6, 18, 0, 62, 5, 5, 0, 0), Total = c(15323, 
    21415, 4127, 26339, 1612, 15162, 2509, 31202, 91, 48)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

bench::mark() for rowSums() versus sum() in rowwise(), on the full dataset (664 × 12 tibble):

library(dplyr)
data <- tidytuesdayR::tt_load(x = "2020-07-21")$animal_outcomes 
bm <- bench::mark(
  rowsums_ = mutate(data, check = +near(rowSums(pick(ACT:WA)), Total)),
  rowwise_ = rowwise(data) |>  mutate(check = if_else(near(Total, sum(pick(4:11))), 1L,0L)) |> ungroup(),
  min_iterations = 100,
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
bm
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 rowsums_     1.17ms   1.36ms    646.      2.27MB     7.98
#> 2 rowwise_   106.83ms 114.95ms      8.49    1.34MB    12.1
ggplot2::autoplot(bm)
#> Loading required namespace: tidyr

Created on 2024-08-03 with reprex v2.1.0

Alireza Sadeghi · Answer 3 · 2024-08-04T00:07:44+08:00


Thank you all for the insightful solutions.

Inspired by the previous answers, I found that the following code fits my problem best:

data %>% 
    rowwise() %>%  
    mutate(check = if_else(near(Total, sum(pick(4:11))), 1,0))
