Question

Mary Rachel

Asked: 2024-12-20 07:46:13 +0800 CST

Mutate to concatenate columns whose names contain a specific string

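I am trying to create a new column that concatenates, with a semicolon separator, all the values from a specific set of columns whose names contain a particular string. I am working in dplyr, so I am looking for a tidyverse solution.

I tried combining grepl() with mutate(), case_when(), and paste() to identify the columns whose names contain the string I want ("Games") and paste their contents together into a new column. When that failed, I tried str_detect(), without success.

As far as I can tell, my problem is that I cannot correctly instruct the code to evaluate all the column names and then return the columns whose names contain the pattern I specify. I have tried contains("Games"), colnames(.x), and other variations of these arguments. I know I could do this by explicitly naming each column I want to paste together, but I would prefer a relative solution so that I do not have to type multiple names.

Thank you!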

# Sample Data

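library(dplyr)   # as_tibble(), mutate(), case_when(), %>%
library(stringr) # str_detect(), used in the first attempt below

test <- as_tibble(data.frame(ID = c("1", "2", "3"),
                             Gender = c("Female", "Male", "Non-Binary"),
                             Games_Chess = c("Chess", NA, "Chess"),
                             Games_Clue = c("Clue", NA, NA),
                             Games_Scrabble = c("Scrabble", NA, "Scrabble")))

# A tibble: 3 × 5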
  ID    Gender     Games_Chess Games_Clue Games_Scrabble
  <chr> <chr>      <chr>       <chr>      <chr>         
1 1     Female     Chess       Clue       Scrabble      
2 2     Male       NA          NA         NA            
3 3     Non-Binary Chess       NA         Scrabble  

# Desired Output

ID    Gender     Games_Chess Games_Clue Games_Scrabble Games    
1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
2     Male       NA          NA         NA             NA
3     Non-Binary Chess       NA         Scrabble       Chess; Scrabble  

# Attempted Code 1

test<-test%>%
  mutate(`Games` = case_when(str_detect(colnames(test),"Games") ~ paste(.x, collapse = ";"), TRUE ~ NA))

# Error Code 1

Error in `mutate()`:
ℹ In argument: `Games = case_when(...)`.
Caused by error in `case_when()`:
! Failed to evaluate the right-hand side of formula 1.
Caused by error:
! object '.x' not found

# Attempted Code 2
test<-test%>%
  mutate(`Games` = case_when(grepl("Games",.) ~ paste(., collapse = ";"), TRUE ~ NA))

# Error Code 2

Error in `mutate()`:
ℹ In argument: `Games = case_when(...)`.
Caused by error:
! `Games` must be size 3 or 1, not 4.
Run `rlang::last_trace()` to see where the error occurred.

4 Answers

cristian-vargas · Answer 1 · 2024-12-20T08:24:27+08:00

Best Answer

Although it is not entirely dplyr-based, the following solution stays within the tidyverse by using tidyr:

test %>%
  tidyr::unite(
    # Name of new column
    col = "Games",
    # Select columns to unite using tidy-select syntax
    dplyr::starts_with("Games"),
    # Specify semi-colon as separator
    sep = "; ",
    # Keep original Games_* columns
    remove = FALSE,
    # Remove NA's prior to concatenation
    na.rm = TRUE
  )

The output is as follows:

#>   ID     Gender                 Games Games_Chess Games_Clue Games_Scrabble
#> 1  1     Female Chess; Clue; Scrabble       Chess       Clue       Scrabble
#> 2  2       Male                              <NA>       <NA>           <NA>
#> 3  3 Non-Binary       Chess; Scrabble       Chess       <NA>       Scrabble

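Edit: (Optional) If you want the new column at the very end of the data, you can append the following to the pipeline with the %>% pipe operator to move it there: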

dplyr::relocate(
    Games,
    .after = dplyr::everything()
  )
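Note that in the output above, row 2 holds an empty string rather than the NA requested in the question. The following is a minimal sketch of the whole pipeline with one extra step that converts empty strings to NA. It assumes dplyr and tidyr are installed; the na_if() step and the name `result` are additions for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
test <- tibble(
  ID = c("1", "2", "3"),
  Gender = c("Female", "Male", "Non-Binary"),
  Games_Chess = c("Chess", NA, "Chess"),
  Games_Clue = c("Clue", NA, NA),
  Games_Scrabble = c("Scrabble", NA, "Scrabble")
)

result <- test %>%
  # Join the Games_* columns, dropping NAs and keeping the originals
  unite(col = "Games", starts_with("Games"),
        sep = "; ", remove = FALSE, na.rm = TRUE) %>%
  # Assumption: rows with no games should be NA, not "" (illustrative step)
  mutate(Games = na_if(Games, "")) %>%
  # Move the new column to the end
  relocate(Games, .after = last_col())

result
```

If an empty string is acceptable for rows without any games, the mutate() line can simply be dropped.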

knitz3 · Answer 2 · 2024-12-20T16:02:22+08:00

rowwise() is often considered slow, but unless you are working with a large amount of data, it is perfectly fine to use.

library(dplyr)

test |>
  rowwise() |>
  mutate(Games = paste(na.omit(c_across(c(starts_with("Games")))), collapse = "; ")) |>
  ungroup() |>
  mutate(Games = sub("^$", NA, Games))

#> # A tibble: 3 × 6
#>   ID    Gender     Games_Chess Games_Clue Games_Scrabble Games                
#>   <chr> <chr>      <chr>       <chr>      <chr>          <chr>                
#> 1 1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
#> 2 2     Male       <NA>        <NA>       <NA>           <NA>                 
#> 3 3     Non-Binary Chess       <NA>       Scrabble       Chess; Scrabble

You can also define a more complex function separately to improve readability. starts_with() is fine, but I usually use matches() with a regular expression.

paste_func <- function(vec) {
  # Drop NAs, join with "; ", then turn an empty result into NA
  na.omit(vec) |>
    paste(collapse = "; ") |>
    sub("^$", NA, x = _)  # the `_` placeholder needs R >= 4.2
}

test |>
  rowwise() |>
  mutate(Games = paste_func(c_across(matches("^Games")))) |>
  ungroup()

#> # A tibble: 3 × 6
#>   ID    Gender     Games_Chess Games_Clue Games_Scrabble Games                
#>   <chr> <chr>      <chr>       <chr>      <chr>          <chr>                
#> 1 1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
#> 2 2     Male       <NA>        <NA>       <NA>           <NA>                 
#> 3 3     Non-Binary Chess       <NA>       Scrabble       Chess; Scrabble

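For larger datasets, where rowwise() may be a bit slow, you can coerce part of the data frame to a matrix, use apply(), and assign the result accordingly. Because apply() works on a matrix, you need to supply columns that are all of the same class.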

result <- test
cols_to_paste <- colnames(result)[grepl("^Games", colnames(result))]
result$Games <- apply(result[, cols_to_paste], 1, FUN = paste_func)

result

#> # A tibble: 3 × 6
#>   ID    Gender     Games_Chess Games_Clue Games_Scrabble Games                
#>   <chr> <chr>      <chr>       <chr>      <chr>          <chr>                
#> 1 1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
#> 2 2     Male       <NA>        <NA>       <NA>           <NA>                 
#> 3 3     Non-Binary Chess       <NA>       Scrabble       Chess; Scrabble

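I usually find this more trouble than it is worth, but you can also use pivot_longer() to make the column operations easier and then restore the original wide format with pivot_wider().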

test |>
  tidyr::pivot_longer(matches("^Games")) |>
  group_by(ID) |>
  mutate(Games = paste(na.omit(value), collapse = "; ")) |>
  ungroup() |>
  tidyr::pivot_wider() |>
  mutate(Games = sub("^$", NA, Games)) |>
  relocate(all_of(colnames(test)))

#> # A tibble: 3 × 6
#>   ID    Gender     Games_Chess Games_Clue Games_Scrabble Games                
#>   <chr> <chr>      <chr>       <chr>      <chr>          <chr>                
#> 1 1     Female     Chess       Clue       Scrabble       Chess; Clue; Scrabble
#> 2 2     Male       <NA>        <NA>       <NA>           <NA>                 
#> 3 3     Non-Binary Chess       <NA>       Scrabble       Chess; Scrabble

Chris · Answer 3 · 2024-12-20T10:38:14+08:00

In base R, this could be:

test$Games <- apply(test[, 3:5], 1, paste, collapse = '; ') # columns 3:5 are the Games_* columns
test$Games <- gsub('NA; ', '', test$Games) # partial cleanup: removes each 'NA; '; a trailing '; NA' is left in place
test$Games[which(test$Games == 'NA')] <- NA # set NA(s) if exist
test
  ID     Gender Games_Chess Games_Clue Games_Scrabble                 Games
1  1     Female       Chess       Clue       Scrabble Chess; Clue; Scrabble
2  2       Male        <NA>       <NA>           <NA>                  <NA>
3  3 Non-Binary       Chess       <NA>       Scrabble       Chess; Scrabble

ThomasIsCoding · Answer 4 · 2024-12-20T17:24:06+08:00

Maybe you can try

test$Games <- apply(
  test[startsWith(names(test), "Games")],
  1,
  \(x) ifelse(all(is.na(x)), NA, toString(na.omit(x)))
)

which gives

> test
# A tibble: 3 × 6
  ID    Gender     Games_Chess Games_Clue Games_Scrabble Games
  <chr> <chr>      <chr>       <chr>      <chr>          <chr>
1 1     Female     Chess       Clue       Scrabble       Chess, Clue, Scrabble
2 2     Male       NA          NA         NA             NA
3 3     Non-Binary Chess       NA         Scrabble       Chess, Scrabble
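
The result above is joined with ", " because toString() uses that separator, whereas the question asks for "; ". A small base R variation might look like the sketch below. The helper name `join_non_na` and the variable `game_cols` are hypothetical, introduced here only for illustration, and the sample data is rebuilt as a plain data.frame so the block runs on its own.

```r
# Sample data from the question, as a plain data.frame
test <- data.frame(
  ID = c("1", "2", "3"),
  Gender = c("Female", "Male", "Non-Binary"),
  Games_Chess = c("Chess", NA, "Chess"),
  Games_Clue = c("Clue", NA, NA),
  Games_Scrabble = c("Scrabble", NA, "Scrabble")
)

# Hypothetical helper: join the non-missing values of one row,
# returning NA when every value in the row is missing
join_non_na <- function(x, sep = "; ") {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
}

# Select the columns by name prefix, then apply the helper row by row
game_cols <- startsWith(names(test), "Games")
test$Games <- apply(test[game_cols], 1, join_non_na)

test
```

Passing a different `sep` through apply(), for example `apply(test[game_cols], 1, join_non_na, sep = ", ")`, would switch the separator without touching the helper.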
