AskOverflow.Dev

Emman
Asked: 2025-04-26 18:45:19 +0800 CST

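How to create a new column indicating whether a value appears in the previous group's rows, where groups are defined by other columns?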


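Given a long-format table with two "group" columns, I want to create a new column that is TRUE when the value is present in the set of values belonging to the previous group.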

Example

Consider the following table, which shows two people and what they bought each day.

df_groceries <- tibble::tribble(
   ~person,  ~day,          ~groceries,
    "gary", "Mon",          "tomatoes",
    "gary", "Mon",              "milk",
    "gary", "Mon",             "bread",
    "gary", "Mon",            "yogurt",
    "gary", "Tue",              "eggs",
    "gary", "Tue",            "cheese",
    "gary", "Tue",            "apples",
    "gary", "Wed",           "chicken",
    "gary", "Wed",              "rice",
    "gary", "Wed",            "apples",
    "gary", "Thu",           "lettuce",
    "gary", "Thu",             "sauce",
    "gary", "Fri",              "fish",
    "gary", "Fri",          "potatoes",
    "gary", "Fri",           "lettuce",
    "gary", "Sat",            "cereal",
    "gary", "Sat",           "bananas",
    "gary", "Sat",             "juice",
    "gary", "Sun",              "rice",
    "gary", "Sun",           "bananas",
    "gary", "Sun",            "cereal",
  "rachel", "Mon",           "spinach",
  "rachel", "Mon",         "mushrooms",
  "rachel", "Mon",             "pasta",
  "rachel", "Tue",         "mushrooms",
  "rachel", "Tue",          "broccoli",
  "rachel", "Tue",            "lemons",
  "rachel", "Tue",         "olive oil",
  "rachel", "Wed",          "avocados",
  "rachel", "Wed",            "lemons",
  "rachel", "Thu",    "chicken breast",
  "rachel", "Thu",            "quinoa",
  "rachel", "Thu",      "bell peppers",
  "rachel", "Fri",            "yogurt",
  "rachel", "Fri",           "berries",
  "rachel", "Fri",           "granola",
  "rachel", "Sat",            "yogurt",
  "rachel", "Sat",          "avocados",
  "rachel", "Sun",              "eggs",
  "rachel", "Sun",      "orange juice",
  "rachel", "Sun", "whole wheat bread"
  )

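I want to compute an additional column indicating, separately for each person, whether each grocery item was also bought on the previous day (specifically the immediately preceding day, not any earlier day).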

For example, since Gary bought apples on both Tuesday and Wednesday, Gary's Wednesday apples should be marked TRUE.

So the desired output is:

df_groceries_desired_output <- 
  tibble::tribble(
   ~person,  ~day,          ~groceries, ~was_purchased_yesterday,
    "gary", "Mon",          "tomatoes",                      NA,
    "gary", "Mon",              "milk",                      NA,
    "gary", "Mon",             "bread",                      NA,
    "gary", "Mon",            "yogurt",                      NA,
    "gary", "Tue",              "eggs",                   FALSE,
    "gary", "Tue",            "cheese",                   FALSE,
    "gary", "Tue",            "apples",                   FALSE,
    "gary", "Wed",           "chicken",                   FALSE,
    "gary", "Wed",              "rice",                   FALSE,
    "gary", "Wed",            "apples",                    TRUE,
    "gary", "Thu",           "lettuce",                   FALSE,
    "gary", "Thu",             "sauce",                   FALSE,
    "gary", "Fri",              "fish",                   FALSE,
    "gary", "Fri",          "potatoes",                   FALSE,
    "gary", "Fri",           "lettuce",                    TRUE,
    "gary", "Sat",            "cereal",                   FALSE,
    "gary", "Sat",           "bananas",                   FALSE,
    "gary", "Sat",             "juice",                   FALSE,
    "gary", "Sun",              "rice",                   FALSE,
    "gary", "Sun",           "bananas",                    TRUE,
    "gary", "Sun",            "cereal",                    TRUE,
  "rachel", "Mon",           "spinach",                      NA,
  "rachel", "Mon",         "mushrooms",                      NA,
  "rachel", "Mon",             "pasta",                      NA,
  "rachel", "Tue",         "mushrooms",                    TRUE,
  "rachel", "Tue",          "broccoli",                   FALSE,
  "rachel", "Tue",            "lemons",                   FALSE,
  "rachel", "Tue",         "olive oil",                   FALSE,
  "rachel", "Wed",          "avocados",                   FALSE,
  "rachel", "Wed",            "lemons",                    TRUE,
  "rachel", "Thu",    "chicken breast",                   FALSE,
  "rachel", "Thu",            "quinoa",                   FALSE,
  "rachel", "Thu",      "bell peppers",                   FALSE,
  "rachel", "Fri",            "yogurt",                   FALSE,
  "rachel", "Fri",           "berries",                   FALSE,
  "rachel", "Fri",           "granola",                   FALSE,
  "rachel", "Sat",            "yogurt",                    TRUE,
  "rachel", "Sat",          "avocados",                   FALSE,
  "rachel", "Sun",              "eggs",                   FALSE,
  "rachel", "Sun",      "orange juice",                   FALSE,
  "rachel", "Sun", "whole wheat bread",                   FALSE
  )
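The rule behind this desired output can be checked row by row: a row on day d is TRUE when the same person bought the same item on day d - 1, and NA on Monday. Below is a minimal base-R sketch of that rule. `was_bought_yesterday` is a hypothetical helper (it is not from the question or the answer), and it assumes the day column only contains the abbreviations Mon to Sun. It compares each row against all rows, so it is quadratic and meant for clarity rather than speed.

```r
# Hypothetical helper: TRUE if the same person bought the same item on the
# immediately preceding day, NA for rows on the first day of the week.
# Assumes day values are exactly the abbreviations listed in `week`.
was_bought_yesterday <- function(person, day, item,
                                 week = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")) {
  day_num <- match(day, week)
  vapply(seq_along(item), function(i) {
    if (day_num[i] == 1) return(NA)
    # Rows for the same person on the day before this row's day
    yesterday <- person == person[i] & day_num == day_num[i] - 1
    item[i] %in% item[yesterday]
  }, logical(1))
}

# Small example in the spirit of the question's data
person <- c("gary", "gary", "gary", "rachel", "rachel", "rachel")
day    <- c("Tue", "Wed", "Wed", "Mon", "Tue", "Wed")
item   <- c("apples", "apples", "rice", "mushrooms", "mushrooms", "apples")

was_bought_yesterday(person, day, item)
#> [1] FALSE  TRUE FALSE    NA  TRUE FALSE
```

Rachel's Wednesday apples come out FALSE even though Gary bought apples on Tuesday, because the comparison is restricted to the same person. Applied to the question's data, was_bought_yesterday(df_groceries$person, df_groceries$day, df_groceries$groceries) should reproduce the was_purchased_yesterday column above.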

My attempt

I thought this would be as simple as using the %in% operator:

library(dplyr)

df_groceries |> 
  group_by(person) |> 
  mutate(day_as_number = case_match(day, 
                                    "Mon" ~ 1, 
                                    "Tue" ~ 2, 
                                    "Wed" ~ 3, 
                                    "Thu" ~ 4, 
                                    "Fri" ~ 5, 
                                    "Sat" ~ 6, 
                                    "Sun" ~ 7)) |> 
  mutate(was_purchased_yesterday = groceries %in% groceries[day_as_number == day_as_number - 1])

But I got nonsensical results:

df_groceries

## # A tibble: 41 × 5
## # Groups:   person [2]
##    person day   groceries day_as_number was_purchased_yesterday
##    <chr>  <chr> <chr>             <dbl> <lgl>                  
##  1 gary   Mon   tomatoes              1 FALSE                  
##  2 gary   Mon   milk                  1 TRUE                   
##  3 gary   Mon   bread                 1 TRUE                   
##  4 gary   Mon   yogurt                1 TRUE                   
##  5 gary   Tue   eggs                  2 FALSE                  
##  6 gary   Tue   cheese                2 TRUE                   
##  7 gary   Tue   apples                2 TRUE                   
##  8 gary   Wed   chicken               3 FALSE                  
##  9 gary   Wed   rice                  3 TRUE                   
## 10 gary   Wed   apples                3 TRUE                   
## # ℹ 31 more rows
## # ℹ Use `print(n = ...)` to see more rows
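As written, the subsetting condition in the attempt cannot select yesterday's rows: inside mutate(), day_as_number == day_as_number - 1 compares each row with itself minus one, element by element, so it is FALSE on every row. The short sketch below uses toy day numbers (not the question's data) to show this, and what selecting the previous day's rows for a single row would look like instead.

```r
# Hypothetical day numbers for one person's rows (Mon = 1, Tue = 2, Wed = 3).
day_as_number <- c(1, 1, 2, 2, 3)

# The attempt's condition compares every row with itself minus one, so it is
# FALSE everywhere and never selects the previous day's rows.
always_false <- day_as_number == day_as_number - 1
always_false
#> [1] FALSE FALSE FALSE FALSE FALSE

# Selecting "yesterday" needs one day d on the right-hand side, evaluated per
# row; for a row on Wednesday (d = 3), the Tuesday rows are:
d <- 3
which(day_as_number == d - 1)
#> [1] 3 4
```

In other words, the membership test has to be evaluated once per row (or per day), each time against the items of that row's previous day, rather than once for the whole column.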

1 Answer

  1. Best Answer
    r2evans
    2025-04-26T18:59:48+08:00

    我们可以对 和 本身进行连接df_groceries,将日期更改为前一天;任何匹配的都是重复,其他的都不是。我添加了特殊逻辑来排除"Mon"下一个匹配的情况"Sun",不过如果你能使用实际日期而不是滚动日期,处理起来会更好。

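    library(dplyr)
    # Map each day to the day after it: a purchase on "Tue" counts as
    # "yesterday" for Wednesday rows. "Sun" has no entry, so it never wraps to "Mon".
    PriorDays <- c(Mon="Tue", Tue="Wed", Wed="Thu", Thu="Fri", Fri="Sat", Sat="Sun")
    df_groceries |>
      mutate(priorday = PriorDays[day]) |>
      # Self-join: match each row to the same person's same item from the previous day
      # (the `_` placeholder of the native pipe requires R >= 4.2)
      left_join(df_groceries, y = _, by = c("person", day = "priorday", "groceries")) |>
      # Monday has no previous day, so it is NA; otherwise a match means a repeat
      mutate(repeated = if_else(day == "Mon", NA, !is.na(day.y)), day.y = NULL)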
    # # A tibble: 41 × 4
    #    person day   groceries repeated
    #    <chr>  <chr> <chr>     <lgl>   
    #  1 gary   Mon   tomatoes  NA      
    #  2 gary   Mon   milk      NA      
    #  3 gary   Mon   bread     NA      
    #  4 gary   Mon   yogurt    NA      
    #  5 gary   Tue   eggs      FALSE   
    #  6 gary   Tue   cheese    FALSE   
    #  7 gary   Tue   apples    FALSE   
    #  8 gary   Wed   chicken   FALSE   
    #  9 gary   Wed   rice      FALSE   
    # 10 gary   Wed   apples    TRUE    
    # # ℹ 31 more rows
    # # ℹ Use `print(n = ...)` to see more rows
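The same self-join can be written in base R with merge(), which may make the direction of the PriorDays lookup easier to see. This is a minimal sketch on a hypothetical six-row subset of Gary's purchases; the answer's Monday-to-NA rule is left out for brevity, and the column name matched is an illustrative choice.

```r
# Hypothetical subset of df_groceries: Gary on Tue and Wed only.
df <- data.frame(
  person    = rep("gary", 6),
  day       = c("Tue", "Tue", "Tue", "Wed", "Wed", "Wed"),
  groceries = c("eggs", "cheese", "apples", "chicken", "rice", "apples")
)

# Same direction as the answer's lookup: a purchase on "Tue" is relabelled
# "Wed", i.e. it becomes "yesterday's purchase" for Wednesday rows.
PriorDays <- c(Mon = "Tue", Tue = "Wed", Wed = "Thu",
               Thu = "Fri", Fri = "Sat", Sat = "Sun")

shifted <- data.frame(
  person    = df$person,
  day       = unname(PriorDays[df$day]),
  groceries = df$groceries,
  matched   = TRUE
)

# Left self-join: matched stays TRUE only if the same person bought the same
# item on the previous day; rows without a partner get NA.
out <- merge(df, shifted, by = c("person", "day", "groceries"),
             all.x = TRUE, sort = FALSE)
out$repeated <- !is.na(out$matched)
out$matched <- NULL
```

Only Wednesday's apples find a partner, matching the TRUE in row 10 of the answer's output. If the data could contain the same item twice for one person on one day, the join would duplicate rows, so deduplicating first may be worth considering.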
    

    (Edited: fixed to reverse the PriorDays relationship.)
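As the answer notes, real dates would make this easier: "yesterday" is then simply date - 1, so neither the PriorDays lookup nor the Monday special case is needed, and a Sunday-to-Monday transition is handled naturally. A minimal base-R sketch, assuming a hypothetical date column of class Date in place of the weekday names (2025-04-20 is a Sunday):

```r
# Hypothetical data with real calendar dates instead of weekday names.
purchases <- data.frame(
  person    = c("gary", "gary", "gary", "rachel"),
  date      = as.Date(c("2025-04-20", "2025-04-21", "2025-04-21", "2025-04-21")),
  groceries = c("cereal", "cereal", "rice", "cereal")
)

# One key per purchase, plus the key the same purchase would have had one day earlier.
key      <- paste(purchases$person, purchases$date, purchases$groceries, sep = "|")
prev_key <- paste(purchases$person, purchases$date - 1, purchases$groceries, sep = "|")

# A row is a repeat if its "one day earlier" key exists among all purchases.
purchases$was_purchased_yesterday <- prev_key %in% key
purchases$was_purchased_yesterday
#> [1] FALSE  TRUE FALSE FALSE
```

Note that rows on the earliest date come out FALSE rather than NA; if the NA convention from the question is wanted, those rows would need to be set to NA separately.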

