I'm working in R.
I have some data about school staff:
data <- data.frame(person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
                   disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                   age_group = c("20-30", "30-40", "20-30", "30-40", "20-30", "30-40", "20-30", "30-40"),
                   teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes"))
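I have written a function that counts the number of staff in each level of the variable passed to it. The "group_tag" argument is there to help with debugging later in my code.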
library(dplyr)

# Count staff per level of `variable` and label the result with `group_tag`.
group_the_data <- function(data,
                           variable,
                           group_tag) {
  grouped_output <- data %>%
    mutate(flag = 1) %>%
    group_by({{ variable }}) %>%
    summarise(number_staff = sum(flag, na.rm = TRUE)) %>%
    mutate(grouping_tag := {{ group_tag }})
  return(grouped_output)
}
I then use the function to group by disability status, age group, and teacher in turn:
disability_grouped <- group_the_data(data = data,
                                     variable = disability_status,
                                     group_tag = "disability status")
age_group_grouped <- group_the_data(data = data,
                                    variable = age_group,
                                    group_tag = "age group")
role_grouped <- group_the_data(data = data,
                               variable = teacher,
                               group_tag = "role")
Once I have the data frames I need, I bind them together:
all_data_grouped <- bind_rows(disability_grouped, age_group_grouped, role_grouped)
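Is there a way to loop over the variables so that I don't have to write the function call three times?
Or would it be a better idea to use one of the apply functions?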
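You can use lapply or purrr::map to iterate over the variables. To do so, we need to loop over strings rather than bare variable names, so you need pick inside group_by. Similarly, use purrr::map2 if you want different "variable" and "group_tag" values: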
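A minimal sketch of that approach, assuming dplyr 1.1.0 or later (for pick) and purrr are installed. The name group_the_data_chr and the vectors variables and group_tags are hypothetical, introduced here only for illustration. The sketch counts rows with n(), which gives the same counts as summing a flag of 1 in the question's function.

```r
library(dplyr)
library(purrr)

# Same data as in the question, written more compactly
data <- data.frame(
  person_id = 1:8,
  disability_status = rep(c("yes", "no"), 4),
  age_group = rep(c("20-30", "30-40"), 4),
  teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes")
)

# Hypothetical string-based variant of group_the_data():
# `variable` is a column name given as a string, so the grouping
# column is selected with pick(all_of(...)) instead of {{ }}.
group_the_data_chr <- function(data, variable, group_tag) {
  data %>%
    group_by(pick(all_of(variable))) %>%
    summarise(number_staff = n(), .groups = "drop") %>%
    mutate(grouping_tag = group_tag)
}

# Assumed inputs: one tag per variable, in matching order
variables <- c("disability_status", "age_group", "teacher")
group_tags <- c("disability status", "age group", "role")

# Walk over both vectors in parallel, then stack the results
all_data_grouped <- map2(
  variables, group_tags,
  function(v, tag) group_the_data_chr(data, v, tag)
) %>%
  bind_rows()
```

If the tag can simply be the column name, lapply(variables, function(v) group_the_data_chr(data, v, v)) followed by bind_rows() should do the same job without purrr. As in the question's original code, each grouping column keeps its own name, so the bound result has NA in the grouping columns that do not apply to a given row.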