hans jablonsky提出的问题 -coding

Alireza Sadeghi

Asked: 2024-09-11 21:03:40 +0800 CST

如何使用 rvest 将分层网络数据抓取为表格格式？

9

我通常熟悉。我知道和rvest之间的区别。但我无法解决这个问题：html_elements()html_element()

假设我们有类似此网页上的数据。数据采用分层格式，每个标题都有不同数量的子标题。

当我尝试抓取时，我得到了 177 个标题。但是，副标题实际上有 270 个。我想将数据提取成整齐的格式。但由于向量大小不同，我无法轻松地将它们组合成 tibble。

这是我的代码以及一些关于结果的评论：

page <- read_html("https://postdocs.stanford.edu/about/department-postdoc-admins")

person_departments <- page %>% 
    html_elements(".item-list") %>% 
    html_element("h3") %>% 
    html_text2()
# The above code returns 

person_names <- page %>% 
  html_elements(".item-list li") %>% 
  html_element("h4") %>% 
  html_text2()
# This one returns 270 names (some departments have more than 1 admin)

# Using the above codes, I can't get a nice table with two columns, one for the name and one for the person's department.

Alireza Sadeghi

Asked: 2024-08-03 21:20:33 +0800 CST

如何使用函数和“across()”将条件列转变为“tibble”？

6

为了演示目的，我使用了一个tidytuesday名为的数据集animal_outcomes。

我的问题：我在 a 中有几个数字列tibble。我想要mutate一个新列，它将所有列（最后一列除外）相加，如果总和等于最后一列，则新列为 1 ，否则为 0。我将进一步解释：

# Adding the example dataset
data <- tidytuesdayR::tt_load(x = "2020-07-21")
data <- data$animal_outcomes

现在数据是这样的：

> data$animal_outcomes

# A tibble: 664 × 12
    year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total
   <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323
 2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415
 3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127
 4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339
 5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612
 6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162
 7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509
 8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202
 9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91
10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows

我想添加一个列来检查该Total列是否确实是所有列的总和。这是我脑海中的结果：

# A tibble: 664 × 13
    year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total condition # notice this last column
   <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
 1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323         1
 2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415         1
 3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127         1
 4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339         1
 5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612         1
 6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162         1
 7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509         1
 8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202         1
 9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91         1
10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48         1
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows

我尝试了以下代码。它可以工作，但需要大量击键，因此，如果您有很多列，它将无法很好地工作：

> data$animal_outcomes %>% 
    mutate(condition = if_else((ACT + NSW + NT + QLD + SA + TAS + VIC + WA) == Total, 1, 0))

# A tibble: 664 × 13
    year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total condition
   <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
 1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323         1
 2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415         1
 3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127         1
 4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339         1
 5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612         1
 6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162         1
 7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509         1
 8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202         1
 9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91         1
10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48         1
# ℹ 654 more rows
# ℹ Use `print(n = ...)` to see more rows

我也使用了以下但它返回了错误：

data$animal_outcomes %>% 
    mutate(condition = if_else((ACT + NSW + NT + QLD + SA + TAS + VIC + WA) == Total, 1, 0))

另外，这个（显然是错误的，因为它对实际数字进行了总结4:11）：

data$animal_outcomes %>% 
    mutate(condition = if_else(sum(4:11) == Total, 1,0))

还有这个：我不确定为什么sum(ACT:WA)不返回错误！如果没有返回错误，它实际上是在求和什么！！

data$animal_outcomes %>% 
    mutate(condition = if_else(sum(ACT:WA) == Total, 1,0))

# A tibble: 664 × 13
    year animal_type outcome      ACT   NSW    NT   QLD    SA   TAS   VIC    WA Total condition
   <dbl> <chr>       <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
 1  1999 Dogs        Reclaimed    610  3140   205  1392  2329   516  7130     1 15323         0
 2  1999 Dogs        Rehomed     1245  7525   526  5489  1105   480  4908   137 21415         0
 3  1999 Dogs        Other         12   745   955   860   380   168  1001     6  4127         0
 4  1999 Dogs        Euthanized   360  9221     9  9214  1701   599  5217    18 26339         0
 5  1999 Cats        Reclaimed    111   201    22   206   157    31   884     0  1612         0
 6  1999 Cats        Rehomed     1442  3913   269  3901  1055   752  3768    62 15162         0
 7  1999 Cats        Other          0   447     0   386    46   124  1501     5  2509         0
 8  1999 Cats        Euthanized  1007  8205   847 10554  3415  1056  6113     5 31202         0
 9  1999 Horses      Reclaimed      0     0     1     0     2     1    87     0    91         0
10  1999 Horses      Rehomed        1    12     3     3    10     0    19     0    48         0

hans jablonsky

Asked: 2024-04-29 09:13:17 +0800 CST

有没有办法将paste0函数与列表组合？

7

我有一个名字向量：

names <- c("a", "b", "c", "d", "e", "f", "g", "h")

我想分别将这些名称与这些数字进行一系列组合：

numbers <- c(0, 0:2, 0:3, 0:1, 0:1, 0, 0, 0)

所以我的组合是：

a0 b0 b1 b2 c0 c1 c2 c3 d0 d1 e0 e1 f0 g0 h0

我该怎么做？我宁愿使用向量化函数而不是迭代。

我尝试了这些代码，但得到了错误的结果（可能是由于回收或其他原因）：

paste0(names, numbers, recycle0 = F)
paste0(names, numbers, recycle0 = T)
paste0(names, numbers |> as.list())
purrr::map_chr(names, paste0(c(0, 0:2, 0:3, 0:1, 0:1, 0, 0, 0)))

还有一些我记不太清楚了。