Questions asked by bill999 - coding

bill999

Asked: 2025-02-25 12:42:11 +0800 CST

How to force a graph to stop displaying after a certain x-axis value

5

In Stata, how can I force a graph to stop displaying after a certain point on the x-axis?

For example, say I have:

sysuse auto2, clear
gen mid = (price + weight)/2
gen n = _n
    twoway ///
       (rcap price weight n, horizontal) ///
       (scatter n mid),  ///
       ylabel(, nolabels noticks nogrid) ///     
       legend(off) ///
       xscale(range(0 7000)) ///
       xlabel(0(1000)7000)

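In this MWE, I am trying to force the graph to stop displaying after 7,000, but it does not work. I imagine that xscale and xlabel are being overridden so that all of the rcap and scatter elements can be plotted. What can I do to achieve my desired result?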

bill999

Asked: 2024-12-01 00:35:59 +0800 CST

Check whether elements of one data frame are present in another data frame, within groups

9

Suppose I have this data:

library(dplyr)
df1 <- data.frame(x = c(1, 2, 3, 4), z = c("A", "A", "B", "B"))
df2 <- data.frame(x = c(2, 4, 6, 8), z = c("A", "A", "B", "C"))

I can easily check whether each element of x in df1 is present in x of df2:

df1 <- df1 %>% mutate(present = x %in% df2$x)

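Is there a simple way to do the same thing (preferably in the tidyverse), but checking only within groups?

In other words, for an observation in df1 to have present equal to TRUE, two things must hold: 1) the group (z) in df2 must be the same as the group in df1; and 2) the value of x in df2 must be the same as the value of x in df1.

Thus, only the second observation (2) would be TRUE, because df2 contains an observation with x equal to 2 and z equal to A. The last observation would be FALSE because, although df2 has an x value of 4, that observation belongs to group A, not B.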
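
One way to picture the desired result is to compare (x, z) pairs rather than x alone. The following is only an illustrative sketch, assuming dplyr is acceptable; the helper name pairs_in_df2 is made up for this example.

```r
library(dplyr)

df1 <- data.frame(x = c(1, 2, 3, 4), z = c("A", "A", "B", "B"))
df2 <- data.frame(x = c(2, 4, 6, 8), z = c("A", "A", "B", "C"))

# Unique (x, z) pairs that occur in df2, each flagged as present.
# pairs_in_df2 is a hypothetical helper name used only in this sketch.
pairs_in_df2 <- df2 %>%
  distinct(x, z) %>%
  mutate(present = TRUE)

# Join on both columns; rows of df1 with no matching pair get NA,
# which coalesce() turns into FALSE.
result <- df1 %>%
  left_join(pairs_in_df2, by = c("x", "z")) %>%
  mutate(present = coalesce(present, FALSE))

print(result)
```

With the example data, only the second row of df1 ends up with present equal to TRUE.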

bill999

Asked: 2024-09-13 05:45:50 +0800 CST

How to split a string into two parts (without discarding the rest)

6

Suppose I have this data:

clear all
set obs 2
gen title = "dog - cat - horse" in 1
replace title = "chicken - frog - ladybug" in 2
tempfile data
save `data'

I can split it into three parts:

use `data', clear
split title, p(" - ")

I can split it into two parts, discarding the third:

use `data', clear
split title, p(" - ") limit(2)

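Is there an off-the-shelf solution that splits the string into two parts, but groups everything after the first split character (in this case the dash) into the second variable? In R, I would use separate with the extra = "merge" option (see tidyr separate only first n instances).

In other words, for the first observation, I would like title1 to be dog and title2 to be cat - horse.

I realize that this can be done with custom code (see Stata split string into parts), but I am hoping for a simple command along the lines of Stata's split or R's separate to achieve my goal.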
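
For reference, the R behavior described above looks roughly like this. It is a minimal sketch with tidyr, using a data frame d invented for the example, and it is not a Stata solution.

```r
library(tidyr)

# Example data mirroring the two Stata observations (d is a made-up name).
d <- data.frame(title = c("dog - cat - horse", "chicken - frog - ladybug"))

# extra = "merge" splits only at the first separator and keeps
# everything after it together in the last column.
out <- separate(d, title, into = c("title1", "title2"),
                sep = " - ", extra = "merge")

print(out)
```

Here title1 holds dog and chicken, while title2 holds cat - horse and frog - ladybug.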

bill999

Asked: 2024-09-04 23:52:14 +0800 CST

How to introduce a lag (sleep) in a rowwise mutate

8

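I need to call an API separately for each row of a tibble. How can I introduce a short delay between calls? I need this because the API I am using limits the number of requests allowed per second.

How can I modify the following (pseudo)code to accomplish this? The code creates a list column named authors, populated with the results of the (fictional) API call get_API_value.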

library(tidyverse)
data %>% 
    rowwise() %>%
    mutate(authors = list(get_API_value(arg1 = val1, arg2 = val2)))

In other words, how can I make the above code include a lag (e.g., Sys.sleep(1))?
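
One possible shape for this, sketched under assumptions: get_API_value below is a stand-in stub (the real function is fictional in the question), get_API_value_slowly is a hypothetical wrapper name, and the delay is shortened to 0.1 seconds so the example runs quickly.

```r
library(dplyr)

# Stand-in for the fictional API call; it just returns a one-element list.
get_API_value <- function(arg1, arg2) {
  list(paste(arg1, arg2))
}

# Hypothetical wrapper: pause first, then forward the arguments unchanged.
get_API_value_slowly <- function(..., delay = 0.1) {
  Sys.sleep(delay)
  get_API_value(...)
}

data <- tibble(val1 = c("a", "b"), val2 = c("x", "y"))

# rowwise() evaluates the call once per row, so the pause happens
# before every request.
result <- data %>%
  rowwise() %>%
  mutate(authors = list(get_API_value_slowly(arg1 = val1, arg2 = val2))) %>%
  ungroup()
```

Setting delay = 1 would correspond to the Sys.sleep(1) mentioned above.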

bill999

Asked: 2024-09-04 09:56:08 +0800 CST

How to get latitude/longitude coordinates onto the same coordinate system as a state boundary map

5

I have some coordinates, e.g.:

library(tidyverse)
library(haven)
library(tidycensus)
library(tigris)

coords <- data.frame(lat = c(38.09720, 36.85298, 31.31517, 21.48344), long = c(-121.38785, -75.97742, -85.85522, -158.03648))

Then I get a map of the US:

geo <- get_acs(geography = "state",
               variables = c(x = "B04006_036"),
               year = 2021, 
               geometry = TRUE, 
               keep_geo_vars=TRUE) %>%
    filter(STATEFP!="72")

#to get alaska and hawaii in the picture
geo <- shift_geometry(geo)

Then I try to plot the states and overlay the coordinates:

ggplot(data = coords) +
    geom_point(aes(x=lat, y=long)) +
    geom_sf(fill = "transparent", color = "gray50", size = 1, data = geo %>% group_by(STATEFP) %>% summarise()) +   
    theme(panel.background = element_rect(fill = 'white')) +
    theme(panel.grid = element_blank(),axis.title = element_blank(),
          axis.text = element_blank(),axis.ticks = element_blank(),
          panel.border = element_blank())

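This yields:

However, this does not work: it produces a map in which all of the coordinates appear to be in the same location. How can I modify the code so that the map and the coordinates are on the same scale?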

bill999

Asked: 2024-08-31 10:04:37 +0800 CST

How to create a variable showing the percentile range an observation falls in

5

Say I have the iris data.

I know that I can create a variable showing the range of values that belongs to a given percentile band:

library(tidyverse)
iris %>% mutate(Range = cut(Sepal.Length, quantile(Sepal.Length, probs=c(0,.2,.4,.6,.8,1)),include.lowest=TRUE))

This yields:

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   Range
1           4.3         3.0          1.1         0.1  setosa [4.3,4.6]
2           4.4         2.9          1.4         0.2  setosa [4.3,4.6]
3           4.6         3.1          1.5         0.2  setosa [4.3,4.6]
4           4.6         3.4          1.4         0.3  setosa [4.3,4.6]
5           4.7         3.2          1.3         0.2  setosa (4.6,4.8]
6           4.8         3.4          1.6         0.2  setosa (4.6,4.8]
7           4.8         3.0          1.4         0.1  setosa (4.6,4.8]
8           4.9         3.0          1.4         0.2  setosa   (4.8,5]
9           4.9         3.1          1.5         0.1  setosa   (4.8,5]
10          5.0         3.6          1.4         0.2  setosa   (4.8,5]
11          5.0         3.4          1.5         0.2  setosa   (4.8,5]
12          5.1         3.5          1.4         0.2  setosa   (5,5.4]
13          5.4         3.9          1.7         0.4  setosa   (5,5.4]
14          5.4         3.7          1.5         0.2  setosa   (5,5.4]
15          5.7         4.4          1.5         0.4  setosa (5.4,5.8]
16          5.8         4.0          1.2         0.2  setosa (5.4,5.8]

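How can I create another variable showing the percentile range that each observation falls in? I do not want to create the variable manually with ifelse statements and the like; I am hoping for a function that creates it automatically.

I am looking for something that would produce a table like this: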

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species     Range  Percent
1           4.3         3.0          1.1         0.1  setosa [4.3,4.6]  [0,.2]
2           4.4         2.9          1.4         0.2  setosa [4.3,4.6]  [0,.2]
3           4.6         3.1          1.5         0.2  setosa [4.3,4.6]  [0,.2]
4           4.6         3.4          1.4         0.3  setosa [4.3,4.6]  [0,.2]
5           4.7         3.2          1.3         0.2  setosa (4.6,4.8]  (.2,.4]
6           4.8         3.4          1.6         0.2  setosa (4.6,4.8]  (.2,.4]
7           4.8         3.0          1.4         0.1  setosa (4.6,4.8]  (.2,.4]
8           4.9         3.0          1.4         0.2  setosa   (4.8,5]  (.4,.6]
9           4.9         3.1          1.5         0.1  setosa   (4.8,5]  (.4,.6]
10          5.0         3.6          1.4         0.2  setosa   (4.8,5]  (.4,.6]
11          5.0         3.4          1.5         0.2  setosa   (4.8,5]  (.4,.6]
12          5.1         3.5          1.4         0.2  setosa   (5,5.4]  (.6,.8]
13          5.4         3.9          1.7         0.4  setosa   (5,5.4]  (.6,.8]
14          5.4         3.7          1.5         0.2  setosa   (5,5.4]  (.6,.8]
15          5.7         4.4          1.5         0.4  setosa (5.4,5.8]  [.8,1]
16          5.8         4.0          1.2         0.2  setosa (5.4,5.8]  [.8,1]
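
A possible sketch of such a variable: cut the data a second time on the same quantile breaks, but label the bins with the probability intervals. The names probs and pct_labels are introduced only for this example, and the labels print as [0,0.2], (0.2,0.4], and so on, rather than in the exact style of the table above.

```r
library(dplyr)

probs <- c(0, .2, .4, .6, .8, 1)

# Interval labels for the probability bands, built by cutting probs on itself.
pct_labels <- levels(cut(probs, probs, include.lowest = TRUE))

result <- iris %>%
  mutate(
    # Bins expressed in the units of Sepal.Length.
    Range = cut(Sepal.Length, quantile(Sepal.Length, probs = probs),
                include.lowest = TRUE),
    # Same bins, relabelled with the percentile band they correspond to.
    Percent = cut(Sepal.Length, quantile(Sepal.Length, probs = probs),
                  include.lowest = TRUE, labels = pct_labels)
  )

head(result)
```

Because both columns use identical breaks, every row falls in matching bins of Range and Percent.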

bill999

Asked: 2024-08-31 02:59:13 +0800 CST

How to apply a function to multiple columns at once

5

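I have a tibble with two columns. For each row, I want to use the values of both columns in a function. What is the right way to do this with the tidyverse? As I describe in more detail below, I do not think the function (which calls an API) can be vectorized.

To fix ideas, say I have this data: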

library(tidyverse)
d <- tibble(a=c("a", "b", "d"), b=c("x", "y", "z"))

Then I want to apply a function (just something very simple here). Using base R, I could do:

for (i in 1:nrow(d)) {
    d[i, "value"] <- paste0(d[i,"a"], d[i, "b"])
}

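What is the best way (ideally a tidyverse solution, though that is not essential) to do this for each row, passing two arguments to the function?

Note that I know that in the example above I could simply do d <- d %>% mutate(value = paste0(a, b)), but my actual problem involves an R function that makes a call to a specific API, and I think it needs to be run one row at a time. Each API call returns a list, which I would like to store in my tibble.

Also note that a small delay may be needed between API calls.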
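
As one illustration of passing two columns to a non-vectorized function, purrr::map2 walks both columns in parallel and returns a list column. This is a sketch under assumptions: call_api is a hypothetical stub standing in for the real API function.

```r
library(dplyr)
library(purrr)

d <- tibble(a = c("a", "b", "d"), b = c("x", "y", "z"))

# Hypothetical stand-in for the API call: takes two scalars, returns a list.
call_api <- function(a, b) {
  list(value = paste0(a, b))
}

# map2() calls call_api(a[i], b[i]) once per row and collects the
# returned lists in a list column.
result <- d %>%
  mutate(value = map2(a, b, call_api))

result$value[[1]]
```

If a pause between calls is needed, a Sys.sleep() call could be placed at the top of call_api.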

bill999

Asked: 2024-08-27 06:19:41 +0800 CST

How to make the legend box show up

6

Suppose I create a choropleth map of NC using the following code:

library(ggplot2)
library(sf)
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
b <- ggplot(nc) +
    geom_sf(aes(fill = AREA))
b

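The legend looks as it should.

Then I use the code from Paul's answer to this question (How to shade shapes, which is itself based on Unable to replicate this ggplot2 plot) to create a ggrough (https://github.com/xvrdm/ggrough) version of the plot.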

#devtools::install_github("xvrdm/ggrough")
library(ggrough)
library(magrittr)

trace(ggrough:::parse_rough, edit=TRUE)
#In the popup window, paste this so that parse_rough will use parse_sf for GeomSf.
function (svg, geom) 
{
    rough_els <- list()
    if (geom %in% c("GeomCol", "GeomBar", "GeomTile", 
                    "Background")) {
        rough_els <- append(rough_els, parse_rects(svg))
    }
    if (geom %in% c("GeomArea", "GeomViolin", "GeomSmooth", 
                    "Background")) {
        rough_els <- append(rough_els, parse_areas(svg))
    }
    if (geom %in% c("GeomPoint", "GeomJitter", "GeomDotPlot", 
                    "Background")) {
        rough_els <- append(rough_els, parse_circles(svg))
    }
    if (geom %in% c("GeomLine", "GeomSmooth", "Background")) {
        rough_els <- append(rough_els, parse_lines(svg))
    }
    if (geom %in% c("Background")) {
        rough_els <- append(rough_els, parse_texts(svg))
    }
    if (geom %in% c("GeomSf")) {
        rough_els <- append(rough_els, parse_sf(svg))
    }
    purrr::map(rough_els, ~purrr::list_modify(.x, geom = geom))
}

parse_sf <- function (svg) {
    shape <- "path"
    keys <- NULL
    ggrough:::parse_shape(svg, shape, keys) %>% {
        purrr::map(., 
                   ~purrr::list_modify(.x, 
                                    points = .x$d, 
                                    shape = "path"
                   ))
    }
}

nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
b <- ggplot(nc) +
    geom_sf(aes(fill = AREA))
b


options <- list(GeomSf=list(fill_style="hachure", 
                            angle_noise=0.5,
                            gap_noise=0.2,
                            gap=1.5,
                            fill_weight=1))
get_rough_chart(b, options)

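This yields:

But now the color box portion of the legend is gone. Is there any way to get it to show up?

Note that the legend does work in some cases. I am not sure whether this is a discrete versus continuous issue, but the legend works in the discrete-color ggrough plot from Z.Lin's answer to this question (Unable to replicate this ggplot2 plot).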

bill999

Asked: 2024-08-27 05:06:32 +0800 CST

How to precisely overlay two plots

5

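As a starting point, I use the very helpful code from Kat's answer to this question (How to make the roughness of map borders less than the roughness of the map fill) to create two graphs, with the goal of placing one on top of the other.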

library(magrittr)
library(ggplot2)
#devtools::install_github("xvrdm/ggrough")
library(ggrough)
library(sf)
library(htmltools)    
library(ggiraph)      

trace(ggrough:::parse_rough, edit=TRUE)
#In the popup window, paste this so that parse_rough will use parse_sf for GeomSf.
function (svg, geom) 
{
  rough_els <- list()
  if (geom %in% c("GeomCol", "GeomBar", "GeomTile", 
                  "Background")) {
    rough_els <- append(rough_els, parse_rects(svg))
  }
  if (geom %in% c("GeomArea", "GeomViolin", "GeomSmooth", 
                  "Background")) {
    rough_els <- append(rough_els, parse_areas(svg))
  }
  if (geom %in% c("GeomPoint", "GeomJitter", "GeomDotPlot", 
                  "Background")) {
    rough_els <- append(rough_els, parse_circles(svg))
  }
  if (geom %in% c("GeomLine", "GeomSmooth", "Background")) {
    rough_els <- append(rough_els, parse_lines(svg))
  }
  if (geom %in% c("Background")) {
    rough_els <- append(rough_els, parse_texts(svg))
  }
  if (geom %in% c("GeomSf")) {
    rough_els <- append(rough_els, parse_sf(svg))
  }
  purrr::map(rough_els, ~purrr::list_modify(.x, geom = geom))
}

# Create the function parse_sf.
parse_sf <- function (svg) {
  shape <- "path"
  keys <- NULL
  ggrough:::parse_shape(svg, shape, keys) %>% {
    purrr::map(., 
               ~purrr::list_modify(.x, 
                                   points = .x$d, 
                                   shape = "path"
               ))
  }
}
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

b <- ggplot(nc) + geom_sf(color = "black") + theme_minimal() +
  theme(panel.grid = element_line(color = NA),  # not resized or removed! (keep spacing)
        axis.text = element_text(color = NA))

options <- list(GeomSf = list(fill_style = "hachure", angle = 60, angle_noise = 1,
                              gap_noise = 0, gap = 6, fill_weight = 2, bowing = 5,
                              roughness = 30))

(xx <- get_rough_chart(b, options))  # from your question
fixer <- function(ggr) {          # where ggr is the ggrough graph
  nd <- lapply(1:length(ggr$x$data), function(j) {
    if(!is.null(ggr$x$data[[j]]$lengthAdjust)) { # if a text element (axis label)
      ggr$x$data[[j]]$content <- ""              # remove text, but keep spacing
      ggr$x$data[[j]]                            # return modified data element
    } else {
      ggr$x$data[[j]]                            # not text, return orig data
    }
  })
  ggr$x$data <- nd                               # add mod data to graph
  ggr                                            # return mod graph
}
xx2 <- xx %>% fixer()  # modify the plot, to hide text

(g2 <- ggplot(nc) +
    geom_sf(fill = "transparent", color = "black", linewidth = 2) +
    theme_minimal() +
    theme(plot.background = element_rect(fill = NA, color = "transparent"), # no white background
          panel.background = element_rect(fill = NA, color = "transparent"),
          text = element_text(size = 9)))      # text size to match defaults in ggrough

gg <- girafe(ggobj = g2, width_svg = 7, height_svg = 5)  # h/w default w/ ggrough

browsable(div( # parent div, size matches ggrough's default
  style = css(width = "960px", height = "500px", position = "relative"),
  div(xx2, style = css(display = "block")),                           # ggrough graph
  div(gg, style = css(position = "absolute", top = 0, padding.top = "54.2px", # layer behind
                      width = "610px", height = "500px", z.index = -2))
              )) # size and padding found by trial and error with defaults for graph sizes

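The graphs in the answer appear to be overlaid correctly. However, when I run the same code (in RStudio), the overlay I get is incorrect:

I also tried it in RStudio on another computer and again got an imperfect overlay, though to a different degree.

I have two questions:

How can I overlay the graphs correctly without resorting to trial and error?
How can I adjust the code so that the current background moves to the foreground? In other words, how can I get the black borders to sit on top of the gray scribbles, rather than the other way around? I tried swapping the order of the two divs at the end, but that did not work.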

bill999

Asked: 2024-08-08 23:58:30 +0800 CST

Using rvest, how to select only div classes containing exact text

6

Suppose I scrape code like the following:

library(rvest)
library(dplyr)

test <- minimal_html('
  <div class="entry">
        <div class="book">
          <div class="booktitle">Book 1</div>
          <div class="year">1991</div>
        </div>
        <div class="book dont-use">
          <div class="booktitle">Book 2</div>
          <div class="year">1979</div>
        </div>
        <div class="book">
          <div class="booktitle">Book 3</div>
          <div class="year">1399</div>
        </div>
        <div class="book dont-use">
          <div class="booktitle">Book 4</div>
          <div class="year">1949</div>
        </div>        
  </div>')

To select everything that has book in its class, I can use:

test %>% html_elements(".book")

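This returns all four objects.

However, I do not want to select the second and fourth entries, which have the class book dont-use. How can I select only the first and third entries? In other words, how can I modify the code to select only exactly book?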
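
Two CSS selectors that may express this are sketched below, using a shortened copy of the HTML above. The first matches a class attribute that is exactly book; the second keeps .book elements that do not also carry dont-use. This is only an illustration, and the object names exact and not_flagged are made up for the example.

```r
library(rvest)

test <- minimal_html('
  <div class="entry">
    <div class="book"><div class="booktitle">Book 1</div></div>
    <div class="book dont-use"><div class="booktitle">Book 2</div></div>
    <div class="book"><div class="booktitle">Book 3</div></div>
    <div class="book dont-use"><div class="booktitle">Book 4</div></div>
  </div>')

# Attribute selector: the class attribute must equal "book" exactly.
exact <- html_elements(test, "div[class='book']")

# Negation: any .book element that does not also have the dont-use class.
not_flagged <- html_elements(test, ".book:not(.dont-use)")

# Titles of the selected entries.
titles <- html_text2(html_element(exact, ".booktitle"))
print(titles)
```

The attribute form is stricter: it would also skip an element whose class is book plus any other class, not just dont-use.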

bill999

Asked: 2024-08-08 03:16:09 +0800 CST

How to create a data frame from a website scraped with rvest, preserving the nested structure of the data

6

Suppose I have used read_html_live() from the rvest package to extract some code that looks like this:

books <- minimal_html('
  <div>
    <div class="book">
      <div class="booktitle">Book 1</div>
      <div class="year">1999</div>
      <div class="author">Author 1</div>
      <div class="author">Author 2</div>
      <div class="author">Author 3</div>
    </div>
    <div class="book">
      <div class="booktitle">Book 2</div>
      <div class="year">2022</div>
      <div class="author">Author 4</div>
    </div>
    <div class="book">
      <div class="booktitle">Book 3</div>
      <div class="year">1845</div>
      <div class="author">Author 5</div>
      <div class="author">Author 6</div>
      <div class="author">Author 7</div>
      <div class="author">Author 8</div>
    </div>    
  </div>')

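I would like to use the rvest package to create a data frame (a tibble is fine too) containing the above information. I would like it to be organized at the author level, so that each row contains the author, the book title, and the year.

If I only cared about the first author, it would be easy. For example: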

data0 <- books %>% html_elements(".book")
title <- data0 %>% html_element(".booktitle") %>% html_text2()
year <- data0 %>% html_element(".year") %>% html_text2()
author1 <- data0 %>% html_element(".author") %>% html_text2()
data <- data.frame(title, year, author1)

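However, I actually want to extract all of the authors, which are "children" of the books. The data frame would then have eight rows, one per author. For example, row 8 would contain Book 3, 1845, and Author 8. How can I do this?

Here is a rough idea, but I am looking for a simpler solution: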

data0 <- books %>% html_elements(".book") 
title <- data0 %>% html_element(".booktitle") %>% html_text2()
year <- data0 %>% html_element(".year") %>% html_text2()

authors <- data0 %>% html_element(".author")

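Then loop over the three elements of authors, saving each into its own data frame. Then associate each author data frame with the relevant title and year, and somehow turn the result into a long data frame.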
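
The rough idea above could be sketched as follows: build one small data frame per book and stack them. This is an illustrative sketch rather than a definitive approach; it uses a compact copy of the HTML above, and the names book_nodes and rows are made up for the example.

```r
library(rvest)

books <- minimal_html('
  <div>
    <div class="book"><div class="booktitle">Book 1</div><div class="year">1999</div>
      <div class="author">Author 1</div><div class="author">Author 2</div>
      <div class="author">Author 3</div></div>
    <div class="book"><div class="booktitle">Book 2</div><div class="year">2022</div>
      <div class="author">Author 4</div></div>
    <div class="book"><div class="booktitle">Book 3</div><div class="year">1845</div>
      <div class="author">Author 5</div><div class="author">Author 6</div>
      <div class="author">Author 7</div><div class="author">Author 8</div></div>
  </div>')

book_nodes <- html_elements(books, ".book")

# One data frame per book: title and year are single values and are
# recycled across however many authors that book has.
rows <- lapply(book_nodes, function(book) {
  data.frame(
    title  = html_text2(html_element(book, ".booktitle")),
    year   = html_text2(html_element(book, ".year")),
    author = html_text2(html_elements(book, ".author"))
  )
})

# Stack the per-book data frames into one long, author-level data frame.
data <- do.call(rbind, rows)
print(data)
```

Note that year stays a character column here; it could be converted with as.integer() if needed.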

bill999

Asked: 2024-02-11 11:15:24 +0800 CST

How to get geom_curve to show constant transparency in the arrowhead when using alpha

9

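My goal is to create curved lines on a map (connecting points), with an arrowhead at one end. The lines will be semi-transparent. It seems that the best way to accomplish this is with geom_curve (though other ggplot2 solutions would be great too).

So, say I have this toy plot: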

library(tidyverse)
library(ggplot2)
data <- data.frame(x = 4, y = 20, xend = 7, yend = 15)

ggplot(data) + geom_curve(aes(x = x, y = y, xend = xend, yend = yend),
    arrow = arrow(length = unit(0.17, "npc"), type="closed", angle=20),
    colour = "red",
    linewidth = 5,
    angle = 90, 
    alpha=.2,
    lineend = "butt",
    curvature = -0.4
)

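This creates:

The issue I face is that when I use geom_curve with semi-transparent lines, the arrowhead shows varying levels of transparency, whereas I would like it to be uniform. How can I prevent this?

This question is closely related to this one: Alpha aesthetic shows arrow's skeleton instead of plain shape - how to prevent it?, except that it uses geom_curve instead of geom_segment. The wonderful answer (https://stackoverflow.com/a/60587296/2049545) defines a new geom (geom_arrowbar). Is it possible to modify it for use with geom_curve? I will also flag another answer that points to geom_gene_arrow (https://stackoverflow.com/a/60655018/2049545) - can this be used with curves? Or are there other viable solutions?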

bill999

Asked: 2024-02-08 00:46:45 +0800 CST

How to make a flow map with arrows indicating the direction of flow, lines of differing thickness, and non-overlapping A-to-B and B-to-A flows

5

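I am interested in creating a map like the following one (found at https://www.axios.com/2017/12/15/the-flow-of-goods-between-states-1513304375):

Specifically, I would like to use curved lines to depict flows between areas on a map, with wider lines indicating larger flows and arrows showing the direction of flow. If possible, I would also like the line from A to B not to lie on top of the line from B to A, so that the viewer can tell the two apart. Preferably this would use ggplot2, though I am open to other solutions.

I will note that there are some related questions (e.g., How can I add directional arrows to lines drawn on a map in R?, How to create a map chart with directional arrows in R?, Plotting email flow in a map using R, and https://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/), but I am wondering whether there is a solution that lets me incorporate all of the elements at once. (I am not sure whether the prior solutions address the issue of A to B and B to A not overlapping.)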
