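I came across this question, in which someone was interested in simulating a game where two players randomly give money to each other: https://math.stackexchange.com/questions/2648895/why-does-fair-random-process-lead-to-unfair-result/5001301#5001301
I tried simulating many trajectories of this game in R to look at the distributions of several metrics: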
library(ggplot2)
library(tidyverse)
library(gridExtra)
library(future)
library(furrr)
library(parallel)
n_cores <- detectCores() - 1
plan(multisession, workers = n_cores)
run_money_simulation <- function(n_sims, n_exchanges, player_a_start, player_b_start) {
  # Fix the RNG seed for reproducibility
  set.seed(123)
  sims_per_core <- ceiling(n_sims / n_cores)  # computed but not used below

  # Simulate one game and return summary metrics for its trajectory
  simulate_exchange <- function(n_exchanges, player_a_start, player_b_start) {
    person_a <- numeric(n_exchanges + 1)
    person_b <- numeric(n_exchanges + 1)
    person_a[1] <- player_a_start
    person_b[1] <- player_b_start
    for (i in 2:(n_exchanges + 1)) {
      # One unit moves from B to A (+1) or from A to B (-1)
      change <- sample(c(-1, 1), 1)
      person_a[i] <- person_a[i - 1] + change
      person_b[i] <- person_b[i - 1] - change
    }
    return(list(
      final_diff = person_a[n_exchanges + 1] - person_b[n_exchanges + 1],
      max_diff = max(abs(person_a - person_b)),
      max_amount = max(c(max(person_a), max(person_b))),
      min_amount = min(c(min(person_a), min(person_b)))
    ))
  }

  # Run all simulations in parallel and time them
  start_time <- Sys.time()
  results <- future_map(1:n_sims, function(x) {
    simulate_exchange(n_exchanges, player_a_start, player_b_start)
  }, .options = furrr_options(seed = TRUE))
  end_time <- Sys.time()
  time_taken <- difftime(end_time, start_time, units = "secs")

  # Pull each metric out of the list of per-simulation results
  final_diffs <- sapply(results, `[[`, "final_diff")
  max_diffs <- sapply(results, `[[`, "max_diff")
  max_amounts <- sapply(results, `[[`, "max_amount")
  min_amounts <- sapply(results, `[[`, "min_amount")

  # Long format: one row per (metric, value) pair, for faceting
  plot_data <- tibble(
    final_diff = final_diffs,
    max_diff = max_diffs,
    max_amount = max_amounts,
    min_amount = min_amounts
  ) %>%
    pivot_longer(everything(),
                 names_to = "metric",
                 values_to = "value")

  main_plot <- ggplot(plot_data, aes(x = value)) +
    geom_histogram(bins = 50, aes(fill = metric), color = "white", alpha = 0.7) +
    facet_wrap(~metric, scales = "free", ncol = 2) +
    scale_fill_manual(values = c("black", "red", "green4", "purple")) +
    labs(title = paste("Money Exchange Simulation Results\n",
                       "Starting amounts: A =", player_a_start, ", B =", player_b_start),
         subtitle = paste("Number of simulations:", n_sims,
                          "| Exchanges per simulation:", n_exchanges,
                          "\nProcessed using", n_cores, "CPU cores in",
                          round(time_taken, 2), "seconds"),
         x = "Value",
         y = "Count") +
    theme_bw() +
    theme(legend.position = "none")
  print(main_plot)

  # Return the raw metrics and run parameters without printing them
  invisible(list(
    final_diffs = final_diffs,
    max_diffs = max_diffs,
    max_amounts = max_amounts,
    min_amounts = min_amounts,
    parameters = list(
      n_sims = n_sims,
      n_exchanges = n_exchanges,
      player_a_start = player_a_start,
      player_b_start = player_b_start,
      n_cores = n_cores,
      processing_time = time_taken
    )
  ))
}
When I call the function:
run_money_simulation(100000, 100, 100, 100)
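I get the following result:
I'm just wondering: is there something I can do in ggplot to detect the spacing of the values on the scale, so that the white gaps in these plots are removed?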
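Try adding xlim(c(<lower_limit>, <upper_limit>)). From there, you can write another function that finds where the 90th percentile (or some other threshold) of your data falls, and feed that into your upper or lower limit.

The problem is that you have discrete data. A histogram is a density estimate, designed for continuous data that has a density.

Look at the top-left panel, final_diff: the values shown are always multiples of 4. (Starting from a difference of 0, each exchange changes A - B by 2 in one direction or the other, so after an even number of exchanges, 100 here, the difference is a multiple of 4.) Because your bin width is slightly less than 4, several bins contain no data at all. You can avoid the gaps by setting the bin width to 4, but that is not the best solution. The best solution is to use a display designed for discrete data. Discrete data like this is usually displayed as a bar chart showing the count of each observed value. In ggplot2 you can get that plot by using geom_bar() instead of geom_histogram().

[Example code and plot not preserved; created on 2024-11-21 with reprex v2.1.1]

This looks uglier than the histogram, but it is more faithful to the data. On a high-resolution display it is less ugly, and you can see that all the bars and the gaps between them have the same width.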
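
Since the example from the reply is not available here, the following is only a rough sketch of how the two suggestions (percentile-based limits and geom_bar()) might look for final_diff. Several things in it are my own assumptions rather than anything from the question or the replies: the vectorised rbinom() stand-in for the simulation, the names plot_df, x_limits and p_bar, the 1%/99% thresholds, and the use of coord_cartesian() in place of xlim(). I swapped in coord_cartesian() because it only zooms the view, whereas xlim() drops bars that fall outside the limits.

```r
library(ggplot2)

set.seed(123)  # arbitrary seed, only so this sketch is repeatable

# Vectorised stand-in for the question's loop (an assumption, not the
# original code): count how many of the n_exchanges steps go A's way.
n_sims <- 10000
n_exchanges <- 100
wins_a <- rbinom(n_sims, size = n_exchanges, prob = 0.5)
net_gain_a <- 2 * wins_a - n_exchanges  # A's net gain after all exchanges
final_diff <- 2 * net_gain_a            # A - B when both start equal

# With an even number of exchanges every value is a multiple of 4.
stopifnot(all(final_diff %% 4 == 0))

plot_df <- data.frame(final_diff = final_diff)

# Limits from the 1st and 99th percentiles (thresholds chosen arbitrarily),
# padded by half the spacing so the outermost bars are not cut in half.
x_limits <- unname(quantile(final_diff, probs = c(0.01, 0.99))) + c(-2, 2)

# Bar chart: one bar per distinct observed value, zoomed to the limits.
p_bar <- ggplot(plot_df, aes(x = final_diff)) +
  geom_bar() +
  coord_cartesian(xlim = x_limits) +
  labs(x = "final_diff", y = "Count")

print(p_bar)
```

If you would rather keep the histogram, replacing geom_bar() with geom_histogram(binwidth = 4) in the same sketch should also remove the empty bins for this metric, since the bin width then matches the spacing of the values.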