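I am having trouble with a network analysis. I have a dataset containing thousands of detections of hundreds of individuals at different locations. I am trying to obtain key network statistics for each individual: the number of nodes, the number of edges, and the network diameter (defined as the maximum distance between any two nodes visited by that individual).
I tried igraph, but my limited R skills did not allow me to adapt the online examples I found to my own data.
Below is a simplified example of my data (distances are in km):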
df <- data.frame(id = c("3811","3811","3832","3832","3832","3832"),
                 Program = c("P1","P1","P1","P1","P1","P1"),
                 from = c("hill","town","hill","wood","wood","lake"),
                 from_lon = c(130.2,130.5,130.2,131.3,131.3,129.6),
                 from_lat = c(-30.2,-30.5,-30.2,-31.3,-31.3,-29.6),
                 to = c("town","lake","wood","wood","lake","town"),
                 to_lon = c(130.5,129.6,131.3,131.3,129.6,130.5),
                 to_lat = c(-30.5,-29.6,-31.3,-31.3,-29.6,-30.5),
                 dist = c(44.111,132.506,161.456,0,249.847,132.506))
This gives the following data frame:
  id Program from from_lon from_lat   to to_lon to_lat    dist
3811      P1 hill    130.2    -30.2 town  130.5  -30.5  44.111
3811      P1 town    130.5    -30.5 lake  129.6  -29.6 132.506
3832      P1 hill    130.2    -30.2 wood  131.3  -31.3 161.456
3832      P1 wood    131.3    -31.3 wood  131.3  -31.3   0.000
3832      P1 wood    131.3    -31.3 lake  129.6  -29.6 249.847
3832      P1 lake    129.6    -29.6 town  130.5  -30.5 132.506
Since my igraph attempts failed, I came up with this overly complicated code (which I think does the same thing):
library(dplyr)

indiv_nodes <- df %>%
  filter(id == "3811" & dist > 0) %>% # exclude repeat detections originating at the same site
  summarise(
    id = first(id),
    prog = first(Program),
    nodes = n_distinct(to) + 1, # +1 to include the start location
    netdiam = max(dist))
indiv_edges <- df %>%
  filter(id == "3811" & dist > 0) %>% # include only edges between nodes, exclude repeat detections at the same site
  group_by(from, to) %>%
  summarise(weight = n()) # from and to are grouping variables, so they are kept automatically
net <- transform(indiv_nodes, edges = sum(indiv_edges$weight))
#-
indiv_nodes_n <- df %>%
  filter(id == "3832" & dist > 0) %>%
  summarise(
    id = first(id),
    prog = first(Program),
    nodes = n_distinct(to) + 1,
    netdiam = max(dist))
indiv_edges_n <- df %>%
  filter(id == "3832" & dist > 0) %>%
  group_by(from, to) %>%
  summarise(weight = n())
indiv_net <- transform(indiv_nodes_n, edges = sum(indiv_edges_n$weight))
net <- rbind(net, indiv_net)
#-
net
The result looks like this:
  id prog nodes netdiam edges
3811   P1     3 132.506     2
3832   P1     4 249.847     3
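One caveat about netdiam: max(dist) only looks at consecutive moves, whereas the definition at the top is the maximum distance between any two visited nodes, and the two can differ. A rough sketch of the pairwise version in base R follows. The helpers haversine_km and site_diameter_km are hypothetical names, and a spherical Earth of radius 6371 km is assumed, so the values will not match the dist column exactly.

```r
# Great-circle (haversine) distance in km between lon/lat points in degrees.
# Assumption: spherical Earth with radius r = 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Largest distance between any two sites (hypothetical helper).
# lon and lat are vectors holding one entry per distinct site.
site_diameter_km <- function(lon, lat) {
  if (length(lon) < 2) return(0)
  idx <- seq_along(lon)
  d <- outer(idx, idx,
             function(i, j) haversine_km(lon[i], lat[i], lon[j], lat[j]))
  max(d)
}

# Sites visited by id 3811 in the example data: hill, town, lake
lon <- c(130.2, 130.5, 129.6)
lat <- c(-30.2, -30.5, -29.6)
diam_3811 <- site_diameter_km(lon, lat)
diam_3811
```

For the two example individuals the farthest pair of sites happens to be one of the recorded moves, so both approaches should agree to within the error of the spherical approximation.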
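My problem is that I have to repeat this for the hundreds of individuals in my dataset (not just two) and then bind the results back together.
I tried to write a loop function but failed.
If anyone could help, either with an igraph solution or with a loop function that runs the code above over every id in the dataset, that would be great!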
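To make the goal concrete, here is a minimal sketch of the kind of thing I am after, using group_by(id) in place of a loop. It assumes dplyr 1.0 or later (for the .groups argument) and uses a cut-down copy of the example data. It also counts nodes as the distinct sites appearing in either from or to, which is an assumption on my part: it matches n_distinct(to) + 1 on the example, but would differ if an individual returned to its start site.

```r
library(dplyr)

# Cut-down copy of the example data (coordinates omitted for brevity)
df <- data.frame(
  id      = c("3811", "3811", "3832", "3832", "3832", "3832"),
  Program = c("P1", "P1", "P1", "P1", "P1", "P1"),
  from    = c("hill", "town", "hill", "wood", "wood", "lake"),
  to      = c("town", "lake", "wood", "wood", "lake", "town"),
  dist    = c(44.111, 132.506, 161.456, 0, 249.847, 132.506)
)

net <- df %>%
  filter(dist > 0) %>%                    # drop repeat detections at the same site
  group_by(id) %>%                        # one summary row per individual
  summarise(
    prog    = first(Program),
    nodes   = n_distinct(c(from, to)),    # distinct sites seen as origin or destination
    netdiam = max(dist),                  # longest single move, as in the code above
    edges   = n(),                        # number of moves between different sites
    .groups = "drop"
  )

net
```

On the example data this should give the same nodes, netdiam and edges values as the table above, with one row per id.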