I'm experimenting with netfilter in Docker containers. I have three containers: one "router" and two "endpoints". Each is connected via pipework, so there is one external (host) bridge per endpoint <-> router link. Something like this:
containerA (eth1) -- hostbridgeA -- (eth1) containerR
containerB (eth1) -- hostbridgeB -- (eth2) containerR
Then in the "router" container containerR I have a bridge br0 configured like this:
bridge name bridge id STP enabled interfaces
br0 8000.3a047f7a7006 no eth1
eth2
I have net.bridge.bridge-nf-call-iptables=0 on the host, since it was interfering with some of my other tests. containerA has IP 192.168.10.1/24 and containerB has 192.168.10.2/24.
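For reference, br_netfilter exposes three related knobs, and a quick way to confirm the host state is to read them all (assuming the module is loaded; the keys don't exist otherwise):

```shell
# Read all three bridge-nf sysctls provided by the br_netfilter module.
# These keys are only present once the module is loaded.
sysctl net.bridge.bridge-nf-call-iptables \
       net.bridge.bridge-nf-call-ip6tables \
       net.bridge.bridge-nf-call-arptables
```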
I then have a very simple ruleset to trace forwarded packets:
flush ruleset
table bridge filter {
	chain forward {
		type filter hook forward priority 0; policy accept;
		meta nftrace set 1
	}
}
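A ruleset like this can be loaded atomically with `nft -f` (the file name here is my own choice):

```shell
# Load the ruleset from a file; the leading `flush ruleset` inside the
# file makes the load idempotent. trace.nft is an arbitrary file name.
nft -f trace.nft
# Confirm what the kernel actually ended up with
nft list ruleset
```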
With this, I find that only ARP packets are traced, never ICMP. In other words, if I run nft monitor while containerA is pinging containerB, I can see the ARP packets traced but not the ICMP packets. This surprised me, because per my understanding of the nftables bridge filter forward chain type, the only time a packet should skip the forward stage is when it is delivered to input on the host (containerR in this case), per the Linux packet flow diagram. I would still expect the ICMP packets to take the forward path, just like ARP. If I trace prerouting and postrouting, I do see the packets there. So my question is: what is going on here? Is there a flowtable or some other short-circuit I'm not aware of? Is it specific to container networking and/or Docker? I could test with VMs instead of containers, but I'm curious whether others are aware of this or have run into it.
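For what it's worth, flowtables can be listed directly, so ruling that particular shortcut out is straightforward (run inside containerR):

```shell
# List flowtables in every family; empty output means none are defined.
nft list flowtables
# Also dump the complete ruleset across all families, in case some other
# table is accepting or diverting packets before the bridge forward hook.
nft list ruleset
```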
Edit: I have since created a similar setup in VirtualBox with a set of Alpine VMs. There the ICMP packets do reach the forward chain, so something on the host or in Docker appears to be interfering with my expectation. I'll hold off on answering this myself until I (or someone else) can determine the cause, in case someone already knows it.

Thanks!
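One check that may be worth adding here (my suggestion, not something I have confirmed as the cause): since kernel 5.3 the bridge-nf sysctls are per network namespace, so the value inside containerR's namespace can differ from the host's:

```shell
# Host value (set to 0 earlier)
sysctl net.bridge.bridge-nf-call-iptables
# Value inside containerR's network namespace -- this may still be 1
# even though the host value is 0, since the sysctl is namespaced.
docker exec hostR sysctl net.bridge.bridge-nf-call-iptables
```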
Minimal reproducible example

For this I'm using Alpine Linux 3.19.1 in a VM, with the community repository enabled in /etc/apk/repositories:
# Prerequisites of host
apk add bridge bridge-utils iproute2 docker openrc
service docker start
# When using linux bridges instead of openvswitch, disable iptables on bridges
sysctl net.bridge.bridge-nf-call-iptables=0
# Pipework to let me avoid docker's IPAM
git clone https://github.com/jpetazzo/pipework.git
cp pipework/pipework /usr/local/bin/
# Create two containers each on their own network (bridge)
# Create two containers each on their own network (bridge); the containers
# must be running (not just created) so pipework can enter their netns
pipework brA $(docker run -itd --name hostA alpine:3.19) 192.168.10.1/24
pipework brB $(docker run -itd --name hostB alpine:3.19) 192.168.10.2/24
# Create bridge-filtering container then connect it to both of the other networks
R=$(docker run --cap-add NET_ADMIN -itd --name hostR alpine:3.19)
pipework brA -i eth1 $R 0/0
pipework brB -i eth2 $R 0/0
# Note: `hostR` doesn't have/need an IP address on the bridge for this example
# Add bridge tools and netfilter to the bridging container
docker exec hostR apk add bridge bridge-utils nftables
docker exec hostR brctl addbr br
docker exec hostR brctl addif br eth1 eth2
docker exec hostR ip link set dev br up
# hostA should be able to ping hostB
docker exec hostA ping -c 1 192.168.10.2
# 64 bytes from 192.168.10.2...
# Set nftables rules
docker exec hostR nft add table bridge filter
docker exec hostR nft add chain bridge filter forward '{type filter hook forward priority 0;}'
docker exec hostR nft add rule bridge filter forward meta nftrace set 1
# Now ping hostB from hostA while nft monitor is running...
docker exec hostA ping -c 4 192.168.10.2 & docker exec hostR nft monitor
# Ping will succeed, nft monitor will not show any echo-request/-response packets traced, only arps
# Example:
trace id abc bridge filter forward packet: iif "eth2" oif "eth1" ether saddr ... daddr ... arp operation request
trace id abc bridge filter forward rule meta nftrace set 1 (verdict continue)
trace id abc bridge filter forward verdict continue
trace id abc bridge filter forward policy accept
...
trace id def bridge filter forward packet: iif "eth1" oif "eth2" ether saddr ... daddr ... arp operation reply
trace id def bridge filter forward rule meta nftrace set 1 (verdict continue)
trace id def bridge filter forward verdict continue
trace id def bridge filter forward policy accept
# Add tracing in prerouting and the icmp packets are visible:
docker exec hostR nft add chain bridge filter prerouting '{type filter hook prerouting priority 0;}'
docker exec hostR nft add rule bridge filter prerouting meta nftrace set 1
# Run again
docker exec hostA ping -c 4 192.168.10.2 & docker exec hostR nft monitor
# Ping still works (obviously), but we can see its packets in prerouting, which then disappear from the forward chain, but ARP shows up in both.
# Example:
trace id abc bridge filter prerouting packet: iif "eth1" ether saddr ... daddr ... ... icmp type echo-request ...
trace id abc bridge filter prerouting rule meta nftrace set 1 (verdict continue)
trace id abc bridge filter prerouting verdict continue
trace id abc bridge filter prerouting policy accept
...
trace id def bridge filter prerouting packet: iif "eth2" ether saddr ... daddr ... ... icmp type echo-reply ...
trace id def bridge filter prerouting rule meta nftrace set 1 (verdict continue)
trace id def bridge filter prerouting verdict continue
trace id def bridge filter prerouting policy accept
...
trace id 123 bridge filter prerouting packet: iif "eth1" ether saddr ... daddr ... ... arp operation request
trace id 123 bridge filter prerouting rule meta nftrace set 1 (verdict continue)
trace id 123 bridge filter prerouting verdict continue
trace id 123 bridge filter prerouting policy accept
trace id 123 bridge filter forward packet: iif "eth1" oif "eth2" ether saddr ... daddr ... arp operation request
trace id 123 bridge filter forward rule meta nftrace set 1 (verdict continue)
trace id 123 bridge filter forward verdict continue
trace id 123 bridge filter forward policy accept
...
trace id 456 bridge filter prerouting packet: iif "eth2" ether saddr ... daddr ... ... arp operation reply
trace id 456 bridge filter prerouting rule meta nftrace set 1 (verdict continue)
trace id 456 bridge filter prerouting verdict continue
trace id 456 bridge filter prerouting policy accept
trace id 456 bridge filter forward packet: iif "eth2" oif "eth1" ether saddr ... daddr ... arp operation reply
trace id 456 bridge filter forward rule meta nftrace set 1 (verdict continue)
trace id 456 bridge filter forward verdict continue
trace id 456 bridge filter forward policy accept
# Note the trace id matching across prerouting and forward chains
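To make that correlation easier to see, the monitor output can be narrowed to trace events and filtered down to the interesting lines (the grep pattern is just illustrative):

```shell
# Show only trace events, keeping the ICMP and ARP packet lines so the
# matching trace ids across prerouting and forward are easy to compare.
docker exec hostR nft monitor trace | grep -E 'icmp|arp'
```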
I also tried this with openvswitch, but stuck with the Linux bridge example for simplicity; it produces the same result either way. The only real difference with openvswitch is that net.bridge.bridge-nf-call-iptables=0 isn't needed, IIRC.