Summary
I set up a scenario with 3 nodes where 1 node is out of sync with the others. When I connect to that node, I find it returns data it should not be able to return, because it is reading the data from another node (I believe the node I connect to with the driver is the coordinator node).
I don't know what I am missing, and why does Cassandra behave this way?
Steps:
- Create a 3-node cluster with docker and Cassandra version 4.1.5, default settings and no authentication, using
CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch:
docker run --name cass1 -d -p 19001:9042 --network=casscluster -e CASSANDRA_CLUSTER_NAME=chat -e CASSANDRA_DC=dc1 -e CASSANDRA_RACK=rack1 -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS="cass1,cass2" -e CASSANDRA_LISTEN_ADDRESS="cass1" -e CASSANDRA_BROADCAST_ADDRESS="cass1" cassandra:4.1.5
docker run --name cass2 -d -p 19002:9042 --network=casscluster -e CASSANDRA_CLUSTER_NAME=chat -e CASSANDRA_DC=dc1 -e CASSANDRA_RACK=rack1 -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS="cass1,cass2" -e CASSANDRA_LISTEN_ADDRESS="cass2" -e CASSANDRA_BROADCAST_ADDRESS="cass2" cassandra:4.1.5
docker run --name cass3 -d -p 19003:9042 --network=casscluster -e CASSANDRA_CLUSTER_NAME=chat -e CASSANDRA_DC=dc1 -e CASSANDRA_RACK=rack1 -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS="cass1,cass2" -e CASSANDRA_LISTEN_ADDRESS="cass3" -e CASSANDRA_BROADCAST_ADDRESS="cass3" cassandra:4.1.5
- On each node, disable hinted handoff in cassandra.yaml and restart the node (check on each node with
nodetool statushandoff
that hinted handoff is disabled):
nodetool statushandoff
Hinted handoff is not running
- Create a keyspace with replication factor 3:
create keyspace testak WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy',
'dc1' : 3
};
- Create a table and disable
speculative_retry
(so that a slow query does not trigger a read from another node):
create table testak.mooz
(
userid int,
chatid int,
name text,
primary key (userid, chatid)
);
alter table testak.mooz with speculative_retry = 'none';
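What speculative_retry = 'none' does can be sketched as a threshold check (a simplified model, not Cassandra's actual code; the exact units and comparison are my assumption for illustration): 'NONE' maps to a delay of Long.MAX_VALUE, which can never fall below the request timeout, so the coordinator never schedules a speculative read. That is the 9223372036854775807 > 5000000 comparison that shows up in the trace log later.

```python
# Simplified model (not Cassandra's implementation) of how
# speculative_retry = 'none' disables speculation: 'NONE' maps to a
# delay of Long.MAX_VALUE, so the "should we speculate?" check can
# never fire within the request timeout.

LONG_MAX = 9223372036854775807  # Java Long.MAX_VALUE

def speculative_delay(speculative_retry: str) -> int:
    """Map the table option to a delay; 'none' means 'never'."""
    if speculative_retry.lower() == 'none':
        return LONG_MAX
    if speculative_retry.lower().endswith('ms'):
        return int(float(speculative_retry[:-2]) * 1000)
    raise ValueError(f"unsupported option: {speculative_retry}")

def may_speculate(speculative_retry: str, request_timeout: int = 5_000_000) -> bool:
    # Mirrors the trace line "Decided not to speculate as
    # 9223372036854775807 > 5000000": a delay beyond the request
    # timeout means speculation can never fire in time.
    return speculative_delay(speculative_retry) <= request_timeout

print(may_speculate('none'))  # False: never speculate
print(may_speculate('99ms'))  # True: could speculate after 99 ms
```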
- Stop node 3
- Insert on node 1:
insert into testak.mooz (userid, chatid) values (1, 1);
insert into testak.mooz (userid, chatid) values (2, 1);
insert into testak.mooz (userid, chatid) values (3, 1);
insert into testak.mooz (userid, chatid) values (4, 1);
insert into testak.mooz (userid, chatid) values (5, 1);
insert into testak.mooz (userid, chatid) values (6, 1);
insert into testak.mooz (userid, chatid) values (7, 1);
insert into testak.mooz (userid, chatid) values (8, 1);
insert into testak.mooz (userid, chatid) values (9, 1);
insert into testak.mooz (userid, chatid) values (10, 1);
- Start node 3
- Connect to node 3 with an external cqlsh:
docker run --name cqlsh --rm -it --network=casscluster nuvo/docker-cqlsh cqlsh cass3 9042 --cqlversion=3.4.6
- Set consistency to ONE:
CONSISTENCY ONE
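With CONSISTENCY ONE the coordinator only has to block for a single replica response, and that response may come from any replica, not necessarily the coordinator itself. The block-for arithmetic can be sketched roughly like this (my own simplified model for a single-DC keyspace, not Cassandra's actual implementation):

```python
# Simplified block-for arithmetic: how many replica responses the
# coordinator waits for at a given consistency level, for a keyspace
# with replication factor rf. Illustrative model only.

def block_for(consistency: str, rf: int) -> int:
    """Number of replica responses the coordinator blocks for."""
    consistency = consistency.upper()
    if consistency == 'ONE':
        return 1
    if consistency in ('QUORUM', 'LOCAL_QUORUM'):
        return rf // 2 + 1  # strict majority of replicas
    if consistency == 'ALL':
        return rf
    raise ValueError(f"unsupported level: {consistency}")

# Matches the "Blockfor is 1" trace line seen with CONSISTENCY ONE:
print(block_for('ONE', 3))     # 1
print(block_for('QUORUM', 3))  # 2
print(block_for('ALL', 3))     # 3
```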
- Turn tracing on:
TRACING ON
- Select one row:
select * from testak.mooz where userid = 1 and chatid = 1;
Since the replication factor is 3 and I have 3 nodes, I expected every node to consider itself an owner of all the data and to have no need to query data from another node. But after issuing the query many times, I found that sometimes the request goes to another node and actually retrieves data that should not be retrieved, because node 3 is inconsistent and does not have the real record.
Note: most of the time (~90%) it does not read the data from another node.
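The expectation that every node owns all the data is itself correct: with RF equal to the number of nodes, a walk around the token ring always collects all three nodes as replicas for any token. A toy sketch of that replica-selection walk (not Cassandra's Murmur3 ring or vnode layout, just the idea):

```python
# Toy token ring: the replicas for a token are the next rf *distinct*
# nodes walking clockwise from the token. With 3 nodes and rf=3 every
# node is a replica for every token, which matches the
# `nodetool getendpoints` output below that lists all three IPs.

def replicas_for_token(token: int, ring: list[tuple[int, str]], rf: int) -> list[str]:
    ring = sorted(ring)  # (token, node) pairs in ring order
    # First ring position at or after the token (wrap to 0 past the end).
    start = next((i for i, (t, _) in enumerate(ring) if t >= token), 0)
    out: list[str] = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in out:
            out.append(node)
        if len(out) == rf:
            break
    return out

# One token per node for simplicity (real clusters use vnodes).
ring = [(0, '172.20.0.2'), (100, '172.20.0.3'), (200, '172.20.0.4')]
print(replicas_for_token(150, ring, rf=3))  # all three nodes
print(replicas_for_token(42, ring, rf=3))   # all three nodes again
```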
Update
Later I set the logging level with nodetool setlogginglevel org.apache.cassandra ALL
and looked at debug.log
. This is the log from when Cassandra decided to read the data from another node:
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:01,109 CoordinatorWarnings.java:49 - CoordinatorTrackWarnings.init()
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:01,110 Dispatcher.java:164 - Received: QUERY select * from testak.mooz where userid = 1 and chatid = 1; [pageSize = 100] at consistency ONE, v=4/v4
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:01,110 QueryProcessor.java:251 - Process SelectStatement[aggregationSpecFactory=<null>,bindVariables=[],isReversed=false,limit=<null>,..(truncated long log line)
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:01,111 ReadCallback.java:90 - Blockfor is 1; setting up requests to org.apache.cassandra.locator.ReplicaPlan$SharedForTokenRead@74da23a8
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:01,111 MessagingService.java:401 - cass3/172.20.0.4:7000 sending READ_REQ to 2781@/172.20.0.3:7000
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:01,111 AbstractReadExecutor.java:226 - Decided not to speculate as 9223372036854775807 > 5000000
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:01,122 CoordinatorWarnings.java:80 - CoordinatorTrackWarnings.done() with state {}
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:01,122 CoordinatorWarnings.java:61 - CoordinatorTrackWarnings.reset()
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:01,122 Dispatcher.java:214 - Responding: ROWS [userid(testak, mooz), org.apache.cassandra.db.marshal.Int32Type][chatid(testak, mooz), org.apache.cassandra.db.marshal.Int32Type][name(testak, mooz), org.apache.cassandra.db.marshal.UTF8Type]
| 1 | 1 | null
---, v=4/v4
However, when the data is read from the node itself, instead of a line with sending READ_REQ
I see this line:
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:49,248 EndpointMessagingVersions.java:67 - Assuming current protocol version for cass3/172.20.0.4:7000
TRACE [Native-Transport-Requests-1] 2024-07-18 08:47:49,248 AbstractReadExecutor.java:158 - reading data locally
Update 2
Running nodetool getendpoints testak mooz 1
on cass3:
172.20.0.3
172.20.0.4
172.20.0.2
nodetool status testak
:
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.20.0.3 227.48 KiB 16 100.0% 2a4a84a4-1894-488a-8661-680cf818384b rack1
UN 172.20.0.4 232.56 KiB 16 100.0% 9ed9d3f6-13e0-424f-bf52-1bd359d8a26e rack1
UN 172.20.0.2 281.54 KiB 16 100.0% 68de8980-ab63-4745-85b1-6b98636ea9af rack1
nodetool describecluster
:
Cluster Information:
Name: chat
Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
629e0d12-c942-3eeb-bce3-5616a291ae90: [172.20.0.4, 172.20.0.2, 172.20.0.3]
Stats for all nodes:
Live: 3
Joining: 0
Moving: 0
Leaving: 0
Unreachable: 0
Data Centers:
dc1 #Nodes: 3 #Down: 0
Database versions:
4.1.5: [172.20.0.4:7000, 172.20.0.2:7000, 172.20.0.3:7000]
Keyspaces:
system_auth -> Replication class: SimpleStrategy {replication_factor=1}
testak -> Replication class: NetworkTopologyStrategy {dc1=3}
system_distributed -> Replication class: SimpleStrategy {replication_factor=3}
system_traces -> Replication class: SimpleStrategy {replication_factor=2}
system_schema -> Replication class: LocalStrategy {}
system -> Replication class: LocalStrategy {}
Unfortunately, your underlying assumption is wrong.
While cqlsh does force the coordinator to be the node it connects to, that does not guarantee the data is read from the coordinator. The coordinator can still decide to fetch the data from other nodes. It is the snitch that decides which node the coordinator sends the request to.
You are using the dynamic snitch, which distributes requests based on read latency and routes them to the healthiest node.
Cassandra does this for a good reason: if the coordinator itself is struggling, it should not serve the read, even though doing so would technically involve fewer network hops. The dynamic snitch sends each request to the lowest-latency node, which also spreads load evenly across the cluster.
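The replica ranking described above can be sketched as a score-based sort. This is an illustrative model of the dynamic snitch idea only (the names and the EWMA scoring are my own; Cassandra's actual implementation uses exponentially decaying latency reservoirs per endpoint):

```python
# Illustrative model of dynamic-snitch-style replica ranking: the
# coordinator tracks a latency score per replica and sends the data
# read to the lowest-scored (fastest) one. When the local node's
# recent latencies look worse than a peer's, the read leaves the
# coordinator -- which is what the question observed ~10% of the time.

class LatencyTracker:
    def __init__(self, alpha: float = 0.75) -> None:
        self.alpha = alpha            # weight given to the newest sample
        self.scores: dict[str, float] = {}

    def record(self, node: str, latency_ms: float) -> None:
        prev = self.scores.get(node, latency_ms)
        # Exponentially weighted moving average of observed latency.
        self.scores[node] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def rank(self, replicas: list[str]) -> list[str]:
        # Unknown nodes score 0.0 (optimistic), like a fresh snitch.
        return sorted(replicas, key=lambda n: self.scores.get(n, 0.0))

snitch = LatencyTracker()
# cass3 (the coordinator, 172.20.0.4) had a slow recent read;
# cass2 (172.20.0.3) has been fast:
snitch.record('172.20.0.4', 40.0)
snitch.record('172.20.0.3', 2.0)
snitch.record('172.20.0.2', 8.0)

replicas = ['172.20.0.4', '172.20.0.3', '172.20.0.2']
print(snitch.rank(replicas)[0])  # 172.20.0.3 -- the read goes to a peer
```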