We have a 2-node cluster (I know 3 is the recommended minimum) that has been running Cassandra 3 and then 4 for years without any incident.
Last week we upgraded both nodes to Cassandra 5.
Since then, the nodes seem to be having gossip and communication problems between them, especially when the schema is updated (e.g. when creating or dropping tables).
Here is the debug log from immediately after a schema change (dropping a table):
INFO [Native-Transport-Requests-1] 2024-11-03 09:48:04,021 Keyspace.java:379 - Creating replication strategy r_internal params KeyspaceParams{durable_writes=true, replication=ReplicationParams{class=org.apache.cassandra.locator.SimpleStrategy, replication_factor=2}}
DEBUG [Native-Transport-Requests-1] 2024-11-03 09:48:04,064 MigrationCoordinator.java:430 - Reported schema a47919aa-79b3-362c-b023-d8ac89671885 at endpoint /10.10.2.27:7000
DEBUG [MigrationStage:1] 2024-11-03 09:48:04,070 MigrationCoordinator.java:749 - Pushing schema mutations: [Mutation(keyspace='system_schema', key='72617976656e5f696e7465726e616c', modifications=[
[system_schema.columns] key=r_internal partition_deletion=deletedAt=-9223372036854775808, localDeletion=9223372036854775807 columns=[[] | []]
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=cdate |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=device_id |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=event_timestamp |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=info |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=input |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=node_id |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=output |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=server |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=unique_flow_process |
[system_schema.indexes] key=r_internal partition_deletion=deletedAt=-9223372036854775808, localDeletion=9223372036854775807 columns=[[] | []]
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, index_name=p1489_log_node_input_output_unique_flow_process_idx |
[system_schema.keyspaces] key=r_internal partition_deletion=deletedAt=-9223372036854775808, localDeletion=9223372036854775807 columns=[[] | [durable_writes replication]]
Row[info=[ts=1730627238890000] ]: EMPTY | [durable_writes=true ts=1730627238890000], [replication=0000000200000005636c6173730000002b6f72672e6170616368652e63617373616e6472612e6c6f6361746f722e53696d706c655374726174656779000000127265706c69636174696f6e5f666163746f720000000132 ts=1730627238890000]
[system_schema.tables] key=r_internal partition_deletion=deletedAt=-9223372036854775808, localDeletion=9223372036854775807 columns=[[] | []]
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output |
])]
DEBUG [MigrationStage:1] 2024-11-03 09:48:04,071 MigrationCoordinator.java:757 - Pushing schema mutations to /10.10.2.28:7000: [Mutation(keyspace='system_schema', key='72617976656e5f696e7465726e616c', modifications=[
[system_schema.columns] key=r_internal partition_deletion=deletedAt=-9223372036854775808, localDeletion=9223372036854775807 columns=[[] | []]
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=cdate |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=device_id |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=event_timestamp |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=info |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=input |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=node_id |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=output |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=server |
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, column_name=unique_flow_process |
[system_schema.indexes] key=r_internal partition_deletion=deletedAt=-9223372036854775808, localDeletion=9223372036854775807 columns=[[] | []]
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output, index_name=p1489_log_node_input_output_unique_flow_process_idx |
[system_schema.keyspaces] key=r_internal partition_deletion=deletedAt=-9223372036854775808, localDeletion=9223372036854775807 columns=[[] | [durable_writes replication]]
Row[info=[ts=1730627238890000] ]: EMPTY | [durable_writes=true ts=1730627238890000], [replication=0000000200000005636c6173730000002b6f72672e6170616368652e63617373616e6472612e6c6f6361746f722e53696d706c655374726174656779000000127265706c69636174696f6e5f666163746f720000000132 ts=1730627238890000]
[system_schema.tables] key=r_internal partition_deletion=deletedAt=-9223372036854775808, localDeletion=9223372036854775807 columns=[[] | []]
Row[info=[ts=-9223372036854775808] del=deletedAt=1730627238890000, localDeletion=1730627238 ]: table_name=p1489_log_node_input_output |
])]
ERROR [GossipStage:1] 2024-11-03 09:48:04,187 JVMStabilityInspector.java:70 - Exception in thread Thread[GossipStage:1,5,GossipStage]
java.lang.RuntimeException: java.lang.AssertionError
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
at org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
at org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.AssertionError: null
at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:365)
at org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:343)
at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
at org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
at org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
at org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
at org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
at org.apache.cassandra.net.Message.serializedSize(Message.java:1111)
at org.apache.cassandra.net.OutboundConnections.connectionTypeFor(OutboundConnections.java:215)
at org.apache.cassandra.net.OutboundConnections.connectionFor(OutboundConnections.java:207)
at org.apache.cassandra.net.OutboundConnections.enqueue(OutboundConnections.java:96)
at org.apache.cassandra.net.MessagingService.doSend(MessagingService.java:473)
at org.apache.cassandra.net.OutboundSink.accept(OutboundSink.java:70)
at org.apache.cassandra.net.MessagingService.send(MessagingService.java:462)
at org.apache.cassandra.net.MessagingService.send(MessagingService.java:437)
at org.apache.cassandra.gms.GossipDigestSynVerbHandler.doVerb(GossipDigestSynVerbHandler.java:110)
at org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
... 7 common frames omitted
DEBUG [GossipStage:1] 2024-11-03 09:48:04,187 HeapUtils.java:133 - Heap dump creation on uncaught exceptions is disabled.
DEBUG [GossipStage:1] 2024-11-03 09:48:04,187 HeapUtils.java:133 - Heap dump creation on uncaught exceptions is disabled.
The exception in thread Thread[GossipStage:1,5,GossipStage] keeps repeating.
At that point, running nodetool status shows that each node considers the other node down.
I usually have to drain and restart both nodes (one after the other) to get them to see each other again and to report their schemas as in sync.
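Concretely, the drain-and-restart sequence looks like this, run on one node at a time (a sketch; it assumes Cassandra runs as a systemd service named cassandra, adjust for your packaging):

# flush memtables to SSTables and stop accepting writes on this node
nodetool drain
# restart the service and wait for the node to come back up
sudo systemctl restart cassandra
# confirm both nodes report UN (Up/Normal) before moving to the next one
nodetool status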
Here is the output of nodetool gossipinfo (after restarting the nodes):
/10.10.2.27
generation:1730627829
heartbeat:939
STATUS:23:NORMAL,-1656170799351181049
LOAD:898:2.029075780322E12
SCHEMA:17:a47919aa-79b3-362c-b023-d8ac89671885
DC:13:dc1
RACK:15:rack1
RELEASE_VERSION:6:5.0.2
INTERNAL_IP:11:10.10.2.27
RPC_ADDRESS:5:10.10.2.27
NET_VERSION:2:12
HOST_ID:3:8be5925c-0436-4bc7-bd4c-0fb5dd8fc9e5
RPC_READY:35:true
INTERNAL_ADDRESS_AND_PORT:9:10.10.2.27:7000
NATIVE_ADDRESS_AND_PORT:4:10.10.2.27:9042
STATUS_WITH_PORT:22:NORMAL,-1656170799351181049
SSTABLE_VERSIONS:7:big-nb
TOKENS:21:<hidden>
/10.10.2.28
generation:1730628138
heartbeat:619
LOAD:586:2.215579324507E12
SCHEMA:17:a47919aa-79b3-362c-b023-d8ac89671885
DC:13:dc1
RACK:15:rack1
RELEASE_VERSION:6:5.0.2
NET_VERSION:2:12
HOST_ID:3:c9e8a4f8-c7b2-4cac-994a-e8de50c9f358
RPC_READY:35:true
INTERNAL_ADDRESS_AND_PORT:9:10.10.2.28:7000
NATIVE_ADDRESS_AND_PORT:4:10.10.2.28:9042
STATUS_WITH_PORT:22:NORMAL,-2024247660995288657
SSTABLE_VERSIONS:7:big-nb
TOKENS:21:<hidden>
And here is the same output from the second node:
/10.10.2.27
generation:1730627829
heartbeat:906
LOAD:898:2.029075780322E12
SCHEMA:17:a47919aa-79b3-362c-b023-d8ac89671885
DC:13:dc1
RACK:15:rack1
RELEASE_VERSION:6:5.0.2
NET_VERSION:2:12
HOST_ID:3:8be5925c-0436-4bc7-bd4c-0fb5dd8fc9e5
RPC_READY:35:true
INTERNAL_ADDRESS_AND_PORT:9:10.10.2.27:7000
NATIVE_ADDRESS_AND_PORT:4:10.10.2.27:9042
STATUS_WITH_PORT:22:NORMAL,-1656170799351181049
SSTABLE_VERSIONS:7:big-nb
TOKENS:21:<hidden>
/10.10.2.28
generation:1730628138
heartbeat:584
STATUS:23:NORMAL,-2024247660995288657
LOAD:524:2.215537773956E12
SCHEMA:17:a47919aa-79b3-362c-b023-d8ac89671885
DC:13:dc1
RACK:15:rack1
RELEASE_VERSION:6:5.0.2
INTERNAL_IP:11:10.10.2.28
RPC_ADDRESS:5:10.10.2.28
NET_VERSION:2:12
HOST_ID:3:c9e8a4f8-c7b2-4cac-994a-e8de50c9f358
RPC_READY:35:true
INTERNAL_ADDRESS_AND_PORT:9:10.10.2.28:7000
NATIVE_ADDRESS_AND_PORT:4:10.10.2.28:9042
STATUS_WITH_PORT:22:NORMAL,-2024247660995288657
SSTABLE_VERSIONS:7:big-nb
TOKENS:21:<hidden>
When they were in the broken state, this is what nodetool gossipinfo showed when run on 10.10.2.28:
/10.10.2.27
generation:1730626480
heartbeat:810
LOAD:771:2.029304184276E12
SCHEMA:17:9bcfb867-92ca-3144-92d7-da5fed3359d0
DC:13:dc1
RACK:15:rack1
RELEASE_VERSION:6:5.0.2
NET_VERSION:2:12
HOST_ID:3:8be5925c-0436-4bc7-bd4c-0fb5dd8fc9e5
RPC_READY:35:true
INTERNAL_ADDRESS_AND_PORT:9:10.10.2.27:7000
NATIVE_ADDRESS_AND_PORT:4:10.10.2.27:9042
STATUS_WITH_PORT:22:NORMAL,-1656170799351181049
SSTABLE_VERSIONS:7:big-nb
TOKENS:21:<hidden>
/10.10.2.28
generation:1730625184
heartbeat:2523
STATUS:23:NORMAL,-2024247660995288657
LOAD:2513:2.214202366182E12
SCHEMA:2243:a47919aa-79b3-362c-b023-d8ac89671885
DC:13:dc1
RACK:15:rack1
RELEASE_VERSION:6:5.0.2
INTERNAL_IP:11:10.10.2.28
RPC_ADDRESS:5:10.10.2.28
NET_VERSION:2:12
HOST_ID:3:c9e8a4f8-c7b2-4cac-994a-e8de50c9f358
RPC_READY:35:true
INTERNAL_ADDRESS_AND_PORT:9:10.10.2.28:7000
NATIVE_ADDRESS_AND_PORT:4:10.10.2.28:9042
STATUS_WITH_PORT:22:NORMAL,-2024247660995288657
SSTABLE_VERSIONS:7:big-nb
INDEX_STATUS:2198:{"r_internal.p1643_log_node_input_output_unique_flow_process_idx":"BUILD_SUCCEEDED","r_internal.p1720_log_node_input_output_unique_flow_process_idx":"BUILD_SUCCEEDED","r_internal.p1479_log_node_input_output_unique_flow_process_idx":"BUILD_SUCCEEDED"...}
TOKENS:21:<hidden>
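Note that in this broken state the two nodes disagree on the schema version (SCHEMA 9bcfb867-... for 10.10.2.27 vs a47919aa-... for 10.10.2.28). A quicker way to confirm such a disagreement than reading gossipinfo is (run on either node):

nodetool describecluster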
For posterity: this was eventually identified as a bug stemming from the trie memtables and Storage-Attached Indexing (SAI) features introduced in Apache Cassandra 5.0 (CASSANDRA-17240, CASSANDRA-18058).
As part of the implementation, a new behaviour was introduced whereby index status is propagated via gossip. Unfortunately, the status payload in the message is bloated by repeated keyspace names and full status names, resulting in very large message sizes.
The AssertionError is thrown because the message exceeds the 65K maximum. Unit testing did not detect the problem reported here because it is an edge case: this cluster has 800 tables, each with a secondary index defined, which makes the schema excessively large. The general guidance is 200 tables in total across all keyspaces; going up to 300 may still be workable, while 400 is in unpredictable territory.
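A back-of-the-envelope check makes the overflow plausible. The entry shape below is assumed from the INDEX_STATUS line in the gossipinfo output above; with one such entry per indexed table, 800 tables overshoot the limit:

# one JSON entry per index, repeating the keyspace and the full status name
entry='"r_internal.p1489_log_node_input_output_unique_flow_process_idx":"BUILD_SUCCEEDED",'
echo "bytes per entry:       ${#entry}"              # 83
echo "bytes for 800 entries: $(( ${#entry} * 800 ))" # 66400, past the ~65K limit described above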
The serialization format for the index status in gossip messages was compacted in C* 5.0.3 (CASSANDRA-20058), so the risk of hitting this problem has been mitigated. The patch does not completely eliminate the risk, however, if the cluster continues to carry an excessive number of tables. Cheers!
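For anyone wanting to check how close a cluster is to that guidance, the schema tables can be counted directly (a sketch; the counts include the system keyspaces, so subtract those from the totals):

# total tables and secondary indexes across all keyspaces
cqlsh -e "SELECT COUNT(*) FROM system_schema.tables;"
cqlsh -e "SELECT COUNT(*) FROM system_schema.indexes;"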