When writes to Cassandra use LOCAL_QUORUM, is the coordinator node in the originating DC the sole forwarder to the remote DC, or can other nodes/DCs share that role if the coordinator or its connection runs into problems?
While trying to install Cassandra from RPM packages, I found a problem with the baseurl. Does anyone know whether there is an issue with the redirect?
[vic@ol9-135 ~]$ sudo yum --disablerepo="*" --enablerepo="cassandra" list available
Apache Cassandra 7.5 kB/s | 11 kB 00:01
Error: Failed to download metadata for repo 'cassandra': repomd.xml parser error: Parse error at line: 1 (EntityRef: expecting ';')
Or is it something on the JFrog side (unfortunately I'm not familiar with JFrog)?
[vic@ol9-135 ~]$ wget https://redhat.cassandra.apache.org/41x/
--2024-03-24 13:36:44-- https://redhat.cassandra.apache.org/41x/
Resolving redhat.cassandra.apache.org (redhat.cassandra.apache.org)... 35.172.124.167
Connecting to redhat.cassandra.apache.org (redhat.cassandra.apache.org)|35.172.124.167|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://apache.jfrog.io/artifactory/cassandra-rpm/41x/ [following]
--2024-03-24 13:36:45-- https://apache.jfrog.io/artifactory/cassandra-rpm/41x/
Resolving apache.jfrog.io (apache.jfrog.io)... 18.232.172.199, 18.214.194.113, 3.95.117.170
Connecting to apache.jfrog.io (apache.jfrog.io)|18.232.172.199|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://landing.jfrog.com/reactivate-server/apache [following]
--2024-03-24 13:36:46-- https://landing.jfrog.com/reactivate-server/apache
Resolving landing.jfrog.com (landing.jfrog.com)... 18.214.194.113, 3.95.117.170, 18.232.172.199
Connecting to landing.jfrog.com (landing.jfrog.com)|18.214.194.113|:443... connected.
HTTP request sent, awaiting response... 200 OK
My cassandra.repo:
[vic@ol9-135 ~]$ cat /etc/yum.repos.d/cassandra.repo
[cassandra]
name=Apache Cassandra
baseurl=https://redhat.cassandra.apache.org/41x/
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://downloads.apache.org/cassandra/KEYS
[vic@ol9-135 ~]$
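Not an answer to the repo question itself, but the parser error is a useful clue: "EntityRef: expecting ';'" is what an XML parser typically reports when it is fed HTML, such as the JFrog landing page the redirect chain above ends at, instead of repomd.xml, because HTML routinely contains a bare '&'. A minimal local reproduction of that error class, with a made-up HTML snippet standing in for the real landing page:

```python
# Sketch: why a repo-metadata XML parser chokes on an HTML landing page.
# The HTML string below is a made-up stand-in, not the real JFrog page;
# the bare '&' is what trips the XML entity-reference parsing.
import xml.etree.ElementTree as ET

fake_landing_page = "<html><body>Sign up & reactivate your server</body></html>"

try:
    ET.fromstring(fake_landing_page)
    print("parsed as XML")
except ET.ParseError as e:
    print(f"XML parse error: {e}")
```

Python's expat wording differs from the libxml2 message yum prints, but the failure mode is the same: the downloaded "repomd.xml" is not XML at all.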
I have some questions about running Cassandra 4.1 storage nodes alongside dedicated coordinator-only Cassandra 4.0 nodes. As I understand it, some 4.1-specific features are unavailable with this topology.
- I can see different schema version hashes between the coordinators and the storage nodes, which is expected. So far in my testing I have not seen any schema-change problems, and the nodes are able to reach schema agreement. My understanding is that schema-change restrictions only exist between major versions. Am I missing something here?
- I can verify that streaming operations work correctly in this topology, e.g. repairs and scaling the cluster out (adding more nodes). Are there other potential issues I haven't thought of?
Thanks
INFO [HintsDispatcher:416] 2024-02-12 01:42:41,180 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
We are getting the above message in the logs. We are also facing latency alerts (condition: more than 2 seconds within a 5-minute span). Is there any relationship between the latency and this log message?
In any case, increasing file_cache_size_in_mb might help, or the alert condition may simply be too tight and we could relax it (e.g. 5 seconds within a 5-minute span). So far there have been no complaints about latency problems from the application side.
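For reference, 536870912 bytes is exactly 512 MiB, the default cap of the buffer pool this message comes from. If raising it turns out to be the route taken, this is the shape of the change in cassandra.yaml (a sketch only; 1024 is an illustrative value, not a recommendation, and newer 4.1-style configs spell the option as file_cache_size with a unit suffix):

```yaml
# cassandra.yaml -- sketch. The default is 512 (MiB), i.e. the
# 536870912 bytes in the log line above; 1024 is purely illustrative.
file_cache_size_in_mb: 1024
```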
Summary: I have a 2-node cluster, and if one of the nodes goes down I can no longer log in. I get this error:
Connection error: ('Unable to connect to any servers', {'192.168.1.104:9042': AuthenticationFailed('Failed to authenticate to 192.168.1.104:9042: Error from server: code=0100 [Bad credentials] message="Unable to perform authentication: Cannot achieve consistency level LOCAL_QUORUM"')})
- I am using a role that I created (not a SUPERUSER), so the known issue of the superuser always reading with QUORUM should not apply here. From what I have read it should use LOCAL_ONE, but it doesn't!
- I deliberately set the system_auth keyspace replication factor to 2, since that is the recommendation for avoiding a single point of failure.
Here is a screenshot showing that I am not using the cassandra SUPERUSER but still get the error:
I expect logins to keep working even if one node goes down.
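One piece of arithmetic worth spelling out, whatever ends up issuing the LOCAL_QUORUM read in the error above: a quorum for replication factor RF is floor(RF/2) + 1, so RF=2 still requires both replicas, and a single node going down makes any quorum-based read fail. A quick sketch:

```python
# Quorum size for a given replication factor: floor(rf / 2) + 1.
# RF=2 gives quorum=2, so a quorum read cannot tolerate even one
# node being down -- RF=2 on system_auth does not remove the single
# point of failure for quorum-based lookups.
def quorum(rf: int) -> int:
    return rf // 2 + 1

for rf in (1, 2, 3):
    tolerates_one_down = rf - quorum(rf) >= 1
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates one node down: {tolerates_one_down}")
```

Under that arithmetic, RF=3 is the smallest replication factor at which quorum reads survive a single node failure.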
What is the difference between Apache Cassandra's stable and GA (general) releases? Does it mean the GA release may have more bugs than the stable one? Or is it something else? Please clarify the criteria for this separation.
I have a Cassandra cluster of 21 nodes (each with a 4 TB Cassandra data volume), and I need to replace the nodes (migrating from Ubuntu 18.04 to 22.04).
I would like to know whether I can reattach the data volumes from the existing instances to new instances without negative consequences. I am considering the following plan:
- drain (nodetool drain) the old Cassandra server and shut it down;
- detach the data volume and attach it to the new server;
- start the new server;
- repeat the steps above for the remaining Cassandra nodes.
Are there any risks in doing this? The only thing that changes is the private IP address of the node.
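The plan above, written out as the per-node command sequence it implies. This is a sketch only: the ssh access and the "cassandra" systemd unit name are assumptions about the environment, and the volume detach/attach step is cloud-provider specific, so it is left as a comment.

```python
# Sketch of one iteration of the rolling replacement described above.
# Assumptions: ssh access and a systemd unit named "cassandra"; the
# volume detach/attach mechanism is provider-specific and elided.
def replacement_commands(old_node: str, new_node: str) -> list[list[str]]:
    return [
        ["ssh", old_node, "nodetool", "drain"],  # flush memtables, stop accepting writes
        ["ssh", old_node, "sudo", "systemctl", "stop", "cassandra"],
        # <detach the data volume from old_node, attach it to new_node>
        ["ssh", new_node, "sudo", "systemctl", "start", "cassandra"],
    ]

for cmd in replacement_commands("old-host", "new-host"):
    print(" ".join(cmd))
```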
Running sstableloader on a v4.1.3 host with SSTable files from a v3.11.6 host:
sstableloader -v --nodes oneofthehosts --username whatever --password whatever --truststore /etc/cassandra/conf/truststore.jks --truststore-password whatever --keystore /etc/cassandra/conf/keystore.jks --keystore-password whatever /tmp/keyspace/table
The (slightly trimmed) error message:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: oneofthehosts/someip:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [oneofthehosts/someip:9042] Operation timed out))
java.lang.RuntimeException: Unable to initialise org.apache.cassandra.utils.NativeSSTableLoaderClient
at org.apache.cassandra.utils.NativeSSTableLoaderClient.init(NativeSSTableLoaderClient.java:102)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:167)
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:91)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:58)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: oneofthehosts/someip:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [oneofthehosts/someip:9042] Operation timed out))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:270)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:109)
at com.datastax.driver.core.Cluster$Manager.negotiateProtocolVersionAndConnect(Cluster.java:1813)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1726)
at com.datastax.driver.core.Cluster.init(Cluster.java:214)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:387)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:366)
at com.datastax.driver.core.Cluster.connect(Cluster.java:311)
at org.apache.cassandra.utils.NativeSSTableLoaderClient.init(NativeSSTableLoaderClient.java:70)
... 3 more
Exception in thread "main" org.apache.cassandra.tools.BulkLoadException: java.lang.RuntimeException: Unable to initialise org.apache.cassandra.utils.NativeSSTableLoaderClient
at org.apache.cassandra.tools.BulkLoader.load(BulkLoader.java:104)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:58)
Some selected messages after setting TRACE level in logback-tools.xml:
DEBUG 13:29:19,765 [oneofthehosts/someip:9042] preparing to open 1 new connections, total = 1
DEBUG 13:29:19,915 Connection[oneofthehosts/someip:9042-1, inFlight=0, closed=false] Connection established, initializing transport
TRACE 13:29:19,920 Connection[oneofthehosts/someip:9042-1, inFlight=0, closed=false], stream 0, writing request STARTUP {CQL_VERSION=3.0.0, DRIVER_VERSION=3.11.0, DRIVER_NAME=DataStax Java Driver}
DEBUG 13:29:20,113 [id: 0xef0b330f, L:/someip:44896 - R:oneofthehosts/someip:9042] HANDSHAKEN: TLS_AES_256_GCM_SHA384
waiting for timeout
TRACE 13:29:32,027 Defuncting Connection[oneofthehosts/someip:9042-1, inFlight=0, closed=false]
com.datastax.driver.core.exceptions.OperationTimedOutException: [oneofthehosts/someip:9042] Operation timed out
DEBUG 13:29:32,028 [oneofthehosts/someip:9042] preventing new connections for the next 1000 ms
DEBUG 13:29:32,029 [oneofthehosts/someip:9042] Connection[oneofthehosts/someip:9042-1, inFlight=0, closed=false] failed, remaining = 0
DEBUG 13:29:32,029 Connection[oneofthehosts/someip:9042-1, inFlight=0, closed=true] closing connection
DEBUG 13:29:32,029 Not terminating Connection[oneofthehosts/someip:9042-1, inFlight=0, closed=true]: there are still pending requests
DEBUG 13:29:32,031 Connection[oneofthehosts/someip:9042-1, inFlight=0, closed=true], stream 0, Error writing request STARTUP {CQL_VERSION=3.0.0, DRIVER_VERSION=3.11.0, DRIVER_NAME=DataStax Java Driver}
DEBUG 13:29:32,035 [Control connection] error on oneofthehosts/someip:9042 connection, no more host to try
Does this look familiar to anyone? Is it TLS-related, or is something going wrong in the initial connection setup?
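One detail worth extracting from the trace: the HANDSHAKEN line shows the TLS handshake completing, and the gap between writing STARTUP (13:29:19,920) and the connection being defuncted (13:29:32,027) is almost exactly 12 seconds, which matches the default read timeout of the 3.x Java driver that sstableloader embeds. That points at the server never answering the STARTUP message, rather than TLS itself failing. Checking the gap from the timestamps above:

```python
# Gap between the STARTUP write and the defunct line in the trace above;
# ~12 s matches the Java driver 3.x default read timeout (12000 ms).
from datetime import datetime

fmt = "%H:%M:%S,%f"
t_startup = datetime.strptime("13:29:19,920", fmt)
t_defunct = datetime.strptime("13:29:32,027", fmt)
print((t_defunct - t_startup).total_seconds())  # 12.107
```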
A user reports that with spark.cassandra.input.readsPerSec set in the spark-cassandra-connector, range-query throughput is far higher than expected.
Job dependencies. The Java driver version is set to 4.13.0.
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.12</artifactId>
<version>3.2.0</version>
<exclusions>
<exclusion>
<groupId>com.datastax.oss</groupId>
<artifactId>java-driver-core-shaded</artifactId>
</exclusion>
</exclusions>
</dependency>
...
<dependency>
<groupId>com.datastax.oss</groupId>
<artifactId>java-driver-core</artifactId>
<version>4.13.0</version>
</dependency>
The job has two steps (both full table scans, FTS):
Dataset<Row> dataset = sparkSession.sqlContext().read()
.format("org.apache.spark.sql.cassandra")
.option("table", "inbox_user_msg_dummy")
.option("keyspace", "ssmp_inbox2").load();
-and-
Dataset<Row> olderDataset = sparkSession.sql("SELECT * FROM inbox_user_msg_dummy where app_uuid = 'cb663e07-7bcc-4039-ae97-8fb8e8a9ff77' AND " +
"create_hour < '" + minus180DaysInstant + "'");
Job configuration:
SparkConf sparkConf = new SparkConf()
.setMaster("local[*]") //uncomment while running in local
.setAppName("inbox-gateway-spark-job")
.set("spark.scheduler.mode", "FAIR")
.set("spark.cassandra.connection.port", "9042")
.set("keyspace", "ssmp_inbox2")
.set("spark.cassandra.connection.host",
     "cass-556799284-1-1276056270.stg.ssmp-inbox2-stg.ms-df-cassandra.stg-az-southcentralus-6.prod.us.walmart.net," +
     "cass-556799284-2-1276056276.stg.ssmp-inbox2-stg.ms-df-cassandra.stg-az-southcentralus-6.prod.us.walmart.net," +
     "cass-556799284-3-1276056282.stg.ssmp-inbox2-stg.ms-df-cassandra.stg-az-southcentralus-6.prod.us.walmart.net")
.set("spark.cassandra.auth.username", "ssmp-inbox-app-v2")
.set("spark.cassandra.auth.password", "*")
.set("spark.cassandra.input.consistency.level", "LOCAL_ONE")
.set("spark.cassandra.concurrent.reads", "1")
.set("spark.cassandra.input.readsPerSec", "10")
.set("spark.cassandra.input.fetch.sizeInRows", "10")
.set("spark.cassandra.input.split.sizeInMB", "10")
.set("spark.cores.max", "20")
.set("spark.executor.memory", "20G")
.set("spark.yarn.executor.memoryOverhead", "12000")
.set("spark.cassandra.read.timeoutMS", "200000")
.set("spark.task.maxFailures", "10")
.set("spark.cassandra.connection.localDC", "southcentral");
Note that Spark limits the actual cores to 16, since the workers have 8 cores each. There is 1 executor.
While the job runs, the first FTS shows roughly 22k range queries per second with the CPU on the cluster nearly saturated, while the second FTS shows roughly 725 range queries per second against the table.
With 16 Spark cores in total, the expectation was that range-query throughput would be capped at 160/s (spark.cassandra.input.readsPerSec * Spark cores).
Is this reasoning correct? Any advice on controlling read throughput in the spark-cassandra-connector?
I know other users have successfully configured this throttle before, but we never looked closely at what the resulting throughput was. Still, this does seem like a large discrepancy, since the two steps essentially run the same operation, a full table scan, and the queries the connector ends up running are identical.
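The cap being assumed can be written out explicitly (numbers copied from above; readsPerSec is treated as a per-core limit, which is how the connector documents it):

```python
# Expected range-query ceiling: readsPerSec (per core) * cores.
reads_per_sec = 10   # spark.cassandra.input.readsPerSec
spark_cores = 16     # actual cores, per the note above

expected_cap = reads_per_sec * spark_cores
print(expected_cap)  # 160

# Observed rates from the two full table scans above:
for step, qps in {"first FTS": 22_000, "second FTS": 725}.items():
    print(f"{step}: ~{qps / expected_cap:.1f}x the expected cap")
```

Even the slower second scan runs at several times the expected 160/s, and the first is two orders of magnitude over it.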
Schema:
CREATE TABLE ssmp_inbox2.inbox_user_msg_dummy (
user_id text,
create_hour timestamp,
app_uuid text,
message_id text,
app_name text,
create_ts bigint,
is_actiontaken boolean,
is_compensable boolean,
is_deleted boolean,
is_read boolean,
message_payload text,
mini_app_name text,
notification text,
PRIMARY KEY ((user_id, create_hour, app_uuid), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
AND additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
Query:
SELECT * FROM ssmp_inbox2.inbox_user_msg_dummy WHERE token(user_id, create_hour, app_uuid) >= token(G9e7Y4Y, 2023-08-10T04:17:27.234Z, cb663e07-7bcc-4039-ae97-8fb8e8a9ff77) AND token(user_id, create_hour, app_uuid) <= 9121832956220923771 LIMIT 10
FWIW, the average partition size is 649 bytes and the maximum is 2.7 KB.
I have a table with 2 fields: id (primary key) and fld_1 (text), e.g. a row 1, 'hello world'.
I have 1 row in the target table, and a source TSV file containing 1 row: 1\t
After loading it into the target table I expect to see 1, null, but the data does not change and I still get 1, 'hello world'.
Setting 'dsbulk.schema.nullToUnset': 'false' helps me, but it doesn't seem like the best solution. Is there a proper way to load data containing null values, or is the solution above acceptable?
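For reference, the behaviour described is exactly what nullToUnset controls: with the default true, an empty field in the TSV is sent to Cassandra as "unset", which leaves any existing column value untouched; false sends an explicit null (a tombstone), which is what overwrites the old 'hello world'. So the setting is the intended mechanism rather than a workaround. The same setting in a dsbulk configuration file (a sketch in dsbulk's HOCON config format):

```hocon
# dsbulk config fragment -- sketch, HOCON format.
# nullToUnset = true (the default for load): empty fields are sent as
#   "unset", so existing column values are preserved.
# nullToUnset = false: empty fields become real nulls (tombstones),
#   overwriting existing values -- the behaviour wanted above.
dsbulk.schema.nullToUnset = false
```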