Questions tagged [ceph] - Page 1

Aref

Asked: 2025-02-04 03:00:08 +0800 CST

OSD stability issues after upgrading to Ceph Quincy 17.2.8

6

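We ran into a stability problem after upgrading our Ceph cluster from Pacific 16.2.3 to Quincy 17.2.8, and I would like to ask for help.

Since the upgrade, several of our object storage daemons (OSDs) have been behaving erratically. These OSDs frequently "flap": they go down unexpectedly and then come back up. The problem mainly affects the OSDs that were upgraded most recently.

In the logs of the affected OSDs we found the following messages: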

2025-02-03T08:34:09.769+0000 7f0f11390780 -1 bluestore::NCB::__restore_allocator::Failed open_for_read with error-code -2
2025-02-03T08:38:22.920+0000 7feb9dd44780 -1 bluestore::NCB::__restore_allocator::No Valid allocation info on disk (empty file)

To address this, we ran the ceph-bluestore-tool fsck and repair commands. Both completed successfully, but neither resolved the problem.

In addition, we captured the following crash report from Ceph:

ceph crash info 2025-02-03T09:19:08.749233Z_9e2800fb-77f6-46cb-8087-203ea15a2039
{
   "assert_condition": "log.t.seq == log.seq_live",
   "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/17.2.8/rpm/el9/BUILD/ceph-17.2.8/src/os/bluestore/BlueFS.cc",
   "assert_func": "uint64_t BlueFS::_log_advance_seq()",
   "assert_line": 3029,
   "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/17.2.8/rpm/el9/BUILD/ceph-17.2.8/src/os/bluestore/BlueFS.cc: In function 'uint64_t BlueFS::_log_advance_seq()' thread 7ff983564640 time 2025-02-03T09:19:08.738781+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/17.2.8/rpm/el9/BUILD/ceph-17.2.8/src/os/bluestore/BlueFS.cc: 3029: FAILED ceph_assert(log.t.seq == log.seq_live)\n",
   "assert_thread_name": "bstore_kv_sync",
   "backtrace": [
       "/lib64/libc.so.6(+0x3e730) [0x7ff9930f5730]",
       "/lib64/libc.so.6(+0x8bbdc) [0x7ff993142bdc]",
       "raise()",
       "abort()",
       "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x179) [0x55882dfb7fdd]",
       "/usr/bin/ceph-osd(+0x36b13e) [0x55882dfb813e]",
       "/usr/bin/ceph-osd(+0x9cff3b) [0x55882e61cf3b]",
       "(BlueFS::_flush_and_sync_log_jump_D(unsigned long)+0x4e) [0x55882e6291ee]",
       "(BlueFS::_compact_log_async_LD_LNF_D()+0x59b) [0x55882e62e8fb]",
       "/usr/bin/ceph-osd(+0x9f2b15) [0x55882e63fb15]",
       "(BlueFS::fsync(BlueFS::FileWriter*)+0x1b9) [0x55882e631989]",
       "/usr/bin/ceph-osd(+0x9f4889) [0x55882e641889]",
       "/usr/bin/ceph-osd(+0xd74cd5) [0x55882e9c1cd5]",
       "(rocksdb::WritableFileWriter::SyncInternal(bool)+0x483) [0x55882eade393]",
       "(rocksdb::WritableFileWriter::Sync(bool)+0x120) [0x55882eae0b60]",
       "(rocksdb::DBImpl::WriteToWAL(rocksdb::WriteThread::WriteGroup const&, rocksdb::log::Writer*, unsigned long*, bool, bool, unsigned long)+0x337) [0x55882ea00ab7]",
       "(rocksdb::DBImpl::WriteImpl(rocksdb::WriteOptions const&, rocksdb::WriteBatch*, rocksdb::WriteCallback*, unsigned long*, unsigned long, bool, unsigned long*, unsigned long, rocksdb::PreReleaseCallback*)+0x1935) [0x55882ea07675]",
       "(rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*)+0x35) [0x55882ea077c5]",
       "(RocksDBStore::submit_common(rocksdb::WriteOptions&, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x83) [0x55882e992593]",
       "(RocksDBStore::submit_transaction_sync(std::shared_ptr<KeyValueDB::TransactionImpl>)+0x99) [0x55882e992ee9]",
       "(BlueStore::_kv_sync_thread()+0xf64) [0x55882e578e24]",
       "/usr/bin/ceph-osd(+0x8afb81) [0x55882e4fcb81]",
       "/lib64/libc.so.6(+0x89e92) [0x7ff993140e92]",
       "/lib64/libc.so.6(+0x10ef20) [0x7ff9931c5f20]"
   ],
   "ceph_version": "17.2.8",
   "crash_id": "2025-02-03T09:19:08.749233Z_9e2800fb-77f6-46cb-8087-203ea15a2039",
   "entity_name": "osd.211",
   "os_id": "centos",
   "os_name": "CentOS Stream",
   "os_version": "9",
   "os_version_id": "9",
   "process_name": "ceph-osd",
   "stack_sig": "ba90de24e2beba9c6a75249a4cce7c533987ca5127cfba5b835a3456174d6080",
   "timestamp": "2025-02-03T09:19:08.749233Z",
   "utsname_hostname": "afra-osd18",
   "utsname_machine": "x86_64",
   "utsname_release": "5.15.0-119-generic",
   "utsname_sysname": "Linux",
   "utsname_version": "#129-Ubuntu SMP Fri Aug 2 19:25:20 UTC 2024"
}

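The crash report above points to an assertion failure in the BlueFS component, specifically in the function BlueFS::_log_advance_seq(). Despite our efforts to analyze and resolve the problem, we have reached an impasse.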
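For triage, the handful of fields that matter in a `ceph crash info` report can be pulled out with a short script. The following is only a sketch: it embeds a trimmed copy of the report above (most backtrace frames are dropped), and the helper name `summarize_crash` is hypothetical rather than anything Ceph provides.

```python
import json

# Trimmed copy of the crash report above; only a few backtrace frames are kept.
CRASH_JSON = """
{
  "assert_condition": "log.t.seq == log.seq_live",
  "assert_func": "uint64_t BlueFS::_log_advance_seq()",
  "assert_line": 3029,
  "assert_thread_name": "bstore_kv_sync",
  "backtrace": [
    "/lib64/libc.so.6(+0x3e730) [0x7ff9930f5730]",
    "raise()",
    "abort()",
    "(BlueFS::_flush_and_sync_log_jump_D(unsigned long)+0x4e) [0x55882e6291ee]",
    "(BlueFS::_compact_log_async_LD_LNF_D()+0x59b) [0x55882e62e8fb]",
    "(BlueFS::fsync(BlueFS::FileWriter*)+0x1b9) [0x55882e631989]",
    "(BlueStore::_kv_sync_thread()+0xf64) [0x55882e578e24]"
  ],
  "ceph_version": "17.2.8",
  "entity_name": "osd.211"
}
"""


def summarize_crash(report: dict) -> dict:
    """Collect the daemon, version, assertion, and symbolized BlueFS/BlueStore frames."""
    frames = [
        # "(Func(args)+0x4e) [addr]" -> "Func(args)"
        frame.split("+0x")[0].lstrip("(")
        for frame in report.get("backtrace", [])
        if frame.startswith("(Blue")
    ]
    return {
        "daemon": report.get("entity_name"),
        "version": report.get("ceph_version"),
        "assertion": report.get("assert_condition"),
        "function": report.get("assert_func"),
        "thread": report.get("assert_thread_name"),
        "frames": frames,
    }


summary = summarize_crash(json.loads(CRASH_JSON))
for key, value in summary.items():
    print(f"{key}: {value}")
```

Read from the bottom up, the kept frames show the assertion being reached from the BlueStore kv sync thread through BlueFS::fsync and the asynchronous BlueFS log compaction.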

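For completeness, we have checked the health of our disks with smartctl, and all of them are reported as healthy.

We kindly ask the community for guidance on how to resolve this issue, or for any suggested in-depth diagnostic steps. We appreciate your support and expertise in this troubleshooting process.

Thank you for your attention and help.

OSD log file: https://paste.mozilla.org/6STm6eum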

Tintin

Asked: 2024-10-06 08:53:26 +0800 CST

Migrating Ceph OSD daemons to a new service spec

5

I have a service spec that assigns all free SSDs to OSDs:

service_type: osd
service_id: dashboard-tintin-7634852880
service_name: osd.dashboard-tintin-7634852880
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: false
  filter_logic: AND
  objectstore: bluestore

I wanted finer control over which drives are assigned on each server, so I created some new specs like this one:

service_type: osd
service_id: dashboard-tintin-1715222958508
service_name: osd.dashboard-tintin-1715222958508
placement:
  host_pattern: 'host1'
spec:
  data_devices:
    rotational: false
  filter_logic: AND
  objectstore: bluestore

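In Ceph Dashboard -> Services, I could see that my old OSD daemons kept running under the control of the old service definition. I deleted the old service definition and received this warning: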

If osd.dashboard-tintin-7634852880 is removed the the following OSDs will remain, --force to proceed anyway ...

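I figured that leaving the daemons running was what I wanted, so I went ahead with --force. Now Ceph Dashboard -> Services lists the OSDs as "unmanaged", and the new service definitions still do not pick them up. How can I move these OSD daemons under the new service specs?

If I stop a daemon, no new daemon is started under the new service definition. If I redeploy the daemons, they still show up as "unmanaged". The only way I have found to get them running under the new service definition is to stop the daemon and wipe the drive. Given the size of the cluster, however, that is not a practical solution.

Given that the data is present and correct, I am surprised there is no way to bring stray daemons back under management. (I looked at the documentation on stray daemons, but it only covers them in the context of upgrading a cluster to cephadm.)

Here is part of my ceph orch ls osd --export output: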

service_type: osd
service_id: dashboard-tintin-1706434852880
service_name: osd.dashboard-tintin-1706434852880
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: dashboard-tintin-1715222958508
service_name: osd.dashboard-tintin-1715222958508
placement:
  host_pattern: ceph-pn-osd1
spec:
  data_devices:
    rotational: false
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: dashboard-tintin-1712545397532
service_name: osd.dashboard-tintin-1712545397532
placement:
  host_pattern: ceph-pn-osd2
spec:
  data_devices:
    rotational: false
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: dashboard-tintin-1706421419210
service_name: osd.dashboard-tintin-1706421419210
placement:
  host_pattern: ceph-pn-osd3
spec:
  data_devices:
    rotational: false
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: dashboard-tintin-1706421419211
service_name: osd.dashboard-tintin-1706421419211
placement:
  host_pattern: ceph-pn-osd4
spec:
  data_devices:
    rotational: false
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: dashboard-tintin-1706425693555
service_name: osd.dashboard-tintin-1706425693555
placement:
  host_pattern: ceph-pn-osd5
spec:
  data_devices:
    rotational: false
  filter_logic: AND
  objectstore: bluestore
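To see which of the exported specs would claim a given host, the placement patterns can be checked offline. The sketch below is an illustration only. It assumes `host_pattern` is matched glob-style, the way Python's `fnmatch` does, hard-codes the service IDs and patterns from the export above, and uses a hypothetical helper name `specs_for_host`. It says nothing about how already-deployed OSDs get attached to a spec.

```python
from fnmatch import fnmatch

# service_id -> placement.host_pattern, copied from the export above.
# The first entry has no placement (unmanaged: true), so it is left out.
SPECS = {
    "dashboard-tintin-1715222958508": "ceph-pn-osd1",
    "dashboard-tintin-1712545397532": "ceph-pn-osd2",
    "dashboard-tintin-1706421419210": "ceph-pn-osd3",
    "dashboard-tintin-1706421419211": "ceph-pn-osd4",
    "dashboard-tintin-1706425693555": "ceph-pn-osd5",
}


def specs_for_host(hostname, specs=SPECS):
    """Return the service IDs whose host_pattern matches hostname (glob-style, assumed)."""
    return [sid for sid, pattern in specs.items() if fnmatch(hostname, pattern)]


for host in ("ceph-pn-osd1", "ceph-pn-osd3", "ceph-pn-osd9"):
    print(host, "->", specs_for_host(host))
```

With one exact hostname per spec, each host matches at most one service, whereas the original `'*'` pattern would have matched every host.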

sam123

Asked: 2024-09-28 20:00:42 +0800 CST

Ceph: problems after running "ceph dashboard iscsi-gateway-add -i 1.conf images"

5

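I have correctly configured 3 storage nodes and 1 control node (the control node also runs the storage node services), along with a working iSCSI storage setup. However, I noticed that in order to see iSCSI monitoring in the dashboard, the gateways have to be registered with: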

ceph dashboard iscsi-gateway-add  -i 1.conf images

The conf file contains the following:

http://admin: [email protected] :5588

http://admin: [email protected] :5588

http://admin: [email protected] :5588

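The command reported Success, but afterwards the dashboard home page stopped working properly, the iSCSI gateway monitoring in the dashboard became unavailable, and the command line also started returning errors.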

root@node1:~#  ceph dashboard iscsi-gateway-list
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 1759, in _handle_command
    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
  File "/usr/share/ceph/mgr/mgr_module.py", line 462, in call
    return self.func(mgr, **kwargs)
  File "/usr/share/ceph/mgr/dashboard/services/iscsi_cli.py", line 21, in list_iscsi_gateways
    return 0, json.dumps(IscsiGatewaysConfig.get_gateways_config()), ''
  File "/usr/share/ceph/mgr/dashboard/services/iscsi_config.py", line 104, in get_gateways_config
    return cls._load_config_from_store()
  File "/usr/share/ceph/mgr/dashboard/services/iscsi_config.py", line 47, in _load_config_from_store
    cls.update_iscsi_config(config)
  File "/usr/share/ceph/mgr/dashboard/services/iscsi_config.py", line 64, in update_iscsi_config
    service_url=service_url).get_hostname()['data']
  File "/usr/share/ceph/mgr/dashboard/services/iscsi_client.py", line 42, in instance
    port = url.port
  File "/lib64/python3.6/urllib/parse.py", line 181, in port
    port = int(port, 10)
ValueError: invalid literal for int() with base 10: '5588http:'
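The last frame of the traceback can be reproduced outside Ceph. The sketch below assumes, based only on the `'5588http:'` fragment in the error message, that the URLs from 1.conf ended up in the dashboard configuration as one string with no separator between them. The credentials and addresses are placeholders, since the real ones are redacted above, and `port_of` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Placeholder URLs; the real credentials and hosts are redacted in the post.
urls = [
    "http://admin:secret@192.0.2.11:5588",
    "http://admin:secret@192.0.2.12:5588",
]


def port_of(service_url):
    """Return the port of a gateway URL, or the parse error as a string."""
    try:
        return urlparse(service_url).port
    except ValueError as exc:
        return f"ValueError: {exc}"


print(port_of(urls[0]))        # a single URL parses cleanly
print(port_of("".join(urls)))  # glued URLs leave '5588http:' where the port should be
```

If that assumption holds, the stored value is one malformed URL rather than three gateways, which would explain why every command that loads the iSCSI gateway configuration fails in the same place.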

What I need is to get back to a normal state and configure the monitoring successfully. Thank you.

izarc

Asked: 2023-08-25 21:11:51 +0800 CST

Ceph: RADOS Gateway web endpoint not responding

6

I am setting up a simple Ceph cluster and trying to connect to the Ceph gateway.

Here is the ceph status output for my cluster:

  cluster:
    id:     a7f64266-0894-4f1e-a635-d0aeaca0e993
    health: HEALTH_WARN
            mon is allowing insecure global_id reclaim
            1 monitors have not enabled msgr2
            5 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum rhcsa (age 4h)
    mgr: rhcsa(active, since 8s)
    osd: 1 osds: 1 up (since 4h), 1 in (since 4h)

  data:
    pools:   5 pools, 129 pgs
    objects: 27 objects, 453 KiB
    usage:   22 MiB used, 20 GiB / 20 GiB avail
    pgs:     129 active+clean

Here is the Ceph configuration in /etc/ceph/ceph.conf:

[global]
fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
mon_initial_members = rhcsa
mon_host = 192.168.122.61
public_network = 192.168.122.0/24
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_size = 1
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 333
osd_crush_chooseleaf_type = 1


[client.rgw.rhcsa]
host = rhcsa
rgw dns name = rhcsa
log file = /var/log/ceph/client.rgw.rhcsa.log
keyring = /var/lib/ceph/radosgw/ceph-rgw.rhcsa/keyring
rgw frontends = "beast port=8080"

I created the following directory for rgw: /var/lib/ceph/radosgw/ceph-rgw.rhcsa

Then the keyring:

sudo ceph-authtool --create-keyring /var/lib/ceph/radosgw/ceph-rgw.rhcsa/keyring
sudo chmod +r /var/lib/ceph/radosgw/ceph-rgw.rhcsa/keyring
sudo ceph-authtool /var/lib/ceph/radosgw/ceph-rgw.rhcsa/keyring -n client.rgw.rhcsa --gen-key   
sudo ceph-authtool -n client.rgw.rhcsa --cap osd 'allow rwx' --cap mon 'allow rwx' /var/lib/ceph/radosgw/ceph-rgw.rhcsa/keyring

Then I started the Ceph RadosGW service:

sudo systemctl restart [email protected]

The RadosGW service appears to be running fine:

[root@rhcsa ~]# systemctl status [email protected]
● [email protected] - Ceph rados gateway
     Loaded: loaded (/usr/lib/systemd/system/[email protected]; disabled; preset: disabled)
     Active: active (running) 
   Main PID: 18501 (radosgw)
      Tasks: 9
     Memory: 6.5M
        CPU: 39ms
     CGroup: /system.slice/system-ceph\x2dradosgw.slice/[email protected]
             └─18501 /usr/bin/radosgw -f --cluster ceph --name client.rgw --setuser ceph --setgroup ceph

rhcsa systemd[1]: Started Ceph rados gateway.

However, when I try to get a response from port 8080, I get nothing:

[root@rhcsa ~]# curl http://localhost:8080
curl: (7) Failed to connect to localhost port 8080: Connection refused

I am using Rocky Linux 9.2 with Ceph version 17.2.6 quincy/stable.
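curl's "Connection refused" means that nothing accepted the TCP connection on that port. A small probe like the one below can show which port, if any, the gateway is actually listening on. It is a generic sketch using only Python's standard library, and the helper name `port_is_open` is made up for illustration. Port 8080 comes from the `rgw frontends` line above. Port 7480 is included as an assumption: it is the default RGW frontend port, and the status output shows the daemon started with `--name client.rgw` while the config section is `[client.rgw.rhcsa]`, so the section may not be applied.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8080: configured in ceph.conf; 7480: RGW default (assumed fallback).
    for candidate in (8080, 7480):
        print(f"localhost:{candidate} open:", port_is_open("localhost", candidate))
```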

Froxz

Asked: 2022-11-10 23:54:55 +0800 CST

Ceph Alertmanager configuration

6

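I installed Ceph using cephadm,

including the monitoring stack: prometheus, alertmanager, node-exporter.

I am currently trying to add a telegram receiver to alertmanager (Telegram is supported from v0.24.0, so I manually updated mgr/container_image_alertmanager from 0.23 to 0.24), but I cannot find in the documentation where alertmanager.yml is supposed to be created.

I can see that this file is generated on the Ceph cluster at /var/lib/ceph/{hash}/alertmanager.ceph-1/etc/alertmanager/alertmanager.yml

I added my configuration to that file as follows: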

templates:
  - '/etc/alertmanager/config/*.tmpl'
route:
  receiver: 'default'
  routes:
    - group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 30m
      receiver: 'telegram'
receivers:
- name: 'default'
  webhook_configs:
- name: 'ceph-dashboard'
  webhook_configs:
  - url: 'https://ceph-1:8443/api/prometheus_receiver'
- name: 'telegram'
  telegram_configs:
    - bot_token: <bot_token>
      chat_id: <chat_id>
      send_resolved: true
      parse_mode: 'HTML'
      api_url: 'https://api.telegram.org'
      message: '{{ template "telegram.text" . }}'

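The receiver works fine, but after redeploying alertmanager from the Ceph dashboard the configuration was gone, which is logical since I was editing a generated file.

Could someone please help or point me in the right direction: where should I create the alertmanager configuration so that it extends or overrides the defaults?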

Ztevoz Milloz

Asked: 2022-10-06 03:14:44 +0800 CST

In a Ceph cluster, for the same total raw capacity, e.g. 32 TB on each of 5 nodes, how do I choose between 16 SSDs of 2 TB and 8 SSDs of 4 TB? Are there any guidelines?

0

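I am designing a Ceph cluster that mixes CephFS and RBD for my company (VM and file infrastructure).

In my setup, I need 32 TB of raw storage per node. I am starting with 5 nodes.

The vendor's quote lets me choose, per node, between 16 SSDs of 2 TB and 8 SSDs of 4 TB.

What I want to know is the impact of each option on Ceph-managed IOPS, rebuild latency, and so on. The question is specific to Ceph rather than a general one.

I have followed many guides, including the Ceph documentation and books, to drive my choices in many areas, but I am not sure any of them really answers this question. The only clue I have found is "bigger is better"...

What approach should I follow to choose between these 2 options?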

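Some details in case they are needed: the NICs dedicated to the Ceph VLAN are 25 Gb, redundant, and so on. I have already planned for at least 4 GB of RAM per 1 TB of OSD, so 128 GB per node should be ample. The SSDs are enterprise-class and read-intensive.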
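The two layouts can be put side by side with some simple arithmetic. The sketch below uses only the figures given in the post (5 nodes, 32 TB raw per node, 4 GB of RAM per TB) and assumes one OSD per drive. It does not model IOPS or recovery speed, and the function name `layout` is just illustrative.

```python
NODES = 5
RAM_GB_PER_TB = 4  # rule of thumb quoted in the post


def layout(drives_per_node, drive_tb):
    """Derive per-node and cluster-wide figures, assuming one OSD per drive."""
    raw_per_node = drives_per_node * drive_tb
    raw_total = raw_per_node * NODES
    return {
        "osds_per_node": drives_per_node,
        "osds_total": drives_per_node * NODES,
        "raw_tb_per_node": raw_per_node,
        "raw_tb_total": raw_total,
        "ram_gb_per_node": raw_per_node * RAM_GB_PER_TB,
        # Share of the cluster's raw capacity that sits on a single drive,
        # an upper bound on what must be re-replicated when that drive fails.
        "one_drive_pct_of_cluster": 100 * drive_tb / raw_total,
    }


options = {"16 x 2 TB": (16, 2), "8 x 4 TB": (8, 4)}
for name, (count, size_tb) in options.items():
    print(name, layout(count, size_tb))
```

Both layouts come out at 160 TB raw and the same 128 GB RAM estimate per node. They differ in OSD count (80 versus 40) and in how much data a single failed drive can take with it (up to 2 TB versus up to 4 TB).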

Thanks for your help

Cheers

Ztevoz

Aleksandr Makhov

Asked: 2022-01-25 07:25:15 +0800 CST

Ceph RGW 16.2.7 CLI changes

0

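I am trying to bring up a new Ceph cluster with Rados GW using the latest release, 16.2.7, but while setting up the RGW node I found that the CLI has changed compared with version 16.2.4, which I tested previously.

The following commands are missing in 16.2.7: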

ceph dashboard set-rgw-api-user-id $USER
ceph dashboard set-rgw-api-access-key ...
ceph dashboard set-rgw-api-secret-key ...

They do not appear in the ceph dashboard -h output on 16.2.7:

# ceph dashboard -h | grep set-rgw-api | grep -v reset
dashboard set-rgw-api-access-key                     Set the RGW_API_ACCESS_KEY option value read from -i
dashboard set-rgw-api-admin-resource <value>         Set the RGW_API_ADMIN_RESOURCE option value
dashboard set-rgw-api-secret-key                     Set the RGW_API_SECRET_KEY option value read from -i
dashboard set-rgw-api-ssl-verify <value>             Set the RGW_API_SSL_VERIFY option value

But on 16.2.4 everything is in place:

# ceph dashboard -h | grep set-rgw-api | grep -v reset
dashboard set-rgw-api-access-key                                                                          Set the RGW_API_ACCESS_KEY option value read from -i <file>
dashboard set-rgw-api-admin-resource <value>                                                              Set the RGW_API_ADMIN_RESOURCE option value
dashboard set-rgw-api-host <value>                                                                        Set the RGW_API_HOST option value
dashboard set-rgw-api-port <value>                                                                        Set the RGW_API_PORT option value
dashboard set-rgw-api-scheme <value>                                                                      Set the RGW_API_SCHEME option value
dashboard set-rgw-api-secret-key                                                                          Set the RGW_API_SECRET_KEY option value read from -i <file>
dashboard set-rgw-api-ssl-verify <value>                                                                  Set the RGW_API_SSL_VERIFY option value
dashboard set-rgw-api-user-id <value>                                                                     Set the RGW_API_USER_ID option value
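Comparing the two help listings mechanically makes the difference explicit. The sketch below simply copies the command names from the two outputs above into Python sets and prints what 16.2.7 no longer lists. Nothing here queries a live cluster.

```python
# Command names copied from the `ceph dashboard -h` listings above.
V16_2_4 = {
    "set-rgw-api-access-key",
    "set-rgw-api-admin-resource",
    "set-rgw-api-host",
    "set-rgw-api-port",
    "set-rgw-api-scheme",
    "set-rgw-api-secret-key",
    "set-rgw-api-ssl-verify",
    "set-rgw-api-user-id",
}
V16_2_7 = {
    "set-rgw-api-access-key",
    "set-rgw-api-admin-resource",
    "set-rgw-api-secret-key",
    "set-rgw-api-ssl-verify",
}

removed = sorted(V16_2_4 - V16_2_7)  # listed in 16.2.4 only
added = sorted(V16_2_7 - V16_2_4)    # listed in 16.2.7 only

print("removed in 16.2.7:", removed)
print("added in 16.2.7:", added)
```

Going by these listings, set-rgw-api-access-key and set-rgw-api-secret-key are still present in 16.2.7. The entries that disappear are set-rgw-api-host, set-rgw-api-port, set-rgw-api-scheme and set-rgw-api-user-id.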

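Do you know what replaces these commands, or where they have been moved? Thanks in advance.

UPD: In both cases the host OS is Debian 10.

The list of ceph packages is the same for both RGW setups. 16.2.4: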

# dpkg -l | grep ceph
ii  ceph                              16.2.4-1~bpo10+1             amd64        distributed storage and file system                                                                                                
ii  ceph-base                         16.2.4-1~bpo10+1             amd64        common ceph daemon libraries and management tools                                                                                  
ii  ceph-common                       16.2.4-1~bpo10+1             amd64        common utilities to mount and interact with a ceph storage cluster                                                                 
ii  ceph-mgr                          16.2.4-1~bpo10+1             amd64        manager for the ceph distributed storage system                                                                                    
ii  ceph-mgr-modules-core             16.2.4-1~bpo10+1             all          ceph manager modules which are always enabled                                                                                      
ii  ceph-mon                          16.2.4-1~bpo10+1             amd64        monitor server for the ceph storage system                                                                                         
ii  ceph-osd                          16.2.4-1~bpo10+1             amd64        OSD server for the ceph storage system                                                                                             
ii  libcephfs2                        16.2.4-1~bpo10+1             amd64        Ceph distributed file system client library                                                                                        
ii  libsqlite3-mod-ceph               16.2.4-1~bpo10+1             amd64        SQLite3 VFS for Ceph
ii  python3-ceph-argparse             16.2.4-1~bpo10+1             all          Python 3 utility libraries for Ceph CLI                                                                                            
ii  python3-ceph-common               16.2.4-1~bpo10+1             all          Python 3 utility libraries for Ceph                                                                                                
ii  python3-cephfs                    16.2.4-1~bpo10+1             amd64        Python 3 libraries for the Ceph libcephfs library

# dpkg -l | grep rados
ii  librados2                         16.2.4-1~bpo10+1             amd64        RADOS distributed object store client library
ii  libradosstriper1                  16.2.4-1~bpo10+1             amd64        RADOS striping interface
ii  python3-rados                     16.2.4-1~bpo10+1             amd64        Python 3 libraries for the Ceph librados library
ii  radosgw                           16.2.4-1~bpo10+1             amd64        REST gateway for RADOS distributed object store

16.2.7:

# dpkg -l | grep ceph
ii  ceph                              16.2.7-1~bpo10+1             amd64        distributed storage and file system
ii  ceph-base                         16.2.7-1~bpo10+1             amd64        common ceph daemon libraries and management tools
ii  ceph-common                       16.2.7-1~bpo10+1             amd64        common utilities to mount and interact with a ceph storage cluster
ii  ceph-mgr                          16.2.7-1~bpo10+1             amd64        manager for the ceph distributed storage system
ii  ceph-mgr-modules-core             16.2.7-1~bpo10+1             all          ceph manager modules which are always enabled
ii  ceph-mon                          16.2.7-1~bpo10+1             amd64        monitor server for the ceph storage system
ii  ceph-osd                          16.2.7-1~bpo10+1             amd64        OSD server for the ceph storage system
ii  libcephfs2                        16.2.7-1~bpo10+1             amd64        Ceph distributed file system client library
ii  libsqlite3-mod-ceph               16.2.7-1~bpo10+1             amd64        SQLite3 VFS for Ceph
ii  python3-ceph-argparse             16.2.7-1~bpo10+1             all          Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common               16.2.7-1~bpo10+1             all          Python 3 utility libraries for Ceph
ii  python3-cephfs                    16.2.7-1~bpo10+1             amd64        Python 3 libraries for the Ceph libcephfs library

# dpkg -l | grep rados
ii  librados2                         16.2.7-1~bpo10+1             amd64        RADOS distributed object store client library
ii  libradosstriper1                  16.2.7-1~bpo10+1             amd64        RADOS striping interface
ii  python3-rados                     16.2.7-1~bpo10+1             amd64        Python 3 libraries for the Ceph librados library
ii  radosgw                           16.2.7-1~bpo10+1             amd64        REST gateway for RADOS distributed object store

ony4869

Asked: 2021-10-08 22:26:16 +0800 CST

OpenStack Wallaby volume backup problem

0

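Recently I have been testing Kubernetes with 1 master node and 3 worker nodes on OpenStack (release: Wallaby). It involves a lot of testing, so I want to back up the volumes to my local machine beforehand.

The volume backend is Ceph. From what I found online, the suggested approach is to convert the volume to an image and download the image to the local machine. I was able to convert my 20 GB volume to QCOW2 format successfully. However, the conversion failed for my 120 GB volume.

I have performed the following checks:

(1) I checked the Glance configuration in /etc/glance/glance-api.conf. The image_size_cap parameter is set to 4 TB. I also restarted the Glance API service.

(2) I checked the Ceph backend with the ceph df command, and Ceph has enough capacity.

(3) The Glance node also has enough storage. However, when I tried to check the image conversion progress under /var/lib/glance/images, the directory was empty even though the images exist in OpenStack.

Has anyone run into the same problem and could give me some advice?

Thank you very much.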

Jack Slingerland

Asked: 2021-10-03 17:05:44 +0800 CST

Ceph: connecting to a local node

0

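I have an idea for an application I want to build, and one of the requirements is a globally replicated file system. Things like Ceph and GlusterFS exist, but I am not sure whether they fit my particular use case.

Say I have 3 application servers in 3 different regions [US, EU, Asia].
I then have a 3-node Ceph setup with 1 node in each region [US, EU, Asia].
Can each application server connect directly to the Ceph node in its own region, or does it have to go through some centralized orchestration node?

I ask because I want to keep file system latency to a minimum and use Ceph only to synchronize changes between all nodes. If I cannot connect directly to the "local" node, I expect latency to be very high.

Any help understanding this would be greatly appreciated!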

Behzad

Asked: 2021-04-04 23:51:52 +0800 CST

bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-2/block: (1) Operation not permitted

2

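When I upgraded from Octopus 15.2.10 to Pacific 16.2.0, the mon nodes came up successfully using the manual upgrade procedure (installing packages, without orch), but when I upgraded the OSDs, the ceph-osd service would not start. It is worth mentioning that I used the Ubuntu Focal Ceph packages for the upgrade (no orch).

It is also worth mentioning that when I run "/usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph" by hand, the OSD starts successfully and joins the cluster.

Here are the ceph-osd.log messages: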

set uid:gid to 64045:64045 (ceph:ceph)
ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable), process ceph-osd, pid 1974
pidfile_write: ignore empty --pid-file
bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-2/block: (1) Operation not permitted
 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-2: (2) No such file or directory

Any kind of help is much appreciated.
