quanta
Asked: 2012-07-10 01:18:48 +0800 CST

MySQL: Pacemaker cannot start the failed master as a new slave?

  • 772
  • pacemaker-1.0.12-1
  • corosync-1.2.7-1.1

I am following this guide to set up failover for MySQL replication (1 master and 1 slave): https://github.com/jayjanssen/Percona-Pacemaker-Resource-Agents/blob/master/doc/PRM-setup-guide.rst

Here is the output of crm configure show:

node serving-6192 \
    attributes p_mysql_mysql_master_IP="192.168.6.192"
node svr184R-638.localdomain \
    attributes p_mysql_mysql_master_IP="192.168.6.38"
primitive p_mysql ocf:percona:mysql \
    params config="/etc/my.cnf" pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" replication_user="repl" replication_passwd="x" test_user="test_user" test_passwd="x" \
    op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
    op monitor interval="2s" role="Slave" timeout="30s" OCF_CHECK_LEVEL="1" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s"
primitive writer_vip ocf:heartbeat:IPaddr2 \
    params ip="192.168.6.8" cidr_netmask="32" \
    op monitor interval="10s" \
    meta is-managed="true"
ms ms_MySQL p_mysql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" globally-unique="false" target-role="Master" is-managed="true"
colocation writer_vip_on_master inf: writer_vip ms_MySQL:Master
order ms_MySQL_promote_before_vip inf: ms_MySQL:promote writer_vip:start
property $id="cib-bootstrap-options" \
    dc-version="1.0.12-unknown" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    last-lrm-refresh="1341801689"
property $id="mysql_replication" \
    p_mysql_REPL_INFO="192.168.6.192|mysql-bin.000006|338"
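
The p_mysql_REPL_INFO property above packs three values separated by "|". Going by the agent's promote code, which builds it from the local IP, the binlog file and the binlog position, it can be unpacked roughly as follows. This is only a sketch: parse_repl_info is a hypothetical helper and not part of the resource agent.

```shell
#!/bin/sh
# parse_repl_info "host|binlog_file|binlog_pos"
# Hypothetical helper: splits the REPL_INFO string on "|" and prints the parts.
parse_repl_info() {
    old_ifs=$IFS
    IFS='|'
    set -- $1          # unquoted on purpose: split the argument on "|"
    IFS=$old_ifs
    [ $# -eq 3 ] || return 1
    printf 'master_host=%s\nmaster_log_file=%s\nmaster_log_pos=%s\n' "$1" "$2" "$3"
}

parse_repl_info "192.168.6.192|mysql-bin.000006|338"
```

The function returns non-zero when the string does not contain exactly three fields, which makes a stale or truncated value easy to spot.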

crm_mon:

Last updated: Mon Jul  9 10:30:01 2012
Stack: openais
Current DC: serving-6192 - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ serving-6192 svr184R-638.localdomain ]

 Master/Slave Set: ms_MySQL
     Masters: [ serving-6192 ]
     Slaves: [ svr184R-638.localdomain ]
writer_vip    (ocf::heartbeat:IPaddr2):    Started serving-6192

To test failover, I edited /etc/my.cnf on serving-6192 and introduced a syntax error. It worked fine:

  • svr184R-638.localdomain was promoted to master
  • writer_vip moved to svr184R-638.localdomain

Current state:

Last updated: Mon Jul  9 10:35:57 2012
Stack: openais
Current DC: serving-6192 - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ serving-6192 svr184R-638.localdomain ]

 Master/Slave Set: ms_MySQL
     Masters: [ svr184R-638.localdomain ]
     Stopped: [ p_mysql:0 ]
writer_vip    (ocf::heartbeat:IPaddr2):    Started svr184R-638.localdomain

Failed actions:
    p_mysql:0_monitor_5000 (node=serving-6192, call=15, rc=7, status=complete): not running
    p_mysql:0_demote_0 (node=serving-6192, call=22, rc=7, status=complete): not running
    p_mysql:0_start_0 (node=serving-6192, call=26, rc=-2, status=Timed Out): unknown exec error
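
For reading the rc values in the failed actions above, the standard OCF resource-agent exit codes can be mapped to their names with a small lookup. This is a sketch for reference only; ocf_rc_name is a made-up helper name. Negative values such as the -2 shown next to status=Timed Out are not agent exit codes, so the helper rejects them.

```shell
#!/bin/sh
# Map an OCF resource-agent exit code to its symbolic name.
# ocf_rc_name is a hypothetical helper, not a Pacemaker command.
ocf_rc_name() {
    case "$1" in
        0) echo OCF_SUCCESS ;;
        1) echo OCF_ERR_GENERIC ;;
        2) echo OCF_ERR_ARGS ;;
        3) echo OCF_ERR_UNIMPLEMENTED ;;
        4) echo OCF_ERR_PERM ;;
        5) echo OCF_ERR_INSTALLED ;;
        6) echo OCF_ERR_CONFIGURED ;;
        7) echo OCF_NOT_RUNNING ;;
        8) echo OCF_RUNNING_MASTER ;;
        9) echo OCF_FAILED_MASTER ;;
        *) echo "not an OCF agent exit code: $1"; return 1 ;;
    esac
}

ocf_rc_name 7
ocf_rc_name 1
```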

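I then removed the bad syntax from /etc/my.cnf on serving-6192 and restarted corosync. I expected to see serving-6192 start as the new slave, but it does not:

Failed actions:
    p_mysql:0_start_0 (node=serving-6192, call=4, rc=1, status=complete): unknown error

Here is the log snippet that I find suspicious: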

Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: rsc:p_mysql:0:4: start
Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: RA output: (p_mysql:0:start:stderr) Error performing operation: The object/attribute does not exist
Jul 09 10:46:32 serving-6192 crm_attribute: [7420]: info: Invoked: /usr/sbin/crm_attribute -N serving-6192 -l reboot --name readable -v 0

/var/log/cluster/corosync.log: http://fpaste.org/AyOZ/

The strange thing is that I can start it manually:

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_config="/etc/my.cnf"
export OCF_RESKEY_pid="/var/run/mysqld/mysqld.pid"
export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock"
export OCF_RESKEY_replication_user="repl"
export OCF_RESKEY_replication_passwd="x"
export OCF_RESKEY_test_user="test_user"
export OCF_RESKEY_test_passwd="x"

sh -x /usr/lib/ocf/resource.d/percona/mysql start: http://fpaste.org/RVGh/

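Am I doing something wrong?


Reply to @Patrick, Fri Jul 13 10:22:10 ICT 2012:

"I'm not sure why it is failing, as your logs don't contain any messages from the resource script (the ocf_log commands)."

I pulled all of it from /var/log/cluster/corosync.log. Do you have any idea why?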

/etc/corosync/corosync.conf

compatibility: whitetank

totem {
    version: 2
    secauth: off
    threads: 0
    interface {
        member {
            memberaddr: 192.168.6.192
        }
        member {
            memberaddr: 192.168.6.38
        }
        ringnumber: 0
        bindnetaddr: 192.168.6.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: on
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

amf {
    mode: disabled
}

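"The reason the script works when you run it manually is also that you aren't setting the variables which tell the script that it is a master/slave resource. So when it runs, the script thinks it is just a standalone instance."

Thanks. I have appended the following variables to my ~/.bash_profile: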

export OCF_RESKEY_CRM_meta_clone_max="2"
export OCF_RESKEY_CRM_meta_role="Slave"

I loaded them with ". ~/.bash_profile" and started the mysql resource manually:

sh -x /usr/lib/ocf/resource.d/percona/mysql start: http://fpaste.org/EMwa/

It works fine:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.6.38
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000072
          Read_Master_Log_Pos: 1428602
               Relay_Log_File: mysqld-relay-bin.000006
                Relay_Log_Pos: 39370
        Relay_Master_Log_File: mysql-bin.000072
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1428602
              Relay_Log_Space: 39527
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 123
1 row in set (0.00 sec)
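
The two fields that matter in the output above are Slave_IO_Running and Slave_SQL_Running. As a rough sketch, the check can be scripted by piping the \G output through awk; slave_threads_running is a hypothetical helper name, and the mysql invocation in the trailing comment assumes the client can log in without extra options.

```shell
#!/bin/sh
# Read "SHOW SLAVE STATUS\G" output on stdin.
# Exit 0 only when both replication threads report "Yes".
slave_threads_running() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "", $2) }               # drop trailing whitespace
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example (assumption: credentials come from ~/.my.cnf):
#   mysql -e 'SHOW SLAVE STATUS\G' | slave_threads_running && echo "replication OK"
```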

I stopped MySQL, turned on debugging, and restarted corosync. Here is the log: http://fpaste.org/mZzS/

As you can see, there is nothing but "unknown error":

Jul 13 10:48:06 serving-6192 crmd: [3341]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
Jul 13 10:48:06 serving-6192 lrmd: [3338]: WARN: Managed p_mysql:1:start process 3416 exited with return code 1.
Jul 13 10:48:06 serving-6192 crmd: [3341]: info: process_lrm_event: LRM operation p_mysql:1_start_0 (call=4, rc=1, cib-update=10, confirmed=true) unknown error

Any ideas?


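Update, Sat Jul 14 17:16:03 ICT 2012:

@Patrick: thanks for the hint!

The environment variables used by Pacemaker are here: http://fpaste.org/92yN/

As I suspected while chatting with you, the node serving-6192 is started with OCF_RESKEY_CRM_meta_master_max=1, so, because of the following code: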

/usr/lib/ocf/resource.d/percona/mysql:

if ocf_is_ms; then
    mysql_extra_params="--skip-slave-start"
fi

/usr/lib/ocf//lib/heartbeat/ocf-shellfuncs:

ocf_is_ms() {
    [ ! -z "${OCF_RESKEY_CRM_meta_master_max}" ] && [ "${OCF_RESKEY_CRM_meta_master_max}" -gt 0 ]
}
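
To see why the manual run and the Pacemaker run take different paths, the check quoted above can be exercised on its own. The sketch below copies ocf_is_ms so that it runs without the OCF shell library; describe_mode is a hypothetical wrapper added only for illustration.

```shell
#!/bin/sh
# Copy of the quoted check, so this runs without sourcing ocf-shellfuncs.
ocf_is_ms() {
    [ ! -z "${OCF_RESKEY_CRM_meta_master_max}" ] && [ "${OCF_RESKEY_CRM_meta_master_max}" -gt 0 ]
}

# Hypothetical wrapper: report which branch the agent would take.
describe_mode() {
    if ocf_is_ms; then
        echo "master/slave: adds --skip-slave-start"
    else
        echo "standalone: no extra mysqld parameters"
    fi
}

# Manual run with only the OCF_RESKEY_* resource parameters exported.
unset OCF_RESKEY_CRM_meta_master_max
describe_mode

# What Pacemaker exports for ms_MySQL (master-max="1").
OCF_RESKEY_CRM_meta_master_max=1
describe_mode
```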

the extra parameter --skip-slave-start is included:

ps -ef | grep mysql

root 18215 1 0 17:12 pts/4 00:00:00 /bin/sh /usr/bin/mysqld_safe --defaults-file=/etc/my.cnf --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --datadir=/var/lib/mysql --user=mysql --skip-slave-start

mysql 19025 18215 1 17:12 pts/4 00:00:14 /usr/libexec/mysqld --defaults-file=/etc/my.cnf --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --skip-slave-start --log-error=/var/log/mysqld.log --open-files-limit=8192 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock --port=3306

But the SQL thread is still running:

         Slave_IO_Running: Yes
        Slave_SQL_Running: Yes

and replication works fine.

IFS=$'\n' ENV=( $(cat /tmp/16374.env) ); env -i - "${ENV[@]}" sh -x /usr/lib/ocf/resource.d/percona/mysql start: http://fpaste.org/x7xE/
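
The one-liner above relies on bash arrays. A portable variant might look like the sketch below, assuming the dump holds one NAME=value assignment per line and no value contains a newline. replay_env is a name made up for this example.

```shell
#!/bin/sh
# replay_env FILE COMMAND [ARGS...]
# Run COMMAND with an environment built only from the NAME=value lines in FILE.
# Assumption: one assignment per line, no embedded newlines in values.
replay_env() {
    env_file=$1
    shift
    (
        set -f    # no globbing while the file contents are split
        IFS='
'                 # split on newlines only, so values may contain spaces
        exec env -i $(cat "$env_file") "$@"
    )
}

# Example:
#   replay_env /tmp/16374.env sh -x /usr/lib/ocf/resource.d/percona/mysql start
```

Because env -i starts from an empty environment, anything the agent sees comes from the dump, which is the point of the exercise: the manual run then matches what Pacemaker passed in.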

Banging my head against the wall (: -> |

mysql-replication high-availability failover pacemaker corosync
1 Answer

  1. Best Answer
    quanta
    2012-07-15T07:21:46+08:00

    Eureka!

    We all overlooked one very, very important log file: /var/log/mysqld.log:

    socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL) by Atomicorp
    [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000082' at position 58569, relay log './mysqld-relay-bin.000002' position: 58715
    [Note] Slave I/O thread: connected to master '[email protected]:3306',replication started in log 'mysql-bin.000082' at position 58569
    [Warning] Aborted connection 10 to db: 'unconnected' user: 'test_user' host: 'localhost' (init_connect command failed)
    [Warning] The MySQL server is running with the --read-only option so it cannot execute this statement
    [Note] /usr/libexec/mysqld: Normal shutdown
    

    As you may have guessed, I track user activity by combining the binlog with init-connect:

    init_connect = "INSERT INTO audit.accesslog (connect_time, user_host, connection_id) VALUES (NOW(), CURRENT_USER(), CONNECTION_ID());"

    But serving-6192 is set to read-only when it starts as a slave, so when Pacemaker runs the monitor operation as test_user:

        # Check for test table
        ocf_run -q $MYSQL $MYSQL_OPTIONS_TEST \
            -e "SELECT COUNT(*) FROM $OCF_RESKEY_test_table"
    

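    the init_connect command fails with the error shown above:

    The MySQL server is running with the --read-only option so it cannot execute this statement

    The solution is to set the init_connect option to an empty string before the monitor operation runs (and don't forget to set it back when the node is promoted to master).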
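
    One way this could be wired in is sketched below. set_init_connect and init_connect_statement are hypothetical helpers, not functions from the upstream agent; $MYSQL and $MYSQL_OPTIONS_REPL are assumed to be set the same way the agent sets them, and the audit statement is the one from my.cnf above.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Percona resource agent.

# Audit statement copied from the init_connect setting in my.cnf.
AUDIT_INIT_CONNECT='INSERT INTO audit.accesslog (connect_time, user_host, connection_id) VALUES (NOW(), CURRENT_USER(), CONNECTION_ID());'

# Print the SQL that restores ("on") or clears ("off") init_connect.
init_connect_statement() {
    case "$1" in
        on)  printf "SET GLOBAL init_connect='%s'\n" "$AUDIT_INIT_CONNECT" ;;
        off) printf "SET GLOBAL init_connect=''\n" ;;
        *)   return 1 ;;
    esac
}

# Run the statement through the client held in $MYSQL (default: mysql).
set_init_connect() {
    stmt=$(init_connect_statement "$1") || return 1
    ${MYSQL:-mysql} $MYSQL_OPTIONS_REPL -e "$stmt"
}
```

    Under these assumptions, set_init_connect off would run wherever the agent makes the node read-only, and set_init_connect on would run during promotion next to set_read_only off.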

    For anyone using the event scheduler: note that it must also be turned on when a slave is promoted to master:

    set_event_scheduler() {
        local es_val
        if ocf_is_true $1; then
            es_val="on"
        else
            es_val="off"
        fi
        ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
            -e "SET GLOBAL event_scheduler=${es_val}"
    }
    
    get_event_scheduler() {
        # Check if event-scheduler is set
        local event_scheduler_state
    
        event_scheduler_state=`$MYSQL $MYSQL_OPTIONS_REPL \
            -e "SHOW VARIABLES" | grep event_scheduler | awk '{print $2}'`
    
        if [ "$event_scheduler_state" = "ON" ]; then
            return 0
        else
            return 1
        fi
    }
    
    mysql_promote() {
        local master_info
    
        if ( ! mysql_status err ); then
            return $OCF_NOT_RUNNING
        fi
        ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
            -e "STOP SLAVE"
    
        # Set Master Info in CIB, cluster level attribute
        update_data_master_status
        master_info="$(get_local_ip)|$(get_master_status File)|$(get_master_status Position)"
        ${CRM_ATTR_REPL_INFO} -v "$master_info"
        rm -f $tmpfile
    
        set_read_only off || return $OCF_ERR_GENERIC
        set_event_scheduler on || return $OCF_ERR_GENERIC
    

    And don't forget to turn it off when demoting:

        'pre-demote')
            # Is the notification for our set
            notify_resource=`echo $OCF_RESKEY_CRM_meta_notify_demote_resource|cut -d: -f1`
            my_resource=`echo $OCF_RESOURCE_INSTANCE|cut -d: -f1`
            if [ $notify_resource != ${my_resource} ]; then
                ocf_log debug "Notification is not for us"
                return $OCF_SUCCESS
            fi
    
            demote_host=`echo $OCF_RESKEY_CRM_meta_notify_demote_uname|tr -d " "`
            if [ $demote_host = ${HOSTNAME} ]; then
                ocf_log info "post-demote notification for $demote_host"
                set_read_only on
                set_event_scheduler off
    

    Cheers,
