I have just set up a Percona XtraDB Cluster with 2 nodes on Ubuntu.
This is what I did:
- Launched 2 new servers.
- Installed Percona XtraDB Cluster on both nodes.
- Stopped MySQL on both nodes.
On server1:
/etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf
wsrep_cluster_address=gcomm://172.31.5.172,172.31.13.81
binlog_format=ROW
wsrep_slave_threads= 8
wsrep_log_conflicts
innodb_autoinc_lock_mode=2
wsrep_node_address=172.31.5.172
wsrep_cluster_name=pxc-cluster
pxc_strict_mode=DISABLED
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="sstuser:s3cretPass"
On server2:
/etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf
wsrep_cluster_address=gcomm://172.31.5.172,172.31.13.81
binlog_format=ROW
wsrep_slave_threads= 8
wsrep_log_conflicts
innodb_autoinc_lock_mode=2
wsrep_node_address=172.31.13.81
wsrep_cluster_name=pxc-cluster
pxc_strict_mode=DISABLED
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth="sstuser:s3cretPass"
On server1:
/etc/init.d/mysql bootstrap-pxc
Then MySQL started, and I created the SST user:
CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cretPass';
GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
FLUSH PRIVILEGES;
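As a side note, a quick sanity check that the credentials referenced by wsrep_sst_auth actually work is to log in as that user; this is just an illustrative check, not part of the original setup:
mysql -u sstuser -ps3cretPass -e "SHOW GRANTS;"   # should list the PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT grants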
On server2:
service mysql start
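As a rough check that the second node really joined (assuming the mysql client can log in locally), something like the following can be run on either node:
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"          # should report 2
mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"   # should report Synced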
Up to this point there were no problems, and I tested replication. Then I rebooted both nodes.
After that, MySQL does not start.
Server1:
2017-11-05T17:22:03.505946Z mysqld_safe Logging to '/var/log/mysqld.log'.
2017-11-05T17:22:03.515607Z mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2017-11-05T17:22:03.520623Z mysqld_safe Skipping wsrep-recover for cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 pair
2017-11-05T17:22:03.521158Z mysqld_safe Assigning cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 to wsrep_start_position
2017-11-05T17:22:03.706708Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-11-05T17:22:03.708134Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.19-17-57-log) starting as process 11089 ...
2017-11-05T17:22:03.709661Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-5-172-bin' to avoid this problem.
2017-11-05T17:22:03.709964Z 0 [Note] WSREP: Setting wsrep_ready to false
2017-11-05T17:22:03.709977Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2017-11-05T17:22:03.709982Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2017-11-05T17:22:03.712249Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <[email protected]> loaded successfully.
2017-11-05T17:22:03.712290Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-11-05T17:22:03.712549Z 0 [Note] WSREP: Found saved state: cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0, safe_to_bootsrap: 1
2017-11-05T17:22:03.713823Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.31.5.172; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-11-05T17:22:03.727615Z 0 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
2017-11-05T17:22:03.727630Z 0 [Note] WSREP: Preparing to initiate SST/IST
2017-11-05T17:22:03.727634Z 0 [Note] WSREP: Starting replication
2017-11-05T17:22:03.727640Z 0 [Note] WSREP: Setting initial position to cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0
2017-11-05T17:22:03.727741Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-11-05T17:22:03.727792Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-11-05T17:22:03.727852Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:03.727859Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:03.728266Z 0 [Note] WSREP: GMCast version 0
2017-11-05T17:22:03.728398Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-11-05T17:22:03.728405Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-11-05T17:22:03.728686Z 0 [Note] WSREP: EVS version 0
2017-11-05T17:22:03.728771Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:'
2017-11-05T17:22:03.729595Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection established to d73a400b tcp://172.31.5.172:4567
2017-11-05T17:22:03.729609Z 0 [Warning] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') address 'tcp://172.31.5.172:4567' points to own listening address, blacklisting
2017-11-05T17:22:06.729830Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:06.729954Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer d73a400b with addr tcp://172.31.5.172:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:06.730072Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:22:06.730101Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,d73a400b,1)
memb {
d73a400b,0
}
joined {
}
left {
}
partitioned {
}
)
2017-11-05T17:22:07.230228Z 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50154S), skipping check
2017-11-05T17:22:10.730540Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:14.731257Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:16.154486Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') connection established to de905d70 tcp://172.31.13.81:4567
2017-11-05T17:22:16.154851Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2017-11-05T17:22:16.654815Z 0 [Note] WSREP: declaring de905d70 at tcp://172.31.13.81:4567 stable
2017-11-05T17:22:16.655411Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:22:16.655466Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,d73a400b,2)
memb {
d73a400b,0
de905d70,0
}
joined {
}
left {
}
partitioned {
}
)
2017-11-05T17:22:19.232193Z 0 [Note] WSREP: (d73a400b, 'tcp://0.0.0.0:4567') turning message relay requesting off
Server2:
2017-11-05T17:22:15.979415Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-11-05T17:22:15.981265Z 0 [Note] mysqld (mysqld 5.7.19-17-57-log) starting as process 1325 ...
2017-11-05T17:22:15.983852Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-13-81-bin' to avoid this problem.
2017-11-05T17:22:15.984031Z 0 [Note] WSREP: Setting wsrep_ready to false
2017-11-05T17:22:15.984043Z 0 [Note] WSREP: No pre-stored wsrep-start position found. Skipping position initialization.
2017-11-05T17:22:15.984047Z 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera3/libgalera_smm.so'
2017-11-05T17:22:16.013201Z 0 [Note] WSREP: wsrep_load(): Galera 3.22(r8678538) by Codership Oy <[email protected]> loaded successfully.
2017-11-05T17:22:16.013252Z 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2017-11-05T17:22:16.015682Z 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootsrap: 0
2017-11-05T17:22:16.019709Z 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.31.13.81; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 10; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 4; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 100; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 7; socket.checksum = 2; socket.recv_buf_size = 212992;
2017-11-05T17:22:16.034431Z 0 [Note] WSREP: GCache history reset: cc6a7fb6-c24b-11e7-bd13-ba1518e34f61:0 -> 00000000-0000-0000-0000-000000000000:-1
2017-11-05T17:22:16.035026Z 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2017-11-05T17:22:16.035038Z 0 [Note] WSREP: Preparing to initiate SST/IST
2017-11-05T17:22:16.035041Z 0 [Note] WSREP: Starting replication
2017-11-05T17:22:16.035048Z 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2017-11-05T17:22:16.035159Z 0 [Note] WSREP: Using CRC-32C for message checksums.
2017-11-05T17:22:16.035214Z 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2017-11-05T17:22:16.036228Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:16.036250Z 0 [Note] WSREP: Restoring primary-component from disk failed. Either node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:16.036685Z 0 [Note] WSREP: GMCast version 0
2017-11-05T17:22:16.036828Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2017-11-05T17:22:16.036836Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2017-11-05T17:22:16.037997Z 0 [Note] WSREP: EVS version 0
2017-11-05T17:22:16.038072Z 0 [Note] WSREP: gcomm: connecting to group 'pxc-cluster', peer '172.31.5.172:,172.31.13.81:'
2017-11-05T17:22:16.039496Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection established to de905d70 tcp://172.31.13.81:4567
2017-11-05T17:22:16.039513Z 0 [Warning] WSREP: (de905d70, 'tcp://0.0.0.0:4567') address 'tcp://172.31.13.81:4567' points to own listening address, blacklisting
2017-11-05T17:22:16.039960Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection established to d73a400b tcp://172.31.5.172:4567
2017-11-05T17:22:16.040014Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2017-11-05T17:22:16.540329Z 0 [Note] WSREP: declaring d73a400b at tcp://172.31.5.172:4567 stable
2017-11-05T17:22:16.540836Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:22:16.540862Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,d73a400b,2)
memb {
d73a400b,0
de905d70,0
}
joined {
}
left {
}
partitioned {
}
)
2017-11-05T17:22:19.039641Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') connection to peer de905d70 with addr tcp://172.31.13.81:4567 timed out, no messages seen in PT3S
2017-11-05T17:22:19.539775Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting off
2017-11-05T17:22:37.061048Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.31.5.172:4567
2017-11-05T17:22:38.061306Z 0 [Note] WSREP: (de905d70, 'tcp://0.0.0.0:4567') reconnecting to d73a400b (tcp://172.31.5.172:4567), attempt 0
2017-11-05T17:22:41.562259Z 0 [Note] WSREP: evs::proto(de905d70, OPERATIONAL, view_id(REG,d73a400b,2)) suspecting node: d73a400b
2017-11-05T17:22:41.562325Z 0 [Note] WSREP: evs::proto(de905d70, OPERATIONAL, view_id(REG,d73a400b,2)) suspected node without join message, declaring inactive
2017-11-05T17:22:42.562445Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,d73a400b,2)
memb {
de905d70,0
}
joined {
}
left {
}
partitioned {
d73a400b,0
}
)
2017-11-05T17:22:42.562525Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:22:42.562550Z 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,de905d70,3)
memb {
de905d70,0
}
joined {
}
left {
}
partitioned {
d73a400b,0
}
)
2017-11-05T17:22:47.071058Z 0 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2017-11-05T17:22:47.071334Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2017-11-05T17:22:47.071356Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-11-05T17:22:47.071412Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1513: Failed to open channel 'pxc-cluster' at 'gcomm://172.31.5.172,172.31.13.81': -110 (Connection timed out)
2017-11-05T17:22:47.071421Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-11-05T17:22:47.071426Z 0 [ERROR] WSREP: Provider/Node (gcomm://172.31.5.172,172.31.13.81) failed to establish connection with cluster (reason: 7)
2017-11-05T17:22:47.071430Z 0 [ERROR] Aborting
2017-11-05T17:22:47.071433Z 0 [Note] Giving 0 client threads a chance to die gracefully
2017-11-05T17:22:47.071437Z 0 [Note] WSREP: Waiting for active wsrep applier to exit
2017-11-05T17:22:47.071440Z 0 [Note] WSREP: Service disconnected.
2017-11-05T17:22:47.071443Z 0 [Note] WSREP: Waiting to close threads......
2017-11-05T17:22:52.072202Z 0 [Note] WSREP: Some threads may fail to exit.
2017-11-05T17:22:52.072285Z 0 [Note] Binlog end
2017-11-05T17:22:52.072905Z 0 [Note] mysqld: Shutdown complete
2017-11-05T17:22:52.322273Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-11-05T17:22:52.325253Z 0 [Warning] No argument was provided to --log-bin, and --log-bin-index was not used; so replication may break when this MySQL server acts as a master and has his hostname changed!! Please use '--log-bin=ip-172-31-13-81-bin' to avoid this problem.
2017-11-05T17:22:52.344397Z 0 [Warning] WSREP: Fail to access the file (/var/lib/mysql//gvwstate.dat) error (No such file or directory). It is possible if node is booting for first time or re-booting after a graceful shutdown
2017-11-05T17:22:52.346176Z 0 [Warning] WSREP: (f4348c81, 'tcp://0.0.0.0:4567') address 'tcp://172.31.13.81:4567' points to own listening address, blacklisting
2017-11-05T17:22:52.847728Z 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2017-11-05T17:23:22.856010Z 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():158
2017-11-05T17:23:22.856051Z 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
2017-11-05T17:23:22.856128Z 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1513: Failed to open channel 'pxc-cluster' at 'gcomm://172.31.5.172,172.31.13.81': -110 (Connection timed out)
2017-11-05T17:23:22.856140Z 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2017-11-05T17:23:22.856147Z 0 [ERROR] WSREP: Provider/Node (gcomm://172.31.5.172,172.31.13.81) failed to establish connection with cluster (reason: 7)
2017-11-05T17:23:22.856151Z 0 [ERROR] Aborting
Yes, Rick is right. Since all nodes are down, you need to bootstrap the cluster again.
You stopped all nodes gracefully.
If you know which node was stopped last, pick that node to bootstrap from, then start the other node.
But if you do not remember which node you shut down last, open grastate.dat (located in the MySQL data directory) on every node, compare the seqno values inside, pick the node with the highest number, and start that node first. If you bootstrap from the wrong node, the other node will have to discard all of its data and request a full SST again, so be careful.
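A minimal sketch of that check, assuming the default datadir /var/lib/mysql seen in your logs:
grep -E 'seqno|safe_to_bootstrap' /var/lib/mysql/grastate.dat   # run on every node and compare the values
Note that PXC 5.7 refuses to bootstrap from a node whose grastate.dat says safe_to_bootstrap: 0. In the logs above, server1 (172.31.5.172) reports safe_to_bootstrap: 1 and server2 reports 0, so server1 is the natural node to bootstrap from; otherwise you would first have to edit that line to 1 on the node you choose.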
To find out on which node the cluster was last bootstrapped, assuming the command was typed manually at the Linux command line, you can run the Linux history command on ALL cluster nodes and see which node has the most recent occurrence of the command: systemctl start [email protected]
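Putting it together, a minimal recovery sequence for this setup, using the same commands shown earlier in the question (the exact service name depends on whether the box uses init.d or systemd):
# on the node chosen to bootstrap (here server1, which reports safe_to_bootstrap: 1)
/etc/init.d/mysql bootstrap-pxc            # or: systemctl start [email protected]
# once it is up and Synced, on the other node:
service mysql start                        # it should rejoin via IST/SST from the bootstrapped node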