I set up master-slave replication in PostgreSQL by following this tutorial:
http://wiki.postgresql.org/wiki/Binary_Replication_Tutorial#Binary_Replication_in_6_Steps
It worked fine. I then promoted the slave and configured the original master to be a slave of the new master, again following the tutorial linked above.
However, replication no longer works. The new slave (i.e. the former master) logs errors like:
FATAL: terminating walreceiver process due to administrator command
LOG: out-of-sequence timeline ID 1 (after 2) in log file 0, segment 1, offset 0
or like:
LOG: unexpected timeline ID 1 in log file 0, segment 1, offset 0
What am I doing wrong?
Details follow.
When I promoted the former slave, it appears to have started a new timeline, number 2:
LOG: received promote request
FATAL: terminating walreceiver process due to administrator command
LOG: record with zero length at 0/18B5148
LOG: redo done at 0/18B50F0
LOG: last completed transaction was at log time 2011-12-05 08:39:43.872041+00
LOG: selected new timeline ID: 2
LOG: archive recovery complete
LOG: database system is ready to accept connections
There are 2 timelines in the pg_xlog directory of the new master (i.e. the former slave):
new-master$ tree pg_xlog
pg_xlog
├── 000000010000000000000001
├── 000000020000000000000001
├── 00000002.history
└── archive_status
    └── 00000002.history.ready
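For reference, a WAL segment file name is 24 hexadecimal digits, and the first 8 encode the timeline ID, which is why the listing above shows one segment from timeline 1 and one from timeline 2. The sketch below reads the timeline from a file name; `wal_timeline` is a hypothetical helper written for illustration, not a PostgreSQL tool.

```shell
# Print the timeline ID encoded in a WAL segment file name.
# Hypothetical helper for illustration; not part of PostgreSQL.
wal_timeline() {
    name=$(basename "$1")
    # Reject anything that is not purely hexadecimal (e.g. 00000002.history).
    case "$name" in
        *[!0-9A-Fa-f]*) return 1 ;;
    esac
    # A segment name is exactly 24 hex digits.
    [ "${#name}" -eq 24 ] || return 1
    # The first 8 digits are the timeline ID, in hexadecimal.
    hex=$(printf '%s' "$name" | cut -c1-8)
    echo "$((0x$hex))"
}

wal_timeline 000000010000000000000001   # prints 1
wal_timeline 000000020000000000000001   # prints 2
```

Running it over every file in pg_xlog on both servers is a quick way to see which timelines each side actually holds.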
I shut down the new master and the new slave. On the new slave, I do this:
new-slave$ cd /var/lib/pgsql/9.1/data/
new-slave$ rm *
new-slave$ mkdir pg_xlog ; chmod 700 pg_xlog
new-slave$ rsync -a --exclude pg_xlog --exclude postgresql.conf --exclude recovery.conf --exclude recovery.done --exclude postmaster.pid --exclude 'server.*' \
dw0azewdbpv11danny:/var/lib/pgsql/9.1/data/* .
new-slave$ ... restore postgresql.conf etcetera from backup,
... change settings following instructions in tutorial
I start the new master and the new slave. But the slave says FATAL: terminating walreceiver process:
LOG: database system was interrupted; last known up at 2011-12-06 21:49:49 UTC
LOG: creating missing WAL directory "pg_xlog/archive_status"
LOG: entering standby mode
LOG: streaming replication successfully connected to primary
LOG: unexpected timeline ID 1 in log file 0, segment 1, offset 0
FATAL: terminating walreceiver process due to administrator command
LOG: unexpected timeline ID 1 in log file 0, segment 1, offset 0
LOG: unexpected timeline ID 1 in log file 0, segment 1, offset 0
...
pg_xlog on the new slave:
new-slave$ tree pg_xlog/
pg_xlog/
├── 000000020000000000000001
└── archive_status
Should I have rsynced pg_xlog too? I tried that: shut down master and slave, rsync pg_xlog, start slave, start master. Then replication seems to work. The new slave says:
LOG: database system was shut down at 2011-12-06 22:10:14 UTC
LOG: entering standby mode
LOG: consistent recovery state reached at 0/19A4420
LOG: database system is ready to accept read only connections
LOG: record with zero length at 0/19A4420
FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "dw0azewdbpv11danny" (46.137.XX.YY) and accepting
TCP/IP connections on port 5432?
LOG: streaming replication successfully connected to primary
But when I do anything on the new master:
insert into moo (mää) values (23);
Then replication fails. The new slave says:
LOG: could not receive data from client: Connection reset by peer
LOG: out-of-sequence timeline ID 1 (after 2) in log file 0, segment 1, offset 0
FATAL: terminating walreceiver process due to administrator command
LOG: out-of-sequence timeline ID 1 (after 2) in log file 0, segment 1, offset 0
LOG: out-of-sequence timeline ID 1 (after 2) in log file 0, segment 1, offset 0
The master says:
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
LOG: could not receive data from client: Connection reset by peer
But I didn't reset any connection!
If I restart the slave, it fails immediately with FATAL: terminating walreceiver process:
LOG: database system was shut down in recovery at 2011-12-06 22:24:37 UTC
LOG: entering standby mode
LOG: consistent recovery state reached at 0/19A4420
LOG: database system is ready to accept read only connections
LOG: redo starts at 0/19A4420
LOG: record with zero length at 0/19A4510
LOG: streaming replication successfully connected to primary
LOG: out-of-sequence timeline ID 1 (after 2) in log file 0, segment 1, offset 0
FATAL: terminating walreceiver process due to administrator command
Configuration files:
---- new master & slave: ----
wal_level = hot_standby
#archive_mode = off (default)
---- new master: ----
max_wal_senders = 5
wal_keep_segments = 100
---- new slave: ----
hot_standby = on
---- recovery.conf, on the new slave: ----
standby_mode = 'on'
primary_conninfo = 'host=dw0azewdbpv11danny user=... password=...'
What am I doing wrong? How can I turn the former master into a slave?
(Is it better if I don't delete all files on the new slave before I rsync? Or should I run initdb on the new slave before I rsync? Do you know of any tutorial on how to convert a master into a slave?)
Regards, KajMagnus
I found a solution:
Add
recovery_target_timeline = 'latest'
to recovery.conf on the new slave (the former master).
Now replication works fine: the new slave follows the correct timeline.
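Combining this parameter with the recovery.conf settings from the question, the standby's file might look like the sketch below. `write_recovery_conf` is a hypothetical helper used only to show the complete file; the host name and the elided user and password are copied from the question as-is.

```shell
# Write a recovery.conf for the standby into the given data directory.
# Hypothetical helper; the settings are those from the question plus
# recovery_target_timeline.
write_recovery_conf() {
    datadir=$1
    cat > "$datadir/recovery.conf" <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=dw0azewdbpv11danny user=... password=...'
recovery_target_timeline = 'latest'
EOF
}

# Example, using the data directory path from the question:
# write_recovery_conf /var/lib/pgsql/9.1/data
```

The file is read when the standby starts, so the standby has to be restarted after the change.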
Here is some information about this parameter and many other recovery.conf parameters:
http://pgpool.projects.postgresql.org/pgpool-II/doc/recovery.conf.sample
Here is some more information about this parameter:
http://www.postgresql.org/docs/9.1/static/warm-standby.html#STANDBY-SERVER-SETUP