I'm having trouble getting PostgreSQL running again on my CentOS 7 server.
Some context: After noticing that I was quickly running out of storage space on the server's /root partition, I did some quick research and found that the culprit was the files in pg_xlog, none of which were being deleted. Since /home has plenty of free space, I moved the pg_xlog directory to a new directory under /home to buy enough time to fix the underlying problem of the log files not being cleaned up. I then made the original pg_xlog a symbolic link to the new directory on the /home partition, following some advice I found here.
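For reference, the move-and-symlink step amounted to roughly the following. This is only a Python sketch of what I did by hand, not the exact commands: the function name `relocate_with_symlink` and the paths in the trailing comment are placeholders, and it assumes PostgreSQL is stopped before anything is moved.

```python
import os
import shutil
from pathlib import Path


def relocate_with_symlink(src, dest):
    """Move the directory `src` to `dest` and leave a symlink at `src`.

    Hypothetical helper for illustration only. The database server is
    assumed to be stopped while this runs.
    """
    src, dest = Path(src), Path(dest)
    if src.is_symlink() or not src.is_dir():
        raise ValueError(f"{src} is not a real directory")
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")

    dest.parent.mkdir(parents=True, exist_ok=True)

    # Across filesystems shutil.move falls back to copy + delete. That
    # keeps permission bits but NOT file owner/group, so ownership has
    # to be checked (and fixed) afterwards.
    shutil.move(str(src), str(dest))

    # Point the old location at the new one with an absolute target.
    os.symlink(dest.resolve(), src, target_is_directory=True)


# Example call (placeholder paths, not necessarily my real layout):
# relocate_with_symlink("/var/lib/pgsql/data/pg_xlog", "/home/pg_xlog")
```

Note that this only covers the file move and the link. It does nothing about SELinux labels on the new location.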
But this resulted in some strange errors, with SELinux not allowing Postgres to read from the new directory even after chown-ing both the symlink and the new directory to postgres:postgres.
Since I'm not well versed in SELinux, I decided to just copy the log files back to the original directory and look into physically extending the partition instead.
But after copying all the files back, PostgreSQL won't start. The error messages don't seem very helpful (at least not to me). journalctl -xe reports:
Apr 19 00:12:29 localhost.localdomain systemd[1]: Unit postgresql.service entered failed state.
Apr 19 00:12:29 localhost.localdomain systemd[1]: postgresql.service failed.
Apr 19 00:12:29 localhost.localdomain polkitd[1105]: Unregistered Authentication Agent for unix-process:17236:1124444 (system bus name :1.588, object path /org/freedesk
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Email address of sender was not specified. Would you like to do so now? If not, 'user@localhost' is to be used
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Email address of receiver was not specified. Would you like to do so now? If not, 'root@localhost' is to be us
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Sending an e-mail...
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Sending a notification email to: root@localhost
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Email was sent to: root@localhost
Apr 19 00:12:31 localhost.localdomain postfix/pickup[10305]: AA0676622C: uid=0 from=<user@localhost>
Apr 19 00:12:31 localhost.localdomain postfix/cleanup[17283]: AA0676622C: message-id=<5ad7b4bf.TazSMT+2CbgyTlql%user@localhost>
Apr 19 00:12:31 localhost.localdomain postfix/qmgr[1963]: AA0676622C: from=<[email protected]>, size=45585, nrcpt=1 (queue active)
Apr 19 00:12:31 localhost.localdomain postfix/local[17285]: AA0676622C: to=<[email protected]>, orig_to=<root@localhost>, relay=local, delay=0.05, delays=0.04/
Apr 19 00:12:31 localhost.localdomain postfix/qmgr[1963]: AA0676622C: removed
And systemctl status postgresql says:
● postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-04-19 00:12:29 EAT; 1min 34s ago
  Process: 17252 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=1/FAILURE)
  Process: 17243 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)

Apr 19 00:12:28 localhost.localdomain systemd[1]: Starting PostgreSQL database server...
Apr 19 00:12:29 localhost.localdomain systemd[1]: postgresql.service: control process exited, code=exited status=1
Apr 19 00:12:29 localhost.localdomain systemd[1]: Failed to start PostgreSQL database server.
Apr 19 00:12:29 localhost.localdomain systemd[1]: Unit postgresql.service entered failed state.
Apr 19 00:12:29 localhost.localdomain systemd[1]: postgresql.service failed.
I must confess I'm at my wits' end. Any help, even if it's just a way to get more useful error messages, would be appreciated.
EDIT:
It appears that while I was copying the pg_xlog files back, the ownership of the files changed to root. I have since re-copied those same files with rsync to preserve the permissions. I'm also now including the PostgreSQL logs, as recommended in the comments below. The error message in the PostgreSQL logs is:
WARNING: transaction log file "00000001000000470000008D" could not be archived: too many failures
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: false
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: false
LOG: archive command failed with exit code 1
DETAIL: The failed archive command was: false
WARNING: transaction log file "00000001000000470000008D" could not be archived: too many failures
LOG: database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
PANIC: could not open file "pg_xlog/000000010000006900000017" (log file 105, segment 23): Permission denied
LOG: startup process (PID 17256) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
LOG: database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
PANIC: could not open file "pg_xlog/000000010000006900000017" (log file 105, segment 23): Permission denied
LOG: startup process (PID 20020) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
LOG: database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
FATAL: the database system is starting up
PANIC: could not open file "pg_xlog/000000010000006900000017" (log file 105, segment 23): Permission denied
LOG: startup process (PID 25607) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
LOG: database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
FATAL: the database system is starting up
FATAL: the database system is starting up
LOG: invalid magic number 0000 in log file 105, segment 23, offset 9617408
LOG: invalid primary checkpoint record
LOG: invalid magic number 0000 in log file 105, segment 23, offset 9601024
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
FATAL: the database system is starting up
LOG: startup process (PID 28108) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
LOG: database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
LOG: invalid magic number 0000 in log file 105, segment 23, offset 9617408
LOG: invalid primary checkpoint record
LOG: invalid magic number 0000 in log file 105, segment 23, offset 9601024
LOG: invalid secondary checkpoint record
PANIC: could not locate a valid checkpoint record
LOG: startup process (PID 28529) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
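In case it helps anyone checking the same thing: the "Permission denied" entries above line up with the ownership change I mentioned. Below is a minimal Python sketch of how the ownership of everything under pg_xlog could be verified. The function name `files_not_owned_by` is made up for this post, and the data directory path in the trailing comment is an assumption about a default CentOS 7 install rather than something taken from my logs.

```python
import os
from pathlib import Path


def files_not_owned_by(directory, expected_uid):
    """Return the sorted paths under `directory` (the directory itself
    included) whose owning uid differs from `expected_uid`.

    Hypothetical helper for illustration only.
    """
    root = Path(directory)
    mismatched = []
    for path in [root, *root.rglob("*")]:
        # lstat() looks at a symlink itself rather than at its target.
        if path.lstat().st_uid != expected_uid:
            mismatched.append(path)
    return sorted(mismatched)


# Example call (the path is an assumed default, adjust to your PGDATA):
# import pwd
# uid = pwd.getpwnam("postgres").pw_uid
# for p in files_not_owned_by("/var/lib/pgsql/data/pg_xlog", uid):
#     print(p)
```

An empty result only says that the owner is right. It says nothing about SELinux labels or about whether the segment contents survived the copy, which is what the later "invalid magic number" lines seem to be complaining about.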