AskOverflow.Dev

AskOverflow.Dev Logo AskOverflow.Dev Logo

AskOverflow.Dev Navigation

  • 主页
  • 系统&网络
  • Ubuntu
  • Unix
  • DBA
  • Computer
  • Coding
  • LangChain

Mobile menu

Close
  • 主页
  • 系统&网络
    • 最新
    • 热门
    • 标签
  • Ubuntu
    • 最新
    • 热门
    • 标签
  • Unix
    • 最新
    • 标签
  • DBA
    • 最新
    • 标签
  • Computer
    • 最新
    • 标签
  • Coding
    • 最新
    • 标签
主页 / dba / 问题 / 267187
Accepted
Hassan Baig
Hassan Baig
Asked: 2020-05-15 18:32:47 +0800 CST2020-05-15 18:32:47 +0800 CST 2020-05-15 18:32:47 +0800 CST

解释 postgresql 部署中反复出现的崩溃

  • 772

我在中等繁忙的生产服务器上部署 postgresql 时遇到意外的系统崩溃问题(Django 1.8 应用程序部署在带有gunicorn应用程序服务器和nginx反向代理的 DO droplet 上)。

我是一名偶然的 DBA,需要专家的意见来拼凑正在发生的事情。我现在不知道;我将在下面提供各种日志数据:

整个应用程序崩溃 - new relic 最终显示以下错误:

django.db.utils:OperationalError: ERROR: no more connections allowed (max_client_conn)

完整的堆栈跟踪是:

File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gevent/baseserver.py", line 26, in _handle_and_close_when_done
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 155, in handle
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 56, in handle
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 160, in handle_request
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 114, in handle_request
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/web_transaction.py", line 704, in __iter__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/web_transaction.py", line 1080, in __call__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 189, in __call__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 108, in get_response
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/hooks/framework_django.py", line 228, in wrapper
File "/home/ubuntu/app/mproject/middleware/mycustommiddleware.py", line 6, in process_request
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/utils/functional.py", line 225, in inner
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/utils/functional.py", line 376, in _setup
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py", line 22, in <lambda>
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/contrib/auth/middleware.py", line 10, in get_user
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/contrib/auth/__init__.py", line 174, in get_user
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/contrib/auth/backends.py", line 93, in get_user
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/manager.py", line 127, in manager_method
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/query.py", line 328, in get
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/query.py", line 144, in __len__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/query.py", line 965, in _fetch_all
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/query.py", line 238, in iterator
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 838, in execute_sql
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 164, in cursor
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 135, in _cursor
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 130, in ensure_connection
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/utils.py", line 98, in __exit__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 130, in ensure_connection
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/base/base.py", line 119, in connect
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 176, in get_new_connection
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/hooks/database_dbapi2.py", line 102, in __call__
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 126, in connect
File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/psycogreen/gevent.py", line 32, in gevent_wait_callback

新遗物也显示出巨大的峰值Postgres set:

在此处输入图像描述

Postgresql 慢日志显示过高的 COMMIT 持续时间,如下所示:

2020-05-15 01:22:47.474 UTC [10722] ubuntu@myapp LOG:  duration: 280668.902 ms  statement: COMMIT
2020-05-15 01:22:47.474 UTC [11536] ubuntu@myapp LOG:  duration: 49251.277 ms  statement: COMMIT
2020-05-15 01:22:47.474 UTC [10719] ubuntu@myapp LOG:  duration: 281609.394 ms  statement: COMMIT
2020-05-15 01:22:47.474 UTC [10719] ubuntu@myapp LOG:  could not send data to client: Broken pipe
2020-05-15 01:22:47.474 UTC [10719] ubuntu@myapp FATAL:  connection to client lost
2020-05-15 01:22:47.475 UTC [10672] ubuntu@myapp LOG:  duration: 285371.872 ms  statement: COMMIT

Digital Ocean 的仪表板显示 和 的峰值Load Average(Disk IO但CPU,memory usage和disk usage保持低位)。

在此处输入图像描述

该Load Average峰值时的快照统计数据是:

在此处输入图像描述

syslog有以下错误日志的无数实例:

May 15 01:22:46 main-app gunicorn[10779]: error: [Errno 111] Connection refused
May 15 01:22:46 main-app gunicorn[10779]: [2020-05-15 01:22:46 +0000] [10874] [ERROR] Socket error processing request.
May 15 01:22:46 main-app gunicorn[10779]: Traceback (most recent call last):
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 66, in handle
May 15 01:22:46 main-app gunicorn[10779]:     six.reraise(*sys.exc_info())
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 56, in handle
May 15 01:22:46 main-app gunicorn[10779]:     self.handle_request(listener_name, req, client, addr)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/ggevent.py", line 160, in handle_request
May 15 01:22:46 main-app gunicorn[10779]:     addr)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 129, in handle_request
May 15 01:22:46 main-app gunicorn[10779]:     six.reraise(*sys.exc_info())
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gunicorn/workers/base_async.py", line 114, in handle_request
May 15 01:22:46 main-app gunicorn[10779]:     for item in respiter:
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/web_transaction.py", line 704, in __iter__
May 15 01:22:46 main-app gunicorn[10779]:     for item in self.generator:
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/web_transaction.py", line 1080, in __call__
May 15 01:22:46 main-app gunicorn[10779]:     self.start_response)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 189, in __call__
May 15 01:22:46 main-app gunicorn[10779]:     response = self.get_response(request)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 218, in get_response
May 15 01:22:46 main-app gunicorn[10779]:     response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/hooks/framework_django.py", line 448, in wrapper
May 15 01:22:46 main-app gunicorn[10779]:     return _wrapped(*args, **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/hooks/framework_django.py", line 441, in _wrapped
May 15 01:22:46 main-app gunicorn[10779]:     return wrapped(request, resolver, exc_info)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 256, in handle_uncaught_exception
May 15 01:22:46 main-app gunicorn[10779]:     'request': request
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 1193, in error
May 15 01:22:46 main-app gunicorn[10779]:     self._log(ERROR, msg, args, **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 1286, in _log
May 15 01:22:46 main-app gunicorn[10779]:     self.handle(record)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 1296, in handle
May 15 01:22:46 main-app gunicorn[10779]:     self.callHandlers(record)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 1336, in callHandlers
May 15 01:22:46 main-app gunicorn[10779]:     hdlr.handle(record)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/logging/__init__.py", line 759, in handle
May 15 01:22:46 main-app gunicorn[10779]:     self.emit(record)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/utils/log.py", line 129, in emit
May 15 01:22:46 main-app gunicorn[10779]:     self.send_mail(subject, message, fail_silently=True, html_message=html_message)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/utils/log.py", line 132, in send_mail
May 15 01:22:46 main-app gunicorn[10779]:     mail.mail_admins(subject, message, *args, connection=self.connection(), **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/function_trace.py", line 110, in literal_wrapper
May 15 01:22:46 main-app gunicorn[10779]:     return wrapped(*args, **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/mail/__init__.py", line 98, in mail_admins
May 15 01:22:46 main-app gunicorn[10779]:     mail.send(fail_silently=fail_silently)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/newrelic-2.56.0.42/newrelic/api/function_trace.py", line 110, in literal_wrapper
May 15 01:22:46 main-app gunicorn[10779]:     return wrapped(*args, **kwargs)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/mail/message.py", line 303, in send
May 15 01:22:46 main-app gunicorn[10779]:     return self.get_connection(fail_silently).send_messages([self])
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/mail/backends/smtp.py", line 100, in send_messages
May 15 01:22:46 main-app gunicorn[10779]:     new_conn_created = self.open()
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/django/core/mail/backends/smtp.py", line 58, in open
May 15 01:22:46 main-app gunicorn[10779]:     self.connection = connection_class(self.host, self.port, **connection_params)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/smtplib.py", line 256, in __init__
May 15 01:22:46 main-app gunicorn[10779]:     (code, msg) = self.connect(host, port)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/smtplib.py", line 316, in connect
May 15 01:22:46 main-app gunicorn[10779]:     self.sock = self._get_socket(host, port, self.timeout)
May 15 01:22:46 main-app gunicorn[10779]:   File "/usr/lib/python2.7/smtplib.py", line 291, in _get_socket
May 15 01:22:46 main-app gunicorn[10779]:     return socket.create_connection((host, port), timeout)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gevent/socket.py", line 96, in create_connection
May 15 01:22:46 main-app gunicorn[10779]:     sock.connect(sa)
May 15 01:22:46 main-app gunicorn[10779]:   File "/home/ubuntu/.virtualenvs/app/local/lib/python2.7/site-packages/gevent/_socket2.py", line 244, in connect
May 15 01:22:46 main-app gunicorn[10779]:     raise error(err, strerror(err))

一些行来自vmtstat 1:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 4698028 280600 16692992    0    0     0    14    0    0 13  1 87  0  0
 3  0      0 4697204 280600 16693016    0    0     0    24 8294 8111 12  0 88  0  0
 3  0      0 4698148 280600 16693044    0    0     0   204 8533 9047 15  0 85  0  0
 4  0      0 4697628 280600 16693092    0    0     0    76 8695 8775 17  1 83  0  0
 3  0      0 4697288 280600 16693140    0    0     8    16 9578 9720 19  1 80  0  0
 3  0      0 4695768 280600 16693200    0    0     0   152 8594 8073 16  1 83  0  0
 6  0      0 4686044 280600 16693272    0    0     0    32 7998 7706 12  1 87  0  0
 5  0      0 4681876 280600 16693308    0    0     0   224 9130 8810 14  1 85  0  0

这就是我所拥有的。我根本无法拼凑出可能发生的事情。现在几乎每天发生一次。有专家能解释一下吗?提前致谢!


注意:在必要的情况下,SELECT version();产量:

PostgreSQL 9.6.16 on x86_64-pc-linux-gnu (Ubuntu 9.6.16-1.pgdg16.04+1), compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit
postgresql postgresql-9.6
  • 2 2 个回答
  • 321 Views

2 个回答

  • Voted
  1. Best Answer
    jjanes
    2020-05-16T14:30:34+08:002020-05-16T14:30:34+08:00

    4.6 分钟的 COMMIT 几乎可以确定您的 IO 系统要么严重不足,要么完全被淹没。这将解释所有其他问题。如果连接不能提交,它们通常不会被关闭,所以它们会建立起来。最终你会得到“错误:不允许更多的连接”。如果客户端放弃等待提交响应并断开连接,您可能会在服务器上收到“致命:与客户端的连接丢失”。请注意,这里没有 PostgreSQL“崩溃”,它仍在运行并(缓慢地)服务连接,直到超过 max_connections,此时它仍在为现有连接提供服务。

    COMMIT 会执行 fsync 以外的其他事情,偶尔但很少有这些事情会导致问题。因此,为了更加明确,您可以对 pg_stat_activity 进行采样,并查看在这些情节中提交的 wait_event 和 wait_event_type 是什么。如果 wait_event 是 'WALWriteLock' 或 'WALSync' 那是非常确定的。

    如果您为服务级别>“沙鼠”的存储系统付费,我会说这完全落在了 Digital Ocean 的手中。

    • 3
  2. Laurenz Albe
    2020-05-15T21:44:19+08:002020-05-15T21:44:19+08:00

    较长的提交时间表明数据库服务器上的 I/O 严重过载。机器似乎陷入了爬行,因此它无法及时响应连接请求。

    检查数据库工作负载并查看导致过载的原因。

    • 2

相关问题

  • 我可以在使用数据库后激活 PITR 吗?

  • 运行时间偏移延迟复制的最佳实践

  • 存储过程可以防止 SQL 注入吗?

  • PostgreSQL 中 UniProt 的生物序列

  • PostgreSQL 9.0 Replication 和 Slony-I 有什么区别?

Sidebar

Stats

  • 问题 205573
  • 回答 270741
  • 最佳答案 135370
  • 用户 68524
  • 热门
  • 回答
  • Marko Smith

    连接到 PostgreSQL 服务器:致命:主机没有 pg_hba.conf 条目

    • 12 个回答
  • Marko Smith

    如何让sqlplus的输出出现在一行中?

    • 3 个回答
  • Marko Smith

    选择具有最大日期或最晚日期的日期

    • 3 个回答
  • Marko Smith

    如何列出 PostgreSQL 中的所有模式?

    • 4 个回答
  • Marko Smith

    列出指定表的所有列

    • 5 个回答
  • Marko Smith

    如何在不修改我自己的 tnsnames.ora 的情况下使用 sqlplus 连接到位于另一台主机上的 Oracle 数据库

    • 4 个回答
  • Marko Smith

    你如何mysqldump特定的表?

    • 4 个回答
  • Marko Smith

    使用 psql 列出数据库权限

    • 10 个回答
  • Marko Smith

    如何从 PostgreSQL 中的选择查询中将值插入表中?

    • 4 个回答
  • Marko Smith

    如何使用 psql 列出所有数据库和表?

    • 7 个回答
  • Martin Hope
    Jin 连接到 PostgreSQL 服务器:致命:主机没有 pg_hba.conf 条目 2014-12-02 02:54:58 +0800 CST
  • Martin Hope
    Stéphane 如何列出 PostgreSQL 中的所有模式? 2013-04-16 11:19:16 +0800 CST
  • Martin Hope
    Mike Walsh 为什么事务日志不断增长或空间不足? 2012-12-05 18:11:22 +0800 CST
  • Martin Hope
    Stephane Rolland 列出指定表的所有列 2012-08-14 04:44:44 +0800 CST
  • Martin Hope
    haxney MySQL 能否合理地对数十亿行执行查询? 2012-07-03 11:36:13 +0800 CST
  • Martin Hope
    qazwsx 如何监控大型 .sql 文件的导入进度? 2012-05-03 08:54:41 +0800 CST
  • Martin Hope
    markdorison 你如何mysqldump特定的表? 2011-12-17 12:39:37 +0800 CST
  • Martin Hope
    Jonas 如何使用 psql 对 SQL 查询进行计时? 2011-06-04 02:22:54 +0800 CST
  • Martin Hope
    Jonas 如何从 PostgreSQL 中的选择查询中将值插入表中? 2011-05-28 00:33:05 +0800 CST
  • Martin Hope
    Jonas 如何使用 psql 列出所有数据库和表? 2011-02-18 00:45:49 +0800 CST

热门标签

sql-server mysql postgresql sql-server-2014 sql-server-2016 oracle sql-server-2008 database-design query-performance sql-server-2017

Explore

  • 主页
  • 问题
    • 最新
    • 热门
  • 标签
  • 帮助

Footer

AskOverflow.Dev

关于我们

  • 关于我们
  • 联系我们

Legal Stuff

  • Privacy Policy

Language

  • Pt
  • Server
  • Unix

© 2023 AskOverflow.DEV All Rights Reserve