我server_lifetime
在 PgBouncer 配置中找到了选项,我试图弄清楚我们的情况是否真的需要它。
https://pgbouncer.github.io/config.html
pooler 将尝试关闭连接时间超过此时间的服务器连接。将其设置为 0 意味着连接将只使用一次,然后关闭。[秒]
默认值:3600.0
关闭和重新打开连接是什么感觉?
为什么医生说“会尝试”?PgBouncer 可能会在此操作中失败吗?
我server_lifetime
在 PgBouncer 配置中找到了选项,我试图弄清楚我们的情况是否真的需要它。
https://pgbouncer.github.io/config.html
pooler 将尝试关闭连接时间超过此时间的服务器连接。将其设置为 0 意味着连接将只使用一次,然后关闭。[秒]
默认值:3600.0
关闭和重新打开连接是什么感觉?
为什么医生说“会尝试”?PgBouncer 可能会在此操作中失败吗?
我在我的 Django 1.4.8 项目中通过 PgBouncer 1.4.2 使用 PostgreSQL 9.1.6。django_session
表一直在增长。VACUUM 没有帮助。VACUUM FULL 运行数小时。TRUNCATE
通常只会有帮助。
该表包含大约 10KB 的行。每秒更新 40 次。
# SELECT table_name, pg_size_pretty(table_size) AS table_size, pg_size_pretty(indexes_size) AS indexes_size, pg_size_pretty(total_size) AS total_size
FROM (
SELECT table_name, pg_table_size(table_name) AS table_size, pg_indexes_size(table_name) AS indexes_size, pg_total_relation_size(table_name) AS total_size
FROM (
SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
FROM information_schema.tables
) AS all_tables ORDER BY total_size DESC LIMIT 1
) AS pretty_sizes;
table_name | table_size | indexes_size | total_size
---------------------------+------------+--------------+------------
"public"."django_session" | 35 GB | 209 MB | 35 GB
(1 row)
# SELECT COUNT(*) FROM django_session;
count
-------
40196
cron 每晚都会执行一个清理命令。
./manage.py cleanup && echo "vacuum (analyze, verbose);" | ./manage.py dbshell
输出中有一些关于 django_session 的有趣内容。
INFO: vacuuming "public.django_session"
INFO: scanned index "django_session_pkey" to remove 5338 row versions
DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: scanned index "django_session_expire_date" to remove 5338 row versions
DETAIL: CPU 0.10s/0.06u sec elapsed 3.47 sec.
INFO: "django_session": removed 5338 row versions in 187 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: index "django_session_pkey" now contains 71568 row versions in 1647 pages
DETAIL: 0 index row versions were removed.
4 index pages have been deleted, 4 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "django_session_expire_date" now contains 71568 row versions in 25049 pages
DETAIL: 2785 index row versions were removed.
1804 index pages have been deleted, 1798 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "django_session": found 0 removable, 71568 nonremovable row versions in 3386 out of 3386 pages
DETAIL: 46699 dead row versions cannot be removed yet.
There were 811150 unused item pointers.
0 pages are entirely empty.
CPU 0.11s/0.10u sec elapsed 16.00 sec.
...
INFO: analyzing "public.django_session"
INFO: "django_session": scanned 3460 of 3460 pages, containing 25514 live rows and 288939 dead rows; 25514 rows in sample, 25514 estimated total rows
我想知道46699 个死行版本、811150 个未使用的项目指针和288939 个死行。许多消息来源称问题在于“交易中的闲置”流程。我认为这是我的情况,因为有很多这样的过程:
# SELECT COUNT(*) FROM pg_stat_activity WHERE current_query = '<IDLE> in transaction';
count
-------
30
但是没有真正古老的查询。
# SELECT age(now(), query_start) AS "age" FROM pg_stat_activity ORDER BY query_start LIMIT 1;
age
-----------------
00:00:00.241521
大量的空闲进程可能是由 TransactionMiddleware 引起的。
所以现在我没有任何想法。我手动执行了清理命令,它运行良好,所以我决定问题以某种方式消失并截断了表。但它又长出来了。
# select * from pgstattuple('django_session');
-[ RECORD 1 ]------+---------
table_len | 23003136
tuple_count | 32139
tuple_len | 11201544
tuple_percent | 48.7
dead_tuple_count | 1729
dead_tuple_len | 171632
dead_tuple_percent | 0.75
free_space | 8930044
free_percent | 38.82
什么可能导致这种奇怪的问题?
更新:
我已删除 TransactionMiddleware 并在 PgBouncer 中切换到事务模式。现在几乎没有“交易中的空闲”流程。但这无济于事。
$ echo $(for i in `seq 100`; do echo "SELECT COUNT(*) FROM pg_stat_activity WHERE current_query = '<IDLE> in transaction';" | sudo -u postgres psql mydb | grep ' 0'; sleep 0.1; done | wc -l)"% of time there is NO idle in transaction processes"
75% of time there is NO idle in transaction processes