我在我的 Django 1.4.8 项目中通过 PgBouncer 1.4.2 使用 PostgreSQL 9.1.6。django_session
表一直在增长。VACUUM 没有帮助。VACUUM FULL 运行数小时。TRUNCATE
通常只会有帮助。
该表包含大约 10KB 的行。每秒更新 40 次。
# SELECT table_name, pg_size_pretty(table_size) AS table_size, pg_size_pretty(indexes_size) AS indexes_size, pg_size_pretty(total_size) AS total_size
FROM (
SELECT table_name, pg_table_size(table_name) AS table_size, pg_indexes_size(table_name) AS indexes_size, pg_total_relation_size(table_name) AS total_size
FROM (
SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
FROM information_schema.tables
) AS all_tables ORDER BY total_size DESC LIMIT 1
) AS pretty_sizes;
table_name | table_size | indexes_size | total_size
---------------------------+------------+--------------+------------
"public"."django_session" | 35 GB | 209 MB | 35 GB
(1 row)
# SELECT COUNT(*) FROM django_session;
count
-------
40196
cron 每晚都会执行一个清理命令。
./manage.py cleanup && echo "vacuum (analyze, verbose);" | ./manage.py dbshell
输出中有一些关于 django_session 的有趣内容。
INFO: vacuuming "public.django_session"
INFO: scanned index "django_session_pkey" to remove 5338 row versions
DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: scanned index "django_session_expire_date" to remove 5338 row versions
DETAIL: CPU 0.10s/0.06u sec elapsed 3.47 sec.
INFO: "django_session": removed 5338 row versions in 187 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: index "django_session_pkey" now contains 71568 row versions in 1647 pages
DETAIL: 0 index row versions were removed.
4 index pages have been deleted, 4 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "django_session_expire_date" now contains 71568 row versions in 25049 pages
DETAIL: 2785 index row versions were removed.
1804 index pages have been deleted, 1798 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "django_session": found 0 removable, 71568 nonremovable row versions in 3386 out of 3386 pages
DETAIL: 46699 dead row versions cannot be removed yet.
There were 811150 unused item pointers.
0 pages are entirely empty.
CPU 0.11s/0.10u sec elapsed 16.00 sec.
...
INFO: analyzing "public.django_session"
INFO: "django_session": scanned 3460 of 3460 pages, containing 25514 live rows and 288939 dead rows; 25514 rows in sample, 25514 estimated total rows
我想知道46699 个死行版本、811150 个未使用的项目指针和288939 个死行。许多消息来源称问题在于“交易中的闲置”流程。我认为这是我的情况,因为有很多这样的过程:
# SELECT COUNT(*) FROM pg_stat_activity WHERE current_query = '<IDLE> in transaction';
count
-------
30
但是没有真正古老的查询。
# SELECT age(now(), query_start) AS "age" FROM pg_stat_activity ORDER BY query_start LIMIT 1;
age
-----------------
00:00:00.241521
大量的空闲进程可能是由 TransactionMiddleware 引起的。
所以现在我没有任何想法。我手动执行了清理命令,它运行良好,所以我决定问题以某种方式消失并截断了表。但它又长出来了。
# select * from pgstattuple('django_session');
-[ RECORD 1 ]------+---------
table_len | 23003136
tuple_count | 32139
tuple_len | 11201544
tuple_percent | 48.7
dead_tuple_count | 1729
dead_tuple_len | 171632
dead_tuple_percent | 0.75
free_space | 8930044
free_percent | 38.82
什么可能导致这种奇怪的问题?
更新:
我已删除 TransactionMiddleware 并在 PgBouncer 中切换到事务模式。现在几乎没有“交易中的空闲”流程。但这无济于事。
$ echo $(for i in `seq 100`; do echo "SELECT COUNT(*) FROM pg_stat_activity WHERE current_query = '<IDLE> in transaction';" | sudo -u postgres psql mydb | grep ' 0'; sleep 0.1; done | wc -l)"% of time there is NO idle in transaction processes"
75% of time there is NO idle in transaction processes
第一的!升级到 9.1.10(只需安装新的二进制文件并重新启动数据库)。
我会做以下事情:
autovacuum
设置,因此我会确保autovacuum
正在运行;django_session
;CLUSTER
该表,使用主键索引。pg_stat_%
检查这个特定表的视图内容也很有趣。从 PK 索引中大量的死条目来看,我有一种感觉,Django 正在构建动态UPDATE
查询,它会更新元组的所有列。如果这是真的,那么它解释了你看到的情况。好的,这是我经过大量研究后的答案。
不要在 KVM 等虚拟机上运行大量加载的 PostgreSQL 服务器。即使有适当的驱动程序直接访问硬盘,它也缺乏 i/o 速度。