I am using a PostgreSQL database in which I need to store a numeric value associated with a specific key.
Over time, I keep adding to this value for a given key.
I want to make sure the table does not become bloated with multiple row versions or dead tuples, especially since this update operation will be frequent (say, 100 req/s).
What is the best practice for achieving this in PostgreSQL?
Should I use INSERT ON CONFLICT, a trigger, or some other approach?
How can I keep my table efficient and avoid excessive bloat from frequent updates?
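For context, the upsert pattern I have in mind looks like the sketch below (the table and column names are placeholders, not my real schema). My understanding is that a fillfactor below 100 leaves free space on each heap page so frequent updates can be HOT updates, as long as the updated column is not indexed:

```sql
-- Hypothetical counter table; only the key is indexed (as the PK),
-- so updating "value" alone should qualify for HOT updates.
CREATE TABLE counters (
    key   text PRIMARY KEY,
    value bigint NOT NULL DEFAULT 0
) WITH (fillfactor = 70);

-- Atomically insert the key or add to its existing value.
INSERT INTO counters (key, value)
VALUES ('page:home', 1)
ON CONFLICT (key)
DO UPDATE SET value = counters.value + EXCLUDED.value;
```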
I have two servers with the following specifications:
The primary Postgres 13.3 database (db1) is installed on the first server (Ubuntu 16.04.7) with the following configuration:
shared_buffers = 16GB
work_mem = 128MB
maintenance_work_mem = 8GB
effective_cache_size = 16GB
effective_io_concurrency = 400
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
wal_level = logical
synchronous_commit = on
max_wal_size = 4GB
min_wal_size = 32MB
wal_keep_size = 16384
wal_sender_timeout = 60s
checkpoint_completion_target = 0.7
synchronous_standby_names = 'FIRST 1 (db2_slave)'
max_standby_archive_delay = 1800s
max_standby_streaming_delay = 1800s
The standby is a Postgres 13.4 database (db2) installed on the second server (Ubuntu 20.04.3) with the following configuration:
shared_buffers = 24GB
work_mem = 128MB
maintenance_work_mem = 16GB
effective_cache_size = 24GB
effective_io_concurrency = 400
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
wal_level = logical
synchronous_commit = on
max_wal_size = 4GB
min_wal_size = 32MB
checkpoint_completion_target = 0.7
primary_conninfo = 'host=... port=5432 user=repluser passfile=''...'' application_name=db2_slave'
primary_slot_name = 'db2'
hot_standby = on
max_standby_archive_delay = 1800s
max_standby_streaming_delay = 1800s
If I run iotop -u postgresql on the standby server, I see two processes:
2229172 postgres: 13/main: walreceiver streaming DDFD/8E9FE9E0
2229138 postgres: 13/main: startup recovering 000000010000DDFD0000008E
While I am running a long query on the standby (SELECT COUNT(*) FROM big_table;), the startup process shows as waiting:
2229138 postgres: 13/main: startup recovering 000000010000DE0400000017 waiting
I ran this query on the master:
SELECT client_addr as client,
usename as user,
application_name as name,
state,
sync_state as mode,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)) as pending,
pg_size_pretty(pg_wal_lsn_diff(sent_lsn, write_lsn)) as write,
pg_size_pretty(pg_wal_lsn_diff(write_lsn, flush_lsn)) as flush,
pg_size_pretty(pg_wal_lsn_diff(flush_lsn, replay_lsn)) as replay,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) as total_lag
FROM pg_stat_replication;
The output is:
client | user | name | state | mode | pending | write | flush | replay | total_lag
-------------+----------+-----------+-----------+------+---------+---------+---------+--------+-----------
... | repluser | db2_slave | streaming | sync | 0 bytes | 0 bytes | 0 bytes | 21 MB | 21 MB
(1 row)
If I run this query repeatedly while the standby query (SELECT COUNT(*) FROM big_table) is executing, the replay and total_lag values keep increasing. So I would like to know the answer to the following question:
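For reference, my understanding (please correct me if this is not the right place to look) is that queries cancelled on the standby due to recovery conflicts are counted per database and can be inspected there with:

```sql
-- Per-database counters of standby queries cancelled by recovery conflicts
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
FROM pg_stat_database_conflicts;
```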
I have a table, ledger, that represents a ledger book.
date | user_id | usd_value | eur_value
----------------------------+---------+-----------+-----------
2020-01-13 19:00:10.877+03 | 1 | 10 | 0
2020-01-13 19:10:15.4+03 | 1 | 30 | 0
2020-01-13 19:44:40.187+03 | 1 | 0 | 40
2020-01-13 19:45:06.935+03 | 2 | 15 | 0
2020-01-13 19:46:22.38+03 | 1 | 40 | 0
2020-01-13 19:50:43.176+03 | 2 | 0 | 15
2020-01-13 20:08:58.47+03 | 1 | 55 | 0
This table is large, so I want to reduce the number of rows.
I am trying to determine the best way to accomplish this: how do I modify the original table (by deleting and updating rows) so that it ends up containing the following rows?
date | user_id | usd_value | eur_value
----------------------------+---------+-----------+-----------
2020-01-13 19:00:10.877+03 | 1 | 80 | 40
2020-01-13 19:45:06.935+03 | 2 | 15 | 15
2020-01-13 20:08:58.47+03 | 1 | 55 | 0
The desired result is produced by this query:
SELECT min(date) AS date, user_id, sum(usd_value) AS usd_value, sum(eur_value) AS eur_value
FROM ledger
GROUP BY date_trunc('hour', date), user_id
ORDER BY date;
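One way I am considering to apply that aggregation destructively (a sketch only, not verified against my real schema or concurrent workload) is to materialize the hourly aggregate and swap the table contents in a single transaction:

```sql
BEGIN;
-- Materialize the hourly aggregate first.
CREATE TEMP TABLE ledger_compacted AS
SELECT min(date) AS date, user_id,
       sum(usd_value) AS usd_value, sum(eur_value) AS eur_value
FROM ledger
GROUP BY date_trunc('hour', date), user_id;

-- Then replace the original rows with the compacted ones.
TRUNCATE ledger;  -- TRUNCATE takes an exclusive lock; use DELETE if concurrent readers matter
INSERT INTO ledger (date, user_id, usd_value, eur_value)
SELECT date, user_id, usd_value, eur_value FROM ledger_compacted;
COMMIT;
```

TRUNCATE is transactional in PostgreSQL, so the whole swap either fully applies or rolls back.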