I know how to find relation and database sizes with pg_total_relation_size() and pg_database_size(). But I want to find the size of the whole cluster. Is there any way other than measuring the disk usage of the data directory with a file manager, or using a query like this:
SELECT pg_size_pretty(sum(pg_database_size(datname))) AS "total databases size"
FROM pg_database;
I have a PostgreSQL server, version 15.4, and a query like the following:
SELECT i."Id"
FROM "InventoryItems" i
INNER JOIN "Products" p ON i."ProductBaseId" = p."Id"
INNER JOIN (SELECT "CustomerMarketPlaceId", PLAINTO_TSQUERY('english', "Keyword") "Keyword"
FROM "RestrictedKeywords"
WHERE "CustomerMarketPlaceId" = 19100
LIMIT 158
) z
ON z."CustomerMarketPlaceId" = i."CustomerMarketPlaceId"
WHERE i."CustomerMarketPlaceId" = 19100
AND i."Status" IN (1, 2, 3, 4, 6, 7)
AND TO_TSVECTOR('english', COALESCE(p."Title", '')) @@ z."Keyword";
Note the odd number 158 in the subquery. It produces the execution plan below, which is great: everything uses indexes (index-only scans, bitmap heap scans, and so on).
Hash Join  (cost=1704.06..102240.40 rows=5758 width=8)
  Hash Cond: (p."Id" = i."ProductBaseId")
  ->  Nested Loop  (cost=59.43..97792.89 rows=244027 width=8)
        ->  Subquery Scan on z  (cost=0.29..45.30 rows=79 width=36)
              Filter: (z."CustomerMarketPlaceId" = 19100)
              ->  Limit  (cost=0.29..43.32 rows=158 width=36)
                    ->  Index Only Scan using "IX_RestrictedKeywords_CustomerMarketPlaceId" on "RestrictedKeywords"  (cost=0.29..3199.81 rows=11747 width=36)
                          Index Cond: ("CustomerMarketPlaceId" = 19100)
        ->  Bitmap Heap Scan on "Products" p  (cost=59.14..1206.42 rows=3089 width=132)
              Recheck Cond: (to_tsvector('english'::regconfig, (COALESCE("Title", ''::character varying))::text) @@ z."Keyword")
              ->  Bitmap Index Scan on "IX_Products_Title"  (cost=0.00..58.37 rows=3089 width=0)
                    Index Cond: (to_tsvector('english'::regconfig, (COALESCE("Title", ''::character varying))::text) @@ z."Keyword")
  ->  Hash  (cost=1462.43..1462.43 rows=14576 width=16)
        ->  Index Scan using "IX_InventoryItems_CustomerMarketPlaceId_Status" on "InventoryItems" i  (cost=0.29..1462.43 rows=14576 width=16)
              Index Cond: (("CustomerMarketPlaceId" = 19100) AND ("Status" = ANY ('{1,2,3,4,6,7}'::integer[])))
But when I change that magic number to 159, I get the following query plan:
Hash Join  (cost=50296.29..104375.05 rows=6195 width=8)
  Hash Cond: (i."ProductBaseId" = p."Id")
  Join Filter: (to_tsvector('english'::regconfig, (COALESCE(p."Title", ''::character varying))::text) @@ z."Keyword")
  ->  Nested Loop  (cost=0.58..16998.07 rows=1238960 width=44)
        ->  Index Scan using "IX_InventoryItems_CustomerMarketPlaceId_Status" on "InventoryItems" i  (cost=0.29..1462.43 rows=14576 width=16)
              Index Cond: (("CustomerMarketPlaceId" = 19100) AND ("Status" = ANY ('{1,2,3,4,6,7}'::integer[])))
        ->  Materialize  (cost=0.29..48.86 rows=85 width=36)
              ->  Subquery Scan on z  (cost=0.29..48.43 rows=85 width=36)
                    Filter: (z."CustomerMarketPlaceId" = 19100)
                    ->  Limit  (cost=0.29..46.32 rows=169 width=36)
                          ->  Index Only Scan using "IX_RestrictedKeywords_CustomerMarketPlaceId" on "RestrictedKeywords"  (cost=0.29..3199.81 rows=11747 width=36)
                                Index Cond: ("CustomerMarketPlaceId" = 19100)
  ->  Hash  (cost=30535.76..30535.76 rows=616876 width=132)
        ->  Seq Scan on "Products" p  (cost=0.00..30535.76 rows=616876 width=132)
Suddenly it decides that a full scan of the Products table should, in theory, perform better, but in practice the query slows down by a factor of 100 (or possibly more). I tried raising default_statistics_target and re-analyzing the relevant tables, which did not change the outcome. I tried this in both production and test environments; the only difference is the limit at which the plan flips, since production has more RAM to work with. What makes PostgreSQL change its mind? (My guess is it thinks the query won't fit in memory???) And why does it pick such a terrible execution plan? Even without using the gin index, the query could still filter through the "InventoryItems"("CustomerMarketPlaceId","Status") index.
I don't know if it helps, but the index used in the query is defined as follows:
CREATE INDEX "IX_Products_Title"
ON "Products" USING gin (TO_TSVECTOR('english'::REGCONFIG, COALESCE("Title", ''::CHARACTER VARYING)::TEXT));
If I refactor the same query as below (even with the limit removed from the subquery), it picks the better execution plan and does indeed run faster:
SELECT i."Id"
FROM "InventoryItems" i
INNER JOIN "Products" p ON i."ProductBaseId" = p."Id"
INNER JOIN (SELECT "CustomerMarketPlaceId", PLAINTO_TSQUERY('english', "Keyword") "Keyword"
FROM "RestrictedKeywords"
-- WHERE "CustomerMarketPlaceId" = 19100
--- LIMIT 500
) z
ON TO_TSVECTOR('english', COALESCE(p."Title", '')) @@ z."Keyword" AND z."CustomerMarketPlaceId" = i."CustomerMarketPlaceId"
WHERE i."CustomerMarketPlaceId" = 19100
AND i."Status" IN (1, 2, 3, 4, 6, 7)
AND TO_TSVECTOR('english', COALESCE(p."Title", '')) @@ z."Keyword"; -- <-- removing this condition will cause full scans
which produces this execution plan:
Hash Join  (cost=1646.19..238317.48 rows=4281 width=8)
  Hash Cond: (p."Id" = i."ProductBaseId")
  ->  Nested Loop  (cost=1.56..234592.40 rows=181124 width=4)
        ->  Index Only Scan using "IX_RestrictedKeywords_CustomerMarketPlaceId" on "RestrictedKeywords"  (cost=0.29..263.06 rows=11747 width=11)
              Index Cond: ("CustomerMarketPlaceId" = 19100)
        ->  Bitmap Heap Scan on "Products" p  (cost=1.28..19.80 rows=15 width=132)
              Recheck Cond: ((to_tsvector('english'::regconfig, (COALESCE("Title", ''::character varying))::text) @@ plainto_tsquery('english'::regconfig, ("RestrictedKeywords"."Keyword")::text)) AND (to_tsvector('english'::regconfig, (COALESCE("Title", ''::character varying))::text) @@ plainto_tsquery('english'::regconfig, ("RestrictedKeywords"."Keyword")::text)))
              ->  Bitmap Index Scan on "IX_Products_Title"  (cost=0.00..1.27 rows=15 width=0)
                    Index Cond: ((to_tsvector('english'::regconfig, (COALESCE("Title", ''::character varying))::text) @@ plainto_tsquery('english'::regconfig, ("RestrictedKeywords"."Keyword")::text)) AND (to_tsvector('english'::regconfig, (COALESCE("Title", ''::character varying))::text) @@ plainto_tsquery('english'::regconfig, ("RestrictedKeywords"."Keyword")::text)))
  ->  Hash  (cost=1462.43..1462.43 rows=14576 width=12)
        ->  Index Scan using "IX_InventoryItems_CustomerMarketPlaceId_Status" on "InventoryItems" i  (cost=0.29..1462.43 rows=14576 width=12)
              Index Cond: (("CustomerMarketPlaceId" = 19100) AND ("Status" = ANY ('{1,2,3,4,6,7}'::integer[])))
Any explanation helping me understand these execution plan choices would be greatly appreciated.
Edit: as requested in the comments, I have added the explain (analyze, buffers) output for the first two limited queries:
Hash Join  (cost=1684.53..99640.82 rows=5780 width=8) (actual time=15.972..333.366 rows=1953 loops=1)
  Hash Cond: (p."Id" = i."ProductBaseId")
  Buffers: shared hit=41110 dirtied=1513
  ->  Nested Loop  (cost=34.49..95801.34 rows=243616 width=8) (actual time=0.273..316.567 rows=47020 loops=1)
        Buffers: shared hit=39674 dirtied=1513
        ->  Subquery Scan on z  (cost=0.29..45.30 rows=79 width=36) (actual time=0.050..0.566 rows=158 loops=1)
              Filter: (z."CustomerMarketPlaceId" = 19100)
              Buffers: shared hit=4
              ->  Limit  (cost=0.29..43.32 rows=158 width=36) (actual time=0.049..0.544 rows=158 loops=1)
                    Buffers: shared hit=4
                    ->  Index Only Scan using "IX_RestrictedKeywords_CustomerMarketPlaceId" on "RestrictedKeywords"  (cost=0.29..3199.81 rows=11747 width=36) (actual time=0.048..0.528 rows=158 loops=1)
                          Index Cond: ("CustomerMarketPlaceId" = 19100)
                          Heap Fetches: 0
                          Buffers: shared hit=4
        ->  Bitmap Heap Scan on "Products" p  (cost=34.20..1181.26 rows=3084 width=132) (actual time=0.127..1.960 rows=298 loops=158)
              Recheck Cond: (to_tsvector('english'::regconfig, (COALESCE("Title", ''::character varying))::text) @@ z."Keyword")
              Heap Blocks: exact=37818
              Buffers: shared hit=39670 dirtied=1513
              ->  Bitmap Index Scan on "IX_Products_Title"  (cost=0.00..33.43 rows=3084 width=0) (actual time=0.096..0.096 rows=298 loops=158)
                    Index Cond: (to_tsvector('english'::regconfig, (COALESCE("Title", ''::character varying))::text) @@ z."Keyword")
                    Buffers: shared hit=1803
  ->  Hash  (cost=1467.12..1467.12 rows=14634 width=16) (actual time=11.340..11.340 rows=19978 loops=1)
        Buckets: 32768 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 1193kB
        Buffers: shared hit=1436
        ->  Index Scan using "IX_InventoryItems_CustomerMarketPlaceId_Status" on "InventoryItems" i  (cost=0.29..1467.12 rows=14634 width=16) (actual time=0.012..8.048 rows=19978 loops=1)
              Index Cond: (("CustomerMarketPlaceId" = 19100) AND ("Status" = ANY ('{1,2,3,4,6,7}'::integer[])))
              Buffers: shared hit=1436
Planning:
  Buffers: shared hit=29
Planning Time: 0.753 ms
Execution Time: 333.540 ms
Hash Join  (cost=50366.43..107003.29 rows=6585 width=8) (actual time=414.038..86828.819 rows=2373 loops=1)
  Hash Cond: (i."ProductBaseId" = p."Id")
  Join Filter: (to_tsvector('english'::regconfig, (COALESCE(p."Title", ''::character varying))::text) @@ z."Keyword")
  Rows Removed by Join Filter: 3593667
  Buffers: shared hit=25883 dirtied=267, temp read=29576 written=29576
  ->  Nested Loop  (cost=0.58..17982.15 rows=1317060 width=44) (actual time=0.045..591.422 rows=3596040 loops=1)
        Buffers: shared hit=1440
        ->  Index Scan using "IX_InventoryItems_CustomerMarketPlaceId_Status" on "InventoryItems" i  (cost=0.29..1467.12 rows=14634 width=16) (actual time=0.013..7.770 rows=19978 loops=1)
              Index Cond: (("CustomerMarketPlaceId" = 19100) AND ("Status" = ANY ('{1,2,3,4,6,7}'::integer[])))
              Buffers: shared hit=1436
        ->  Materialize  (cost=0.29..52.01 rows=90 width=36) (actual time=0.000..0.008 rows=180 loops=19978)
              Buffers: shared hit=4
              ->  Subquery Scan on z  (cost=0.29..51.56 rows=90 width=36) (actual time=0.029..0.531 rows=180 loops=1)
                    Filter: (z."CustomerMarketPlaceId" = 19100)
                    Buffers: shared hit=4
                    ->  Limit  (cost=0.29..49.31 rows=180 width=36) (actual time=0.028..0.507 rows=180 loops=1)
                          Buffers: shared hit=4
                          ->  Index Only Scan using "IX_RestrictedKeywords_CustomerMarketPlaceId" on "RestrictedKeywords"  (cost=0.29..3199.81 rows=11747 width=36) (actual time=0.028..0.491 rows=180 loops=1)
                                Index Cond: ("CustomerMarketPlaceId" = 19100)
                                Heap Fetches: 0
                                Buffers: shared hit=4
  ->  Hash  (cost=30610.49..30610.49 rows=616749 width=132) (actual time=349.356..349.356 rows=616749 loops=1)
        Buckets: 524288  Batches: 4  Memory Usage: 26496kB
        Buffers: shared hit=24443 dirtied=267, temp written=7668
        ->  Seq Scan on "Products" p  (cost=0.00..30610.49 rows=616749 width=132) (actual time=0.004..178.437 rows=616749 loops=1)
              Buffers: shared hit=24443 dirtied=267
Planning:
  Buffers: shared hit=29
Planning Time: 0.423 ms
Execution Time: 86829.206 ms
I'm loading a .csv file into MariaDB 10.6 using LOAD DATA LOCAL INFILE. The inbound data has a DATE column, posted_on, and I need to store the month name in another field, monther, preferably as part of the load process.
If posted_on = '2023-10-13' then monther = 'October'.
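The intended mapping can be sketched in plain Python (illustrative only; in MariaDB the equivalent value would come from MONTHNAME(), which returns the full English month name under the default lc_time_names):

```python
from datetime import date

# '%B' formats the full month name, mirroring what MariaDB's
# MONTHNAME('2023-10-13') returns for an English locale.
posted_on = date.fromisoformat("2023-10-13")
monther = posted_on.strftime("%B")
print(monther)  # October
```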
I thought a trigger would be appropriate and tried variations along these lines (with and without NEW, BEFORE/AFTER):
CREATE TRIGGER `t_add_monther` after INSERT ON `qn_txs`
FOR EACH ROW
UPDATE qn_txs SET NEW.qn_txs.`monther` = MONTHNAME(`posted_on`);
But I can't get past the dreaded ERROR 1442 (HY000) at line 1: Can't update table 'qn_txs' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.
Any ideas? Is a different approach needed?
I have a database whose ER diagram looks like this:
I'm struggling to get all users that have more than five messages. I was able to write the query in Python:
q = "SELECT user_id from users"
for row in cursor.execute(q):
    inner_cur = connect.cursor()
    n = inner_cur.execute("SELECT count(*) from messages WHERE user_id='{}' ".format(row[0])).fetchall()[0][0]
    # print("user_id =", row[0], "num of messages =", n)
    if n > 5:
        c = connect.cursor()
        print(c.execute("SELECT username from users WHERE user_id='{}' ".format(row[0])).fetchall()[0][0])
connect.commit()
But I need to do this in DBeaver. I think I may need to use join, count, or group by, but I can't figure out how. Can someone explain how to do it?
I need advice/ideas for optimizing the following query:
SELECT COUNT(*)
FROM (SELECT 1
FROM f
WHERE parent_sha256 = $1
AND parent_sha256 <> sha256
LIMIT 1) AS row_count;
It needs to verify whether a particular parent_sha256 has any direct children. The table is large (50M rows), and the query is very slow (6 to 12 minutes) when 700k~900k rows match parent_sha256 = $1 but 0 rows match parent_sha256 = $1 AND parent_sha256 <> sha256.
In that case the DB has to go through all 700k~900k rows.
Worth mentioning: there are indexes on both parent_sha256 and sha256.
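To make the shape of the check concrete, here is the same query run against toy data in an in-memory SQLite database (the data and semantics are assumptions from the question: a row is a "direct child" of a hash when its parent_sha256 equals that hash but differs from its own sha256):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f (sha256 TEXT, parent_sha256 TEXT)")
# One self-parented row (not a "direct child") and one real child of "a".
conn.executemany("INSERT INTO f VALUES (?, ?)",
                 [("a", "a"), ("b", "a"), ("c", "x")])

def has_direct_children(parent):
    # Same shape as the query in the question: stop at the first match.
    return conn.execute("""
        SELECT COUNT(*)
        FROM (SELECT 1
              FROM f
              WHERE parent_sha256 = ?
                AND parent_sha256 <> sha256
              LIMIT 1) AS row_count
    """, (parent,)).fetchone()[0]

print(has_direct_children("a"))  # 1: "a" has a real child ("b")
print(has_direct_children("c"))  # 0: no rows have parent "c"
```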
Given this table definition with a partial index:
BEGIN;
CREATE TABLE user_note (
user_id VARCHAR NOT NULL,
item_id VARCHAR,
note VARCHAR NOT NULL,
archived_at TIMESTAMP
);
CREATE UNIQUE INDEX active_note
ON user_note(user_id, item_id)
WHERE archived_at IS NULL;
END;
I want to ensure that only one (user_id, item_id) value is not yet archived. This constraint should not apply to records with a non-null archived_at (i.e., we should allow many archived records for a given (user_id, item_id) pairing).
The constraint above works as expected when both user_id and item_id are specified:
BEGIN;
INSERT INTO user_note (user_id, item_id, note) VALUES ('user_1', 'item_1', 'A general note');
INSERT INTO user_note (user_id, item_id, note) VALUES ('user_1', 'item_1', 'A different note');
END;
This gives the following error:
BEGIN
INSERT 0 1
ERROR: duplicate key value violates unique constraint "active_note"
DETAIL: Key (user_id, item_id)=(user_1, item_1) already exists.
make: *** [populate] Error 1
But multiple records with a NULL item_id are allowed:
BEGIN;
INSERT INTO user_note (user_id, note) VALUES ('user_1', 'A general note');
INSERT INTO user_note (user_id, note) VALUES ('user_1', 'A different note');
END;
Output:
BEGIN
INSERT 0 1
INSERT 0 1
COMMIT
I have also tried a unique index with nulls not distinct, like this:
BEGIN;
CREATE TABLE user_note (
user_id VARCHAR NOT NULL,
item_id VARCHAR,
note VARCHAR NOT NULL,
archived_at TIMESTAMP,
UNIQUE NULLS NOT DISTINCT (user_id, item_id)
);
END;
But this, of course, does not take the archived_at value into account:
BEGIN;
INSERT INTO user_note (user_id, note, archived_at) VALUES ('user_1', 'A general note', CURRENT_TIMESTAMP);
INSERT INTO user_note (user_id, note, archived_at) VALUES ('user_1', 'A different note', CURRENT_TIMESTAMP);
END;
I get this unwanted error:
BEGIN
INSERT 0 1
ERROR: duplicate key value violates unique constraint "user_note_user_id_item_id_key"
DETAIL: Key (user_id, item_id)=(user_1, null) already exists.
make: *** [populate] Error 1
Is there a way to disallow multiple (user_id, item_id) entries while archived_at is NULL, but allow them when archived_at is not NULL?
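For illustration, both behaviours described above (the partial unique index rejecting duplicates, and NULL keys never colliding) can be reproduced in SQLite, which also supports partial indexes and likewise treats NULLs as distinct in unique indexes. This is a sketch of the symptom, not the PostgreSQL setup itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_note (
        user_id TEXT NOT NULL,
        item_id TEXT,
        note TEXT NOT NULL,
        archived_at TIMESTAMP
    );
    -- Partial unique index: only non-archived rows participate.
    CREATE UNIQUE INDEX active_note
        ON user_note(user_id, item_id)
        WHERE archived_at IS NULL;
""")

# A duplicate (user_id, item_id) with archived_at IS NULL is rejected...
conn.execute("INSERT INTO user_note (user_id, item_id, note) "
             "VALUES ('user_1', 'item_1', 'A general note')")
try:
    conn.execute("INSERT INTO user_note (user_id, item_id, note) "
                 "VALUES ('user_1', 'item_1', 'A different note')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# ...but rows with a NULL item_id never collide, because NULLs are distinct.
conn.execute("INSERT INTO user_note (user_id, note) VALUES ('user_1', 'A general note')")
conn.execute("INSERT INTO user_note (user_id, note) VALUES ('user_1', 'A different note')")

print(duplicate_rejected)  # True
```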
Is there a way to mount DBFS on the client side, or can it only be mounted on the database server side?
I have searched the Oracle Instant Client zip for the dbfs_client binary, but no such binary seems to exist. Can DBFS only be mounted on the database server? Why do so many references say it is like nfs? nfs can be mounted on any client machine.
Here's the problem we're running into: I have job notifications turned on for several nightly SQL Server jobs. Sometimes, when I'm out of the office, I receive those email notifications while my Outlook is set to send automatic "out of office" replies. The problem is that the "out of office" replies go to my coworker. Likewise, when my stored procedures send warning emails to users, if those users have "out of office" replies turned on, my coworker receives all of those "out of office" emails.
My coworker is annoyed.
My coworker set up our SQL Server Database Mail. He created an account named "Internal DBA" and used his own email address as that account's email address. The "Internal DBA" account is the only Database Mail account currently set up on the SQL Server. I'm not very familiar with SQL Server Database Mail, but I'm guessing that's why my coworker receives all of the "out of office" emails.
I think the solution is to replace my coworker's email address with an unmonitored email address for the Database Mail account. That way, nobody receives the "out of office" emails.
Our old SQL Server didn't have this problem. I don't know why he set it up this way, but I suspect my coworker thought he needed to use his own address as the email the Database Mail account uses. Is there a good reason to use a DBA's personal email address as the Database Mail account's address? Does anyone know what standard practice is? Or am I misunderstanding the cause of the problem?
Any enlightening thoughts would be appreciated!
We want to compress tables using SQL statements like the examples below. And, in case anything goes wrong after compression, we need to be able to roll back to the last good point. How can we do that?
Our questions:
For the compression SQL statements below, how do we undo the compression?
Please see "Additional information" below.
I tentatively believe the Management Studio GUI generates incorrect SQL for undoing compression on an index. Am I seeing the same symptom as others?
Details:
Example SQL statements for row-level compression:
-- To row-level compress the table
USE [database-under-test]
ALTER TABLE [schema].[sales_order] REBUILD PARTITION = ALL
WITH
(DATA_COMPRESSION = ROW
)
-- To row-level compress one index, and
-- the other indices of the table are similar
USE [database-under-test]
ALTER INDEX [SALES_ORDER_CREATED_AT] ON [schema].[sales_order] REBUILD PARTITION = ALL WITH (
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
DATA_COMPRESSION = ROW
)
-- Compression statements are similar for the other indices of the table
-- ...
Additional information:
I tentatively believe the Management Studio GUI generates incorrect SQL for undoing compression on an index.
For example, setting the compression to "None" through the "Manage Compression" wizard produces:
ALTER INDEX [SALES_ORDER_UPDATED_AT] ON [schema].[sales_order] REBUILD PARTITION = ALL WITH (
PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
)
Notice that DATA_COMPRESSION = NONE is missing from the statement, and after executing the command the index still shows row-level compression.
Am I seeing the same symptom as others?
I'm trying to debug the slow query below, but I'm having a hard time understanding why it is slow. I can see that both the plan and the subplan use index scans, including an "Index Only Scan" for the subplan, so both should be fast. Yet this particular query takes 7 seconds.
Can you tell from this EXPLAIN output where the problem might be?
select "id", "item_id", "item_name", "type", "updated_time" from "changes"
where (
((type = 1 OR type = 3) AND user_id = 'USER_ID')
or type = 2 AND item_id IN (SELECT item_id FROM user_items WHERE user_id = 'USER_ID')
) and "counter" > '35885954' order by "counter" asc limit 100;
Limit (cost=8409.70..8553.44 rows=100 width=101) (actual time=7514.730..7514.731 rows=0 loops=1)
-> Index Scan using changes_pkey on changes (cost=8409.70..2387708.44 rows=1655325 width=101) (actual time=7514.728..7514.729 rows=0 loops=1)
Index Cond: (counter > 35885954)
Filter: ((((type = 1) OR (type = 3)) AND ((user_id)::text = 'USER_ID'::text)) OR ((type = 2) AND (hashed SubPlan 1)))
Rows Removed by Filter: 11378536
SubPlan 1
-> Index Only Scan using user_items_user_id_item_id_unique on user_items (cost=0.56..8401.57 rows=3030 width=24) (actual time=0.085..3.011 rows=3589 loops=1)
Index Cond: (user_id = 'USER_ID'::text)
Heap Fetches: 2053
Planning Time: 0.245 ms
Execution Time: 7514.781 ms
(11 rows)