Question

Asked: 2024-01-20 13:15:25 +0800 CST

PostgreSQL C stored function is 60 times slower than standalone C code

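I am trying to build a stored function in C that takes a bigint array as its argument and returns another array that is a copy of the argument with every element incremented by 1. My problem is that when I run this code as a standalone program it is very fast, 0.1 seconds for 10M elements, but it takes 6 seconds when run as a PostgreSQL C stored function.

I am wondering whether anyone can spot anything obviously wrong in what I am doing?...

My C stored function: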

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/array.h"
#include "catalog/pg_type.h"

PG_MODULE_MAGIC;

#define ARRPTR(x)  ( (int64 *) ARR_DATA_PTR(x) )
#define ARRNELEMS(x)  ArrayGetNItems(ARR_NDIM(x), ARR_DIMS(x))

PG_FUNCTION_INFO_V1(test_inc);
Datum
test_inc(PG_FUNCTION_ARGS)
{
    ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
    int n = ARRNELEMS(a);
    int nbytes = ARR_OVERHEAD_NONULLS(1) + sizeof(int64) * n;

    ArrayType  *r = (ArrayType *) palloc0(nbytes);
    SET_VARSIZE(r, nbytes);
    ARR_NDIM(r) = 1;
    r->dataoffset = 0; // marker for no null bitmap
    ARR_ELEMTYPE(r) = INT8OID;
    ARR_DIMS(r)[0] = n;
    ARR_LBOUND(r)[0] = 1;

    int64 *ad = ARRPTR(a);
    int64 *rd = ARRPTR(r);

    ereport(WARNING,
        errcode(ERRCODE_WARNING),
        errmsg("Before loop"));
    for (int i = 0; i < n; i++) {
        rd[i] = ad[i] + 1;
    }
    ereport(WARNING,
        errcode(ERRCODE_WARNING),
        errmsg("After loop"));

    PG_RETURN_POINTER(r);
}

I compile and install it like this:

gcc -fPIC -O2 -I/usr/include/postgresql/15/server -I/usr/include/postgresql/internal -c test.c
gcc -shared -o test.so test.o
/usr/bin/install -c -m 755  test.so '/usr/lib/postgresql/15/lib/'

The SQL code I use to test it:

CREATE FUNCTION test_inc(_int8) RETURNS _int8 AS '/usr/lib/postgresql/15/lib/test.so'
LANGUAGE C IMMUTABLE PARALLEL SAFE;

create table t as select generate_series(0, 10000000) n;
create table t2 as select array_agg(n order by n) n from t;
create table t3 as select test_inc(n) n from t2;

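The last query runs in 6 seconds. If I remove the line rd[i] = ad[i] + 1; the query runs in 0.6 seconds. Also, judging by the warning messages, execution is stuck somewhere after the loop, not inside it.

My standalone C code looks like this and runs in 0.1 seconds: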

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char ** argv) {
    long n = 10000000;
    long *a = (long*)malloc(n * sizeof(long));
    long *r = (long*)malloc(n * sizeof(long));
    for (int i = 0; i < n; i++) {
        a[i] = i;
    }
    for (int i = 0; i < n; i++) {
        r[i] = a[i] + 1;
    }
    // To make sure compiler does not remove previous loop because result is unused.
    long res = 0;
    for (int i = 0; i < n; i++) {
        res += r[i];
    }

    printf("%ld", res);
}

jjanes · Answer 1 · 2024-01-21T00:51:02+08:00

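It is mostly about the storage. If you just throw away the output (explain analyze select test_inc(n) n from t2;) rather than creating a table out of it, it is much faster.

In your dummy code (the version with the increment line removed) all values are zero, because the result array comes from palloc0 and is never written. That makes the output highly compressible, and compressing it is faster than storing it. (30MB for the real code versus 960kB for the dummy code.)

Experimenting further, the main slowness is not that the real data compresses less well, but that it takes a long time to compress it poorly. Switching to lz4 makes it much faster, as does turning off compression.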
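
The two workarounds mentioned above could be tried roughly as follows. This is only a sketch under some assumptions: PostgreSQL 14 or later built with lz4 support, and the table names t3_lz4 and t3_raw are hypothetical names chosen for illustration, not something from the question.

```sql
-- Baseline: run the function but discard the result, so no storage cost is paid.
EXPLAIN ANALYZE SELECT test_inc(n) AS n FROM t2;

-- Option 1: use lz4 instead of the default pglz for newly stored values
-- (requires a server built with lz4 support; t3_lz4 is a hypothetical name).
SET default_toast_compression = 'lz4';
CREATE TABLE t3_lz4 AS SELECT test_inc(n) AS n FROM t2;

-- Option 2: turn compression off for the column.
-- STORAGE EXTERNAL allows out-of-line (TOAST) storage but no compression.
-- The table is created first because CREATE TABLE AS cannot set column storage.
CREATE TABLE t3_raw (n bigint[]);
ALTER TABLE t3_raw ALTER COLUMN n SET STORAGE EXTERNAL;
INSERT INTO t3_raw SELECT test_inc(n) FROM t2;

-- Inspect which compression method was used (NULL means uncompressed)
-- and how many bytes the stored value occupies.
SELECT pg_column_compression(n), pg_column_size(n) FROM t3_lz4;
SELECT pg_column_compression(n), pg_column_size(n) FROM t3_raw;
```

Running each statement with \timing enabled in psql should show whether the time goes into the function itself or into compressing and storing its result.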
