给定两个数字n
和m
,我想生成一系列表格
1, 2, ..., (n-1), n, n, (n-1), ... 2, 1
并重复m
几次。
例如,对于n = 3
and m = 4
,我想要以下 24 个数字的序列:
1, 2, 3, 3, 2, 1, 1, 2, 3, 3, 2, 1, 1, 2, 3, 3, 2, 1, 1, 2, 3, 3, 2, 1
---------------- ---------------- ---------------- ----------------
我知道如何通过以下两种方法之一在 PostgreSQL 中实现此结果:
使用以下查询,它使用该generate_series
函数,以及一些技巧来保证顺序是正确的:
WITH parameters (n, m) AS
(
VALUES (3, 5)
)
SELECT
xi
FROM
(
SELECT
i, i AS xi
FROM
parameters, generate_series(1, parameters.n) AS x(i)
UNION ALL
SELECT
i + parameters.n, parameters.n + 1 - i AS xi
FROM
parameters, generate_series(1, parameters.n) AS x(i)
) AS s0
CROSS JOIN
generate_series (1, (SELECT m FROM parameters)) AS x(j)
ORDER BY
j, i ;
...或使用带有伴随和嵌套循环的函数用于相同目的:
CREATE FUNCTION generate_up_down_series(
_elements /* n */ integer,
_repetitions /* m */ integer)
RETURNS SETOF integer AS
$BODY$
declare
j INTEGER ;
i INTEGER ;
begin
for j in 1 .. _repetitions loop
for i in 1 .. _elements loop
return next i ;
end loop ;
for i in reverse _elements .. 1 loop
return next i ;
end loop ;
end loop ;
end ;
$BODY$
LANGUAGE plpgsql IMMUTABLE STRICT ;
我怎么可能在标准 SQL 或 Transact-SQL / SQL Server 中做同样的事情?
Postgres
您可以使其与单一
generate_series()
的基本数学一起工作(请参阅数学函数)。包装成一个简单的 SQL 函数:
称呼:
生成所需的结果。n和m可以是n*2*m不溢出的任何整数。
int4
如何?
在子查询中:
使用简单的升序生成所需的总行数 ( n*2*m )。我命名它
n2m
。0到N-1(不是1到N)以简化以下模运算。取% n*2(
%
是取模运算符)得到一系列n升序数,m次。我命名它n2
。在外部查询中:
将 1 加到下半部分(n2 < n)。
对于具有n*2 - n2的下半部分的上半部分 ( n2 >= n ) 镜像。
我添加
ORDER BY
以保证请求的订单。使用当前版本的 Postgres,它也可以在没有ORDER BY
简单查询的情况下工作 - 但不一定适用于更复杂的查询!这是一个实现细节(它不会改变),但不是 SQL 标准规定的。不幸的
generate_series()
是,正如已评论的那样,Postgres 是特定的而不是标准的 SQL。但是我们可以重用相同的逻辑:标准 SQL
您可以使用递归 CTE 而不是 生成序列号
generate_series()
,或者更有效地重复使用,创建一个包含序列整数的表一次。任何人都可以阅读,没有人可以写!然后,上面
SELECT
变得更加简单:TL;DR
This is a long one, so I'm putting the best (i.e. fastest of my) methods here first. It makes use of the INTARRAY extension - for parameters of 340 and 570, it takes 21ms*. The second best (22 ms - same parameters) only uses standard PostgreSQL constructs. If anyone else comes up with (a) faster method(s), I will put them here and "retire" mine!
* 8 GB RAM, Intel i5 11th gen CPU, Quad core - 8 threads, NVMe 256 GB drive
</TL;DR>
Introduction:
This question intrigued me (+1) and I pondered
a) how to answer it and
b) how could the answer be generalised?
All of the code below can be found in the various fiddles - per method/contributor.
Interestingly, nobody who's answered so far (10 answers - and some very good quality SQL/programming to boot!) has made use of
ARRAY
s (tutorial), which I believe are very helpful in this case.In a general sense, my approach has been to generate the first series as an ARRAY (
{1, 2,.. n, n,..2, 1}
) and then generatem
of these for the complete task.I took
threefive approaches:the first is PostgreSQL specific, making use of
GENERATE_SERIES()
(tutorial) andARRAY
s. There's also a function call that can be replaced with aRECURSIVE CTE
("RCTE"
- see tutorial).the second (also PostgreSQL specific) combines an
RCTE
with a call toGENERATE_SERIES()
andARRAY
s. TheGENERATE_SERIES()
can be replaced with anRCTE
- see solution 3.the third solution is
"Pure SQL"
and should work on any RDBMS that supportsRCTE
s (SQL Server for example) - can also use anumbers
table (i.e. a sequence) Following discussions on the dba chatroom, I have removed the requirement forRCTE
s to be used to construct sequences.then there are two "speedy" solutions which rely on the fact that
UNNEST()
preserves order.Preparation step:
I used a table called
param
to store the values (3
&5
) as per the question - these can obviously be changed. Also, for the benchmarking steps (see below), I used this param table for all queries tested so that there would be an even playing field. As mentioned above, anumbers
sequence table is also permitted.The first method is as follows:
Method 1 - GENERATE_SERIES (fiddle):
Step 1:
Result:
Step 2:
Results (same):
Step 3:
Result:
Finally:
Result:
Method 2 - Recursive CTE (fiddle):
Here, I managed to "kill two birds with one stone" by constructing both the desired sequence and its numbering scheme in the same RCTE as follows:
Result:
We only want the last record, so we select this by using the
CARDINALITY()
function - where that is equal to (n * 2
) is the last record (step not shown individually).Final step - see fiddle for more detail
Result (same as others):
A simpler, (and IMHO) more elegant solution exists - using the
GENERATE_SUBSCRIPTS()
function (explanation) as follows:Result (same):
Method 3 - Pure SQL (fiddle):
Final step:
Since all of the code below has been seen above in one form or another, I'm just including the final step. No PostgreSQL specific functionality has been used and it also works under SQL Server 2019 (Linux fiddle) - also works back to 2016 - all versions.
Result (same):
4th solution (and
clubhouse leader!
) (fiddle):Same result as for all the others.
5th solution (Honourable mention) (fiddle):
Same result as for all the others.
6th Solution (another scorcher! - uses the INTARRAY extension fiddle):
Same result!
Benchmarking:
I benchmarked all of the PostgreSQL solutions.
I have done my utmost to be fair in these benchmarks - I used a param table
(3, 5)
(also(34, 57)
and(340, 570)
at(home)
) in my SQL. For those queries requiring anumber
table (i.e. a sequence), after discussion, I have included it for those queries requiring it. I'm not entirely sure about this, since consultants are frequently forbidden from creating separate tables, no matter how trivial, but this appears to have been the consensus!Please let me know if you are unhappy with any of the tests and I'll gladly rerun them!
I used
db<>fiddle
for the tests and the usual caveats apply - I don't know what else is running on that server at any given moment in time - I took an average of several runs for each solution (the vast bulk of the results were within ~ 10% of each other - discarded obvious outliers (longer, not shorter times).It was pointed out to me (knew anyway) that 3 & 5 aren't very large numbers - I did try with low 100's for each parameter, but the runs kept failing on db<>fiddle.uk, but I would just say that the all of the runs were remarkably consistent with each other, only varying by ~ +- 10%.
The second reading runs with a are for values of 34 & 57 - feel free to try yourselves.
With the
(home)
tests, I used params of(340, 570)
on an 8GB machine (i5 - 10th gen, NVMe 256GB) - nothing else running - variance v. low ~ 1/2%!Vérace's (Another scorcher using INTARRAY!)
6th solution (fiddle) (0.110ms)/0.630 ms/21.5 ms (home) -new leader
!Vérace's (ex-Clubhouse leader) 4th solution (fiddle) 0.120 ms/0.625 ms/22.5 ms (home)
Vérace's (Honourable mention) 5th solution (fiddle) (0.95ms/0.615 (goes down!)/26 ms (home)
Vérace GENERATE_SERIES SQL method (fiddle): 0.195 ms/3.1 ms/140ms (home)
Vérace RECURSIVE CTE SQL method (fiddle): 0.200 ms/2.9 ms/145m (home)
Vérace GENERATE_SUBSCRIPTS() SQL method (fiddle): 0.110 ms/2.75 ms/130ms (home)
Vérace "Pure SQL" method (fiddle): 0.134 ms/2.85ms/190ms (home)
OP's SQL method (fiddle): 12.50 ms/18.5ms/190ms (home)
OP's PL/pgSQL function method (fiddle): 0.60 ms/0.075ms/86ms (home)
ypercube's SQL method (fiddle): 0.175 ms, /4.3 ms /240 ms (home)
ypercube's alternative method (fiddle): 0.090 ms/0.95 ms/36ms (home)
Erwin Brandtstetter's SQL method (fiddle): 2.15 ms /3.65 ms/160ms (home) (160 drops to ~ 100 without the ORDER BY - home)
Erwin Brandtstetter's function method (fiddle): 0.169 ms/2.3 ms/180 ms (home)
Abelisto's SQL method (fiddle) 0.145/fails if params changed?
Evan Carroll's SQL method (fiddle) 0.125 ms/1.1ms/45ms (home)
McNet's PL/pgSQL method (fiddle) 0.075 ms/ 0.075 ms/125ms (home)
Again, I reiterate (at the risk of repeating myself (multiple times! :-) )), if you are unhappy with your benchmark, please let me know and I will include any amendments here - I would just stress that I am genuinely interested in fairness and actually learning from this process - unicorn points are all very well, but my priority is on increasing my (our) knowledge-base!
The source code of PostgreSQL is a bit above my pay grade, but I believe that operations with
GENERATE_SERIES
andARRAY
s preserve order - theWITH ORDINALITY
implies this (again, I think) - the correct answers come up even without paying attention to ordering (although this not a guarantee)! @ErwinBrandstetter says that:My understanding is that
ARRAY
s are fast in PostgreSQL because much of the backend code is implemented through them - but as I said, I'm not really aC
expert.Current standings (as of 2021-10-27 13:50 UTC) are:
I found these posts on ARRAYs from Erwin Brandstetter (1) & a_horse_with_no_name (2) very helpful! Other's that I found helpful were as follows (1, 2).
在 Postgres 中,使用这个
generate_series()
函数很容易:在标准 SQL 中——假设参数 n、m 的大小有合理的限制,即小于一百万——你可以使用一个
Numbers
表:用您的 DBMS 的首选方法填充它:
然后使用它,而不是
generate_series()
:如果您需要纯 SQL。从理论上讲,它应该适用于大多数 DBMS(在 PostgreSQL 和 SQLite 上测试):
解释
生成系列 1..n
假如说
n=3
它非常简单,几乎可以在任何有关递归 CTE 的文档中找到。但是我们需要每个值的两个实例,所以
生成系列 1,1,..,n,n
这里我们只是将初始值加倍,它有两行,但我们需要的第二行是相反的顺序,所以我们将稍微介绍一下顺序。
在我们介绍命令之前,请注意这也是一件事。我们可以在起始条件中有两行,每行三列,我们
n<3
仍然是单列条件。而且,我们仍然只是增加n
.同样,我们可以将它们混合一下,在这里观察我们的起始条件变化:这里我们有一个
(6,2)
,(1,1)
生成系列 1..n,n..1
这里的技巧是生成系列 (1..n) 两次,然后简单地更改第二组的顺序。
这
i
是顺序,z
是序列的编号(如果需要,也可以是序列的一半)。因此,对于序列 1,我们将顺序从 1 增加到 3,对于序列 2,我们将顺序从 6 减少到 4。最后将系列乘以
m
(请参阅答案中的第一个查询)
在 PostgreSQL 中,这很容易,
If you want a portable solution you need to realize that this is basically a mathematical problem.
Given @n as the highest number of the sequence and @x as the position of the number in that sequence (starting with zero), the following function would work in SQL Server:
You can check it with this CTE:
(Quick explanation: the function uses MODULO() to create a sequence of repeating numbers and ABS() to turn it into a zig-zag wave. The other operations transform that wave to match the desired result.)
使用迭代器的基本函数。
T-SQL
Postgres
这适用于 MS-SQL,我认为可以针对任何 SQL 风格进行修改。
一种使用递归 cte 在 SQL Server 中执行此操作的方法。
在系列中生成所需数量的成员(对于n =3 和 m=4,它将是 24,即 2 nm )
在
case
表达式中使用逻辑之后,您可以生成所需的系列。Sample Demo
正如@AndriyM ..所建议的那样,
case
表达式可以简化为Demo
Using only basic Math
+ - * /
and Modulo:This doesn't require a specific RDBMS.
With
numbers
being a number table:This generate a number table (1-1000) without using a recursive CTE. See Sample. 2nm must be smaller than the number of row in numbers.
Output with n=3 and m=4:
This version requires a smaller number table (v >= n and v >= m):
See Sample.