Question

Asked: 2019-01-19 12:16:34 +0800 CST

Getting value increments by hour of day

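I query the YouTube Data API for a list of the most popular videos on a channel and then fetch their statistics 4 times an hour (every 15 minutes, via cron). The data is stored in Postgres, but dumping it and loading it into another SQL database would not be a problem. Right now I have the following table: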

 video_id| views_count | likes_count | timestamp 
---------+-------------+-------------+---------------------
     foo | 100         | 1           | 2018-12-01 12:01:03
     foo | 101         | 1           | 2018-12-01 12:16:06
     foo | 105         | 1           | 2018-12-01 12:31:01
     bar | 199         | 0           | 2018-12-01 12:01:02
     bar | 200         | 0           | 2018-12-01 12:16:08
     bar | 301         | 5           | 2018-12-01 12:31:02
     ... | ...

UPD: here is the schema (for pasting into sqlfiddle):

CREATE TABLE video_statistics
(
  video_id TEXT not null,
  views_count INTEGER not null,
  likes_count INTEGER not null,
  timestamp TIMESTAMPTZ not null
);

How should I query this data to get the hourly increments of the views_count and likes_count columns, grouped by video? To clarify what I want to get:

hour_of_day|video_id|views_increment|likes_increment
-----------+--------+---------------+---------------
     ...   | ...
     11    | foo    | 4             | 0
     12    | foo    | 5             | 1
     ...   | ...
     11    | bar    | 73            | 0
     12    | bar    | 102           | 5
     ...   | ...

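In other words, this is a "best time to publish a video" based on historical data, taking weeks and months of data into account. Should I dump the data into a time-series database, or some other database better suited to this case, and query it there? Or should I resort to computing this in code?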

2 Answers

sticky bit · Answer 1 · 2019-01-19T14:42:17+08:00

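One possibility is to first number the records with row_number() to get the first and the last value per video, day and hour. Then join the two sets of first and last values to get the respective differences. Group that result by video and hour, and take the sum or the average of the differences across days.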

SELECT first.video_id,
       first.timestamp_hour,
       sum(last.views_count - first.views_count) views_count_diff_sum,
       sum(last.likes_count - first.likes_count) likes_count_diff_sum,
       avg(last.views_count - first.views_count) views_count_diff_avg,
       avg(last.likes_count - first.likes_count) likes_count_diff_avg
       FROM (SELECT video_id,
             timestamp_day,
             timestamp_hour,
             views_count,
             likes_count
             FROM (SELECT video_id,
                          timestamp::date timestamp_day,
                          date_part('hour', timestamp) timestamp_hour,
                          views_count,
                          likes_count,
                          row_number() OVER (PARTITION BY video_id,
                                                          timestamp::date,
                                                          date_part('hour', timestamp)
                                             ORDER BY timestamp ASC) rn
                          FROM elbat) first
             WHERE rn = 1) first
            INNER JOIN (SELECT video_id,
                               timestamp_day,
                               timestamp_hour,
                               views_count,
                               likes_count
                               FROM (SELECT video_id,
                                            timestamp::date timestamp_day,
                                            date_part('hour', timestamp) timestamp_hour,
                                            views_count,
                                            likes_count,
                                            row_number() OVER (PARTITION BY video_id,
                                                                            timestamp::date,
                                                                            date_part('hour', timestamp)
                                                               ORDER BY timestamp DESC) rn
                                            FROM elbat) last
                               WHERE rn = 1) last
                       ON last.video_id = first.video_id
                          AND last.timestamp_day = first.timestamp_day
                          AND last.timestamp_hour = first.timestamp_hour
       GROUP BY first.video_id,
                first.timestamp_hour;
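
For readers who would rather compute this in code, as the question also considers, the same first/last idea can be sketched in Python. This is only an illustration under stated assumptions: the function name `hourly_increments` and the in-memory `rows` list are hypothetical, and the rows are the sample data from the question.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (video_id, views_count, likes_count, timestamp)
rows = [
    ("foo", 100, 1, "2018-12-01 12:01:03"),
    ("foo", 101, 1, "2018-12-01 12:16:06"),
    ("foo", 105, 1, "2018-12-01 12:31:01"),
    ("bar", 199, 0, "2018-12-01 12:01:02"),
    ("bar", 200, 0, "2018-12-01 12:16:08"),
    ("bar", 301, 5, "2018-12-01 12:31:02"),
]


def hourly_increments(rows):
    """Return {(video_id, hour_of_day): (views_diff_sum, likes_diff_sum)}.

    Hypothetical helper: mirrors the first/last-per-hour idea, not the exact SQL.
    """
    first, last = {}, {}
    for video_id, views, likes, ts in rows:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        key = (video_id, t.date(), t.hour)
        # Keep the earliest and the latest sample per (video, day, hour).
        if key not in first or t < first[key][0]:
            first[key] = (t, views, likes)
        if key not in last or t > last[key][0]:
            last[key] = (t, views, likes)

    # Sum the per-day differences for each (video, hour of day).
    totals = defaultdict(lambda: [0, 0])
    for key, (_, v0, l0) in first.items():
        video_id, _day, hour = key
        _, v1, l1 = last[key]
        totals[(video_id, hour)][0] += v1 - v0
        totals[(video_id, hour)][1] += l1 - l0
    return {k: tuple(v) for k, v in totals.items()}


result = hourly_increments(rows)
```

Because only samples inside the same hour are compared, any growth between the last sample of one hour and the first sample of the next is not counted by this approach.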

Lennart - Slava Ukraini · Answer 2 · 2019-01-19T14:03:14+08:00

Schema:

create table T 
( video_id char(3) not null
, views_count int not null
, likes_count int not null
, ts timestamp not null
);

A guess, something like this:

select hr, video_id
     -- increment = current hour's value minus the previous hour's value
     , vc - lag(vc) over (partition by video_id
                          order by hr) as vc_incr
     , lc - lag(lc) over (partition by video_id
                          order by hr) as lc_incr
from (                          
    select extract(hour from ts) as hr
         , video_id
         , sum(views_count) as vc
         , sum(likes_count) as lc
    from t
    group by extract(hour from ts)
           , video_id
 ) as tt;

Note that you have to decide how to handle rows that have no lagged row, i.e. the first row in each partition.
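
One way to handle the missing predecessor is simply to skip the first sample of each video. The Python sketch below illustrates that choice on raw samples rather than on hourly sums. It is not the query above: the name `lagged_increments` is hypothetical, the rows are the sample data from the question, and attributing each difference to the hour of the later sample is an assumption.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (video_id, views_count, likes_count, timestamp)
rows = [
    ("foo", 100, 1, "2018-12-01 12:01:03"),
    ("foo", 101, 1, "2018-12-01 12:16:06"),
    ("foo", 105, 1, "2018-12-01 12:31:01"),
    ("bar", 199, 0, "2018-12-01 12:01:02"),
    ("bar", 200, 0, "2018-12-01 12:16:08"),
    ("bar", 301, 5, "2018-12-01 12:31:02"),
]


def lagged_increments(rows):
    """Return {(video_id, hour_of_day): (views_incr, likes_incr)}.

    Hypothetical helper: sums differences between consecutive samples.
    """
    by_video = defaultdict(list)
    for video_id, views, likes, ts in rows:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        by_video[video_id].append((t, views, likes))

    totals = defaultdict(lambda: [0, 0])
    for video_id, samples in by_video.items():
        samples.sort()  # order by timestamp, like ORDER BY in a window
        # zip pairs every sample with its predecessor; the first sample
        # has no predecessor and therefore contributes nothing.
        for (_, prev_v, prev_l), (t, v, l) in zip(samples, samples[1:]):
            # Assumption: the difference belongs to the later sample's hour.
            totals[(video_id, t.hour)][0] += v - prev_v
            totals[(video_id, t.hour)][1] += l - prev_l
    return {k: tuple(v) for k, v in totals.items()}


increments = lagged_increments(rows)
```

Unlike a first/last comparison within each hour, consecutive differences also capture growth that happens across an hour boundary.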
