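I like using CTEs to write nice, clear queries. However, I'm fairly sure the queries I end up with are quite inefficient.
Is there a better way to do this while still keeping things clear?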
with first_date as (
    -- selecting just 1 date
    -- namely: 1 full year ago
    select (extract(year from current_date - interval '1 year')||'-01-01' )::date as date
)
, last_date as (
    select date from star.dim_date where current_cal_day='Current'
)
, total_active_customers_ps_day as(
    select
        dd.date
        , dd.is_first_day_in_month
        , dd.is_last_day_in_month
        , count(dc.id) as total_customers
    from first_date, last_date,
        star.dim_date dd
        -- join with dim_client, using first_subscription_start_date & last_subscription_end_date
        -- to get the ids of just the active clients
        join star.dim_client dc on dd.date
            between dc.first_subscription_start_date and coalesce(dc.last_subscription_end_date::date, '3000-01-01')
            and dc.created <= dd.date
            and dc.first_subscription_start_date >= dc.created::date
    where
        dd.date >= first_date.date
        and dd.date <= last_date.date
    group by
        dd.date
        , dd.is_first_day_in_month
        , dd.is_last_day_in_month
)
select * from total_active_customers_ps_day ;
I think I'm causing some kind of Cartesian join, because this query is more efficient:
with total_active_customers_ps_day as(
    select
        dd.date
        , dd.is_first_day_in_month
        , dd.is_last_day_in_month
        , count(dc.id) as total_customers
    from
        star.dim_date dd
        -- join with dim_client, using first_subscription_start_date & last_subscription_end_date
        -- to get the ids of just the active clients
        join star.dim_client dc on dd.date
            between dc.first_subscription_start_date and coalesce(dc.last_subscription_end_date::date, '3000-01-01')
            and dc.created <= dd.date
            and dc.first_subscription_start_date >= dc.created::date
    where
        dd.date >= (extract(year from current_date - interval '1 year')||'-01-01' )::date
        and dd.date <= (select date from star.dim_date where current_cal_day='Current')
    group by
        dd.date
        , dd.is_first_day_in_month
        , dd.is_last_day_in_month
)
select * from total_active_customers_ps_day ;
What would be a better way to do this?
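Instead of using a subquery in the WHERE clause to get the end date, you can compute the end date once in a common table expression (CTE) at the start of the query, similar to what you did for the start date. This reduces the complexity of the WHERE clause and may make the query more efficient.
There is indeed a Cartesian join between the star.dim_date table and the CTEs start_date and end_date (first_date and last_date in your query). This can cause performance problems, especially if the star.dim_date table contains a large number of rows. To avoid the Cartesian join and make the query more efficient, you can use a single CTE that computes both the start date and the end date, and then join this CTE with the star.dim_date table.
Also, I used the DATE_TRUNC function to compute the start date one year back from the current date, which can make the query more intuitive.
Make sure the columns used in the JOIN conditions and the WHERE clause are properly indexed. In your case, indexing columns such as dd.date, dc.first_subscription_start_date, and dc.last_subscription_end_date can significantly improve query performance.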
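As a rough sketch of the single-CTE approach described above (not necessarily the exact query originally posted with this answer), something like the following could work. The names date_range, dr, start_date and end_date are made up for illustration, and the sketch assumes PostgreSQL-style syntax and that exactly one row in star.dim_date has current_cal_day = 'Current'.

```sql
with date_range as (
    -- hypothetical CTE: one row holding both bounds of the reporting window
    select
        -- January 1st of the previous year, same value as the original
        -- extract(year ...) || '-01-01' expression
        date_trunc('year', current_date - interval '1 year')::date as start_date
        -- assumes a single 'Current' row; more than one row would raise an error here
        , (select date from star.dim_date where current_cal_day = 'Current') as end_date
)
, total_active_customers_ps_day as (
    select
        dd.date
        , dd.is_first_day_in_month
        , dd.is_last_day_in_month
        , count(dc.id) as total_customers
    from star.dim_date dd
    -- explicit join condition instead of comma-separated (cross-joined) CTEs
    join date_range dr
        on dd.date between dr.start_date and dr.end_date
    -- same active-client logic as in the question
    join star.dim_client dc
        on dd.date between dc.first_subscription_start_date
                       and coalesce(dc.last_subscription_end_date::date, '3000-01-01')
        and dc.created <= dd.date
        and dc.first_subscription_start_date >= dc.created::date
    group by
        dd.date
        , dd.is_first_day_in_month
        , dd.is_last_day_in_month
)
select * from total_active_customers_ps_day;
```

Comparing the EXPLAIN output of this version against the two queries in the question should show whether the planner actually treats them differently on your data.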