
Question

datagod

Asked: 2012-03-09 09:32:03 +0800 CST

Do you know an easy way to generate one record for each hour of the past 12 hours?


I have a report that shows the count of events for the past 12 hours, grouped by hour. It sounds easy enough, but I'm struggling with how to include records that cover the gaps.

Here is an example table:

CREATE TABLE Event
(
  EventTime datetime,
  EventType int
)

The data looks like this:

  '2012-03-08 08:00:04', 1
  '2012-03-08 09:10:00', 2
  '2012-03-08 09:11:04', 2
  '2012-03-08 09:10:09', 1
  '2012-03-08 10:00:17', 4
  '2012-03-08 11:00:04', 1

I need to create a result set that has one record for each hour of the past 12 hours, regardless of whether there were any events during that hour.

Assuming the current time is '2012-03-08 11:00:00', the report would show (roughly):

Hour  EventCount
----  ----------
23    0
0     0
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     1
9     3
10    1

I came up with a solution that uses a table with one record for every hour of the day. I managed to get the results I was looking for using a UNION and some convoluted CASE logic in the WHERE clause, but I was hoping someone had a more elegant solution.

4 answers


Lamak · Answer 1 · 2012-03-09T10:38:51+08:00

Best Answer


For SQL Server 2005+, you can generate those 12 records very easily with a loop or a recursive CTE. Here is an example using a recursive CTE:

DECLARE @Date DATETIME
SELECT @Date = '20120308 11:00:00'

;WITH Dates AS
(
    SELECT DATEPART(HOUR,DATEADD(HOUR,-1,@Date)) [Hour], 
      DATEADD(HOUR,-1,@Date) [Date], 1 Num
    UNION ALL
    SELECT DATEPART(HOUR,DATEADD(HOUR,-1,[Date])), 
      DATEADD(HOUR,-1,[Date]), Num+1
    FROM Dates
    WHERE Num <= 11
)
SELECT [Hour], [Date]
FROM Dates

Then you just need to join it to your events table.
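Here is a minimal sketch of that join step. It assumes an Event table shaped like the one in the question. To keep it runnable on its own, it uses SQLite through Python's built-in sqlite3 module, so SQLite's datetime() and strftime() stand in for T-SQL's DATEADD and DATEPART. The function name hourly_counts is made up for this example. Each generated hour is treated as a one-hour range, and the LEFT JOIN keeps hours that have no events so they appear with a count of 0.

```python
import sqlite3

# Sample rows from the question.
EVENTS = [
    ("2012-03-08 08:00:04", 1),
    ("2012-03-08 09:10:00", 2),
    ("2012-03-08 09:11:04", 2),
    ("2012-03-08 09:10:09", 1),
    ("2012-03-08 10:00:17", 4),
    ("2012-03-08 11:00:04", 1),
]

HOURLY_COUNTS_SQL = """
WITH RECURSIVE Dates(Num, HourStart) AS (
    SELECT 1, datetime(:now, '-1 hours')           -- the hour before :now
    UNION ALL
    SELECT Num + 1, datetime(HourStart, '-1 hours')
    FROM Dates
    WHERE Num < 12                                  -- 12 hour buckets in total
)
SELECT CAST(strftime('%H', d.HourStart) AS INTEGER) AS Hour,
       COUNT(e.EventTime) AS EventCount             -- 0 when no event matched
FROM Dates d
LEFT JOIN Event e
       ON e.EventTime >= d.HourStart
      AND e.EventTime <  datetime(d.HourStart, '+1 hours')
GROUP BY d.HourStart
ORDER BY d.HourStart
"""


def hourly_counts(now):
    """Return (hour, event_count) for each of the 12 hours before `now`, oldest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Event (EventTime TEXT, EventType INTEGER)")
    conn.executemany("INSERT INTO Event VALUES (?, ?)", EVENTS)
    rows = conn.execute(HOURLY_COUNTS_SQL, {"now": now}).fetchall()
    conn.close()
    return rows
```

Calling hourly_counts('2012-03-08 11:00:00') should reproduce the question's report. Joining on a time range instead of only on the hour number also keeps events from the same hour on a different day out of the count.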


Henry Lee · Answer 2 · 2012-03-09T10:38:24+08:00

Tally tables can be used for things like this, and they can be very efficient. Create the tally table below. I created it with only 24 rows for your example, but you can create it with as many rows as you like to serve other purposes.

SELECT TOP 24 
        IDENTITY(INT,1,1) AS N
   INTO dbo.Tally
   FROM Master.dbo.SysColumns sc1,
        Master.dbo.SysColumns sc2

--===== Add a Primary Key to maximize performance
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N 
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

Assuming your table is named dbo.tblEvent, run the query below. I believe this is what you're looking for:

SELECT t.N - 1 AS [Hour], COUNT(e.EventTime) AS EventCount
FROM dbo.Tally t
LEFT JOIN dbo.tblEvent e ON t.N - 1 = DATEPART(hh, e.EventTime) -- Tally holds 1-24, DATEPART(hh) returns 0-23
GROUP BY t.N
ORDER BY t.N

I believe credit goes to the following links, since I think that is where I came across this:

http://www.sqlservercentral.com/articles/T-SQL/62867/

http://www.sqlservercentral.com/articles/T-SQL/74118/
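One detail worth checking with a 1-to-24 tally is that DATEPART(hh, ...) returns 0 through 23. If the join compares N directly with the hour, there is no tally row for midnight, so events in that hour are never counted. The sketch below shows the effect using SQLite through Python's sqlite3 module, with strftime('%H') playing the role of DATEPART. The name count_by_hour_of_day and its zero_based flag are hypothetical. Keep in mind that this kind of query counts events by hour of day across the whole table, not only the last 12 hours.

```python
import sqlite3


def count_by_hour_of_day(event_times, zero_based):
    """Count events per hour of day using a Tally table holding 1..24.

    zero_based=False matches N directly against the hour (0-23), so hour 0 is missed;
    zero_based=True matches N - 1, so the tally covers hours 0-23.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tally (N INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO Tally VALUES (?)", [(n,) for n in range(1, 25)])
    conn.execute("CREATE TABLE tblEvent (EventTime TEXT)")
    conn.executemany("INSERT INTO tblEvent VALUES (?)", [(t,) for t in event_times])
    rows = conn.execute(
        """
        SELECT t.N - :off AS Hour, COUNT(e.EventTime) AS EventCount
        FROM Tally t
        LEFT JOIN tblEvent e
               ON t.N - :off = CAST(strftime('%H', e.EventTime) AS INTEGER)
        GROUP BY t.N
        ORDER BY t.N
        """,
        {"off": 1 if zero_based else 0},
    ).fetchall()
    conn.close()
    return rows
```

With one event at 00:30 and one at 09:10, the direct comparison counts only one event. The N - 1 comparison counts both.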

Jeff Moden · Answer 3 · 2016-02-21T20:01:03+08:00

First of all, my apologies for the delay in answering since my last comments.

It came up in the comments that a Recursive CTE (rCTE from here on) runs fast enough because of the low number of rows. Although it may seem that way, nothing could be further from the truth.

BUILD A TALLY TABLE AND TALLY FUNCTION

Before we start testing, we need to build a physical Tally Table with the proper Clustered Index and an Itzik Ben-Gan style Tally Function. We'll also do all of this in TempDB so that we don't accidentally drop anyone's goodies.

Here's the code to build the Tally Table and my current production version of Itzik's wonderful code.

--===== Do this in a nice, safe place that everyone has
    USE tempdb
;
--===== Create/Recreate a Physical Tally Table
     IF OBJECT_ID('dbo.Tally','U') IS NOT NULL
        DROP TABLE dbo.Tally
;
     -- Note that the ISNULL makes a NOT NULL column
 SELECT TOP 1000001
        N = ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,0)
   INTO dbo.Tally
   FROM      sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N)
;
--===== Create/Recreate a Tally Function
     IF OBJECT_ID('dbo.fnTally','IF') IS NOT NULL
        DROP FUNCTION dbo.fnTally
;
GO
 CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
 Purpose:
 Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Trillion.

 As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.

 Usage:
--===== Syntax example (Returns BIGINT)
 SELECT t.N
   FROM dbo.fnTally(@ZeroOrOne,@MaxN) t
;

 Notes:
 1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.
    Refer to the following URLs for how it works and introduction for how it replaces certain loops. 
    http://www.sqlservercentral.com/articles/T-SQL/62867/
    http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers
 2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertible to the BIT data-type
    will cause the sequence to start at 1.
 3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.
 5. If @MaxN is negative or NULL, a "TOP" error will be returned.
 6. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Billion. If a larger
    number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with
    that many values, you should consider using a different tool. ;-)
 7. There will be a substantial reduction in performance if "N" is sorted in descending order.  If a descending 
    sort is required, use code similar to the following. Performance will decrease by about 27% but it's still
    very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.
    If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.

    DECLARE @MaxN BIGINT; 
     SELECT @MaxN = 1000;
     SELECT DescendingN = @MaxN-N+1 
       FROM dbo.fnTally(1,@MaxN);

 8. There is no performance penalty for sorting "N" in ascending order because the output is explicitly sorted by
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

 Revision History:
 Rev 00 - Unknown     - Jeff Moden 
        - Initial creation with error handling for @MaxN.
 Rev 01 - 09 Feb 2013 - Jeff Moden 
        - Modified to start at 0 or 1.
 Rev 02 - 16 May 2013 - Jeff Moden 
        - Removed error handling for @MaxN because of exceptional cases.
 Rev 03 - 22 Apr 2015 - Jeff Moden
        - Modify to handle 1 Trillion rows for experimental purposes.
**********************************************************************************************************************/
        (@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS 
 RETURN WITH
  E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
            SELECT 1)                                  --10E1 or 10 rows
, E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d)      --10E4 or 10 Thousand rows
,E12(N) AS (SELECT 1 FROM E4 a, E4 b, E4 c)            --10E12 or 1 Trillion rows                 
            SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.
             UNION ALL 
            SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E12 -- Values from 1 to @MaxN
;
GO

By the way... notice that it built a Tally Table with one million and one rows and added a Clustered Index to it in about a second. Try THAT with an rCTE and see how long it takes! ;-)
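The cascading-CTE idea behind fnTally can be tried outside SQL Server too. The following is a rough SQLite rendering run from Python's sqlite3 module. It is not Itzik's or Jeff's code. SQLite has no TOP, so a WHERE filter on the row number stands in for TOP(@MaxN), and the sketch stops at E4 (10,000 rows). fn_tally is a hypothetical helper name, and ROW_NUMBER() requires SQLite 3.25 or later. What it shows is that a few cross joins of a 10-row literal produce all the rows without reading any table.

```python
import sqlite3

# SQLite sketch of the cascading-CTE (cCTE) idea behind fnTally.
# E1 is 10 literal rows; cross joining it four times gives 10,000 rows
# without reading any table. ROW_NUMBER() numbers them 1..10,000.
FN_TALLY_SQL = """
WITH E1(N) AS (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)),
     E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d),
     Numbered(N) AS (SELECT ROW_NUMBER() OVER () FROM E4)
SELECT 0 WHERE :zero_or_one = 0           -- optional leading 0, like @ZeroOrOne
UNION ALL
SELECT N FROM Numbered WHERE N <= :max_n  -- stands in for TOP(@MaxN)
ORDER BY 1
"""


def fn_tally(zero_or_one, max_n):
    """Return [0 or 1 .. max_n]; this sketch is capped at 10,000 rows (E4)."""
    if not 0 <= max_n <= 10_000:
        raise ValueError("max_n must be between 0 and 10,000 in this sketch")
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(FN_TALLY_SQL, {"zero_or_one": zero_or_one, "max_n": max_n}).fetchall()
    conn.close()
    return [n for (n,) in rows]
```

As in note 3 of the function header, fn_tally(1, 0) returns no rows.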

BUILD SOME TEST DATA

We also need some test data. Yes, I agree that all of the functions we're going to test, including the rCTE, run in a millisecond or less for just 12 rows, but that's the trap a lot of people fall into. We'll talk more about that trap later. For now, let's simulate calling each function 40,000 times, which is about how many times certain functions in my shop are called in an 8-hour day. Imagine how many times such functions might be called in a large online retail business.

So, here's the code to create 40,000 rows of random dates, each with a row number just for tracking purposes. I didn't bother making them whole hours because it doesn't matter here.

--===== Do this in a nice, safe place that everyone has
    USE tempdb
;
--===== Create/Recreate a Test Date table
     IF OBJECT_ID('dbo.TestDate','U') IS NOT NULL
        DROP TABLE dbo.TestDate
;
DECLARE  @StartDate DATETIME
        ,@EndDate   DATETIME
        ,@Rows      INT
;
 SELECT  @StartDate = '2010' --Inclusive
        ,@EndDate   = '2020' --Exclusive
        ,@Rows      = 40000  --Enough to simulate an 8 hour day where I work
;
 SELECT  RowNum       = IDENTITY(INT,1,1)
        ,SomeDateTime = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,@StartDate,@EndDate)+@StartDate
   INTO dbo.TestDate
   FROM dbo.fnTally(1,@Rows)
;
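In T-SQL, a bare RAND() is evaluated only once per query, so every row would get the same value. Seeding it with CHECKSUM(NEWID()) forces a new value on each row. The formula itself is simply start date plus a random fraction times the number of days in the range. Here is a Python analogue of that formula, for illustration only. random_datetimes is a made-up name, and Python's generator already returns a fresh value on every call, so no per-row seeding is needed.

```python
import random
from datetime import datetime, timedelta


def random_datetimes(start, end, rows, seed=None):
    """Return (row_num, datetime) pairs spread uniformly over [start, end)."""
    rng = random.Random(seed)
    span_days = (end - start).days  # like DATEDIFF(dd, @StartDate, @EndDate)
    return [
        # like RAND(...) * DATEDIFF(...) + @StartDate, where the float counts days
        (row_num, start + timedelta(days=rng.random() * span_days))
        for row_num in range(1, rows + 1)
    ]
```

For example, random_datetimes(datetime(2010, 1, 1), datetime(2020, 1, 1), 40000) mirrors the @StartDate, @EndDate and @Rows settings above.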

BUILD SOME FUNCTIONS TO DO THE 12-HOUR ROWS THING

Next, I converted the rCTE code to a function and created 3 other functions. All of them were created as high-performance iTVFs (Inline Table-Valued Functions). You can always tell because iTVFs never have a BEGIN in them, unlike Scalar functions and mTVFs (Multi-statement Table-Valued Functions).

Here's the code to build those 4 functions... I named them after the method they use rather than what they do, just to make them easier to identify.

--=====  CREATE THE iTVFs
--===== Do this in a nice, safe place that everyone has
    USE tempdb
;
-----------------------------------------------------------------------------------------
     IF OBJECT_ID('dbo.OriginalrCTE','IF') IS NOT NULL
        DROP FUNCTION dbo.OriginalrCTE
;
GO
 CREATE FUNCTION dbo.OriginalrCTE
        (@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
WITH Dates AS
(
    SELECT DATEPART(HOUR,DATEADD(HOUR,-1,@Date)) [Hour], 
      DATEADD(HOUR,-1,@Date) [Date], 1 Num
    UNION ALL
    SELECT DATEPART(HOUR,DATEADD(HOUR,-1,[Date])), 
      DATEADD(HOUR,-1,[Date]), Num+1
    FROM Dates
    WHERE Num <= 11
)
SELECT [Hour], [Date]
FROM Dates
GO
-----------------------------------------------------------------------------------------
     IF OBJECT_ID('dbo.MicroTally','IF') IS NOT NULL
        DROP FUNCTION dbo.MicroTally
;
GO
 CREATE FUNCTION dbo.MicroTally
        (@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
 SELECT  [Hour] = DATEPART(HOUR,DATEADD(HOUR,t.N,@Date))
        ,[DATE] = DATEADD(HOUR,t.N,@Date)
   FROM (VALUES (-1),(-2),(-3),(-4),(-5),(-6),(-7),(-8),(-9),(-10),(-11),(-12))t(N)
;
GO
-----------------------------------------------------------------------------------------
     IF OBJECT_ID('dbo.PhysicalTally','IF') IS NOT NULL
        DROP FUNCTION dbo.PhysicalTally
;
GO
 CREATE FUNCTION dbo.PhysicalTally
        (@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
 SELECT  [Hour] = DATEPART(HOUR,DATEADD(HOUR,-t.N,@Date))
        ,[DATE] = DATEADD(HOUR,-t.N,@Date)
   FROM dbo.Tally t
  WHERE N BETWEEN 1 AND 12
;
GO
-----------------------------------------------------------------------------------------
     IF OBJECT_ID('dbo.TallyFunction','IF') IS NOT NULL
        DROP FUNCTION dbo.TallyFunction
;
GO
 CREATE FUNCTION dbo.TallyFunction
        (@Date DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
 SELECT  [Hour] = DATEPART(HOUR,DATEADD(HOUR,-t.N,@Date))
        ,[DATE] = DATEADD(HOUR,-t.N,@Date)
   FROM dbo.fnTally(1,12) t
;
GO
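Before timing the generators, it can help to confirm that they return the same 12 rows. This sketch recreates three of the approaches in SQLite via Python's sqlite3 module: the recursive CTE, a VALUES list as in MicroTally, and a physical numbers table as in PhysicalTally. The SQL is adapted rather than copied, because SQLite uses datetime() modifiers instead of DATEADD. The query constants and run_all are made-up names.

```python
import sqlite3

# Each query returns (hour, hour_start) for the 12 hours before :d, oldest first.
RCTE_SQL = """
WITH RECURSIVE Dates(Num, D) AS (
    SELECT 1, datetime(:d, '-1 hours')
    UNION ALL
    SELECT Num + 1, datetime(D, '-1 hours') FROM Dates WHERE Num <= 11
)
SELECT CAST(strftime('%H', D) AS INTEGER), D FROM Dates ORDER BY D
"""

# Micro-tally: a literal list of the 12 offsets, like dbo.MicroTally.
MICRO_TALLY_SQL = """
SELECT CAST(strftime('%H', datetime(:d, t.column1 || ' hours')) AS INTEGER),
       datetime(:d, t.column1 || ' hours')
FROM (VALUES (-1),(-2),(-3),(-4),(-5),(-6),(-7),(-8),(-9),(-10),(-11),(-12)) AS t
ORDER BY 2
"""

# Physical tally: rows 1..12 from a numbers table, like dbo.PhysicalTally.
PHYSICAL_TALLY_SQL = """
SELECT CAST(strftime('%H', datetime(:d, '-' || t.N || ' hours')) AS INTEGER),
       datetime(:d, '-' || t.N || ' hours')
FROM Tally t
WHERE t.N BETWEEN 1 AND 12
ORDER BY 2
"""


def run_all(date):
    """Run the three generators against the same date and return their rows by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tally (N INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO Tally VALUES (?)", [(n,) for n in range(0, 1001)])
    queries = [("rCTE", RCTE_SQL), ("MicroTally", MICRO_TALLY_SQL), ("PhysicalTally", PHYSICAL_TALLY_SQL)]
    results = {name: conn.execute(sql, {"d": date}).fetchall() for name, sql in queries}
    conn.close()
    return results
```

The three produce identical rows. The differences the test harness measures are therefore in how each query is executed, not in what it returns.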

BUILD THE TEST HARNESS TO TEST THE FUNCTIONS

Last but not least, we need a test harness. I do a baseline check and then test each function in an identical manner.

Here's the code for the test harness...

PRINT '--========== Baseline Select =================================';
DECLARE @Hour INT, @Date DATETIME
;
    SET STATISTICS TIME,IO ON;
 SELECT  @Hour = RowNum
        ,@Date = SomeDateTime
   FROM dbo.TestDate
  CROSS APPLY dbo.fnTally(1,12);
    SET STATISTICS TIME,IO OFF;
GO
PRINT '--========== Orginal Recursive CTE ===========================';
DECLARE @Hour INT, @Date DATETIME
;

    SET STATISTICS TIME,IO ON;
 SELECT  @Hour = fn.[Hour]
        ,@Date = fn.[Date]
   FROM dbo.TestDate td
  CROSS APPLY dbo.OriginalrCTE(td.SomeDateTime) fn;
    SET STATISTICS TIME,IO OFF;
GO
PRINT '--========== Dedicated Micro-Tally Table =====================';
DECLARE @Hour INT, @Date DATETIME
;

    SET STATISTICS TIME,IO ON;
 SELECT  @Hour = fn.[Hour]
        ,@Date = fn.[Date]
   FROM dbo.TestDate td
  CROSS APPLY dbo.MicroTally(td.SomeDateTime) fn;
    SET STATISTICS TIME,IO OFF;
GO
PRINT'--========== Physical Tally Table =============================';
DECLARE @Hour INT, @Date DATETIME
;
    SET STATISTICS TIME,IO ON;
 SELECT  @Hour = fn.[Hour]
        ,@Date = fn.[Date]
   FROM dbo.TestDate td
  CROSS APPLY dbo.PhysicalTally(td.SomeDateTime) fn;
    SET STATISTICS TIME,IO OFF;
GO
PRINT'--========== Tally Function ===================================';
DECLARE @Hour INT, @Date DATETIME
;
    SET STATISTICS TIME,IO ON;
 SELECT  @Hour = fn.[Hour]
        ,@Date = fn.[Date]
   FROM dbo.TestDate td
  CROSS APPLY dbo.TallyFunction(td.SomeDateTime) fn;
    SET STATISTICS TIME,IO OFF;
GO

One thing to note in the test harness above is that I divert all output to "throw-away" variables. This keeps the performance measurements as pure as possible, so that no output to disk or to the screen skews the results.

A WORD OF CAUTION ON SET STATISTICS

Also, a word of caution for would-be testers... You MUST NOT use SET STATISTICS when testing either Scalar or mTVF functions. It can only be safely used on iTVF functions like the ones in this test. SET STATISTICS has been proven to make SCALAR functions run hundreds of times slower than they actually do without it. Yeah, I'm trying to tilt at another windmill, but that would be a whole 'nuther article-length post and I don't have the time for that. I have an article on SQLServerCentral.com talking all about it, but there's no sense in posting the link here because someone will get all bent out of shape about it.

THE TEST RESULTS

So, here are the test results when I run the test harness on my little i5 laptop with 6GB of RAM.

--========== Baseline Select =================================
Table 'Worktable'. Scan count 1, logical reads 82309, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 203 ms,  elapsed time = 206 ms.
--========== Orginal Recursive CTE ===========================
Table 'Worktable'. Scan count 40001, logical reads 2960000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 4258 ms,  elapsed time = 4415 ms.
--========== Dedicated Micro-Tally Table =====================
Table 'Worktable'. Scan count 1, logical reads 81989, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 234 ms,  elapsed time = 235 ms.
--========== Physical Tally Table =============================
Table 'Worktable'. Scan count 1, logical reads 81989, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tally'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 250 ms,  elapsed time = 252 ms.
--========== Tally Function ===================================
Table 'Worktable'. Scan count 1, logical reads 81989, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TestDate'. Scan count 1, logical reads 105, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 250 ms,  elapsed time = 253 ms.

The "BASELINE SELECT", which only selects data (each row created 12 times to simulate the same volume of return), came in right about 1/5th of a second. Everything else came in at about a quarter of a second. Well, everything except that bloody rCTE function. It took 4 and 1/4 seconds or 16 times longer (1,600% slower).

And look at the logical reads (memory IO)... The rCTE consumed a whopping 2,960,000 (almost 3 MILLION reads) whereas the other functions only consumed about 82,100. That means the rCTE consumed about 36 times as much memory IO as any of the other functions.
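Here is a quick check of those ratios, recomputed from the statistics output above. Using the 250 ms Tally runs as the comparison point is my choice for this example. The rCTE needed roughly 17 times the CPU of those functions (17 times as long is 16 times longer) and about 36 times the logical reads.

```python
# Figures copied from the STATISTICS TIME/IO output above.
rcte_cpu_ms = 4258
rcte_reads = 2_960_000 + 105   # Worktable + TestDate logical reads
tally_cpu_ms = 250             # Physical Tally / Tally Function CPU time
tally_reads = 81_989 + 105     # Worktable + TestDate (PhysicalTally adds 3 reads on Tally)

cpu_ratio = rcte_cpu_ms / tally_cpu_ms
read_ratio = rcte_reads / tally_reads
print(f"CPU: {cpu_ratio:.1f}x, reads: {read_ratio:.1f}x")  # prints: CPU: 17.0x, reads: 36.1x
```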

CLOSING THOUGHTS

Let's summarize. The rCTE method for doing this "small" 12-row thing used 16 TIMES (1,600%) more CPU (and duration) and about 36 TIMES as much memory IO as any of the other functions.

Heh... I know what you're thinking. "Big Deal! It's just one function."

Yeah, agreed, but how many other functions do you have? How many other places outside of functions do you have? And do you have any of those that work with more than just 12 rows each run? And, is there any chance that someone in a lurch for a method might copy that rCTE code for something much bigger?

Ok, time to be blunt. It makes absolutely no sense for people to justify performance-challenged code just because of supposedly limited row counts or usage. Unless you purchase an MPP box for perhaps millions of dollars (not to mention the expense of rewriting code to get it to work on such a machine), you can't buy a machine that runs your code 16 times faster (SSDs won't do it either... all this stuff was in high-speed memory when we tested it). Performance is in the code. Good performance is in good code.

Can you imagine if all of your code ran "just" 16 times faster?

Never justify bad or performance-challenged code on low row counts or even low usage. If you do, you might have to borrow one of the windmills I was accused of tilting at to keep your CPUs and disks cool enough. ;-)

A WORD ON THE WORD "TALLY"

Yeah... I agree. Semantically speaking, the Tally Table contains numbers, not "tallies". In my original article on the subject (it wasn't the original article on the technique but it was my first on it), I called it "Tally" not because of what it contains, but because of what it does... it's used to "count" instead of looping, and to "Tally" something is to "Count" something. ;-) Call it what you will... Numbers Table, Tally Table, Sequence Table, whatever. I don't care. For me, "Tally" is more meaningful and, being a good lazy DBA, contains only 5 letters (2 are identical) instead of 7, and it's easier to say for most folks. It's also "singular", which follows my naming convention for tables. ;-) It's also what the article that contained a page from a book from the '60s called it. I'll always refer to it as a "Tally Table" and you'll still know what I or someone else means. I also avoid Hungarian Notation like the plague, but I called the function "fnTally" so that I could say "Well, if you used the eff-en Tally Function I showed you, you wouldn't have a performance problem" without it actually being an HR violation. ;-)

What I'm more concerned about is people learning to use it properly instead of resorting to things like performance-challenged rCTEs and other forms of Hidden RBAR.

Leigh Riffel · Answer 4 · 2012-03-09T10:06:08+08:00


You'll need to RIGHT JOIN your data to a query that returns one record for each hour you need.

There are several ways to get row numbers that you can then subtract, as hours, from the current time.

In Oracle, a hierarchical query on dual will generate rows:

SELECT to_char(sysdate-level/24,'HH24') FROM dual CONNECT BY Level <=24;
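In Oracle, subtracting a number from a DATE subtracts days, so level/24 means level hours. Here is a Python analogue of the same row generator, for illustration only. hours_back is a made-up name, and timedelta(days=level / 24) mirrors the fractional-day arithmetic.

```python
from datetime import datetime, timedelta


def hours_back(now, count):
    """Like to_char(sysdate - level/24, 'HH24') for level = 1..count."""
    return [(now - timedelta(days=level / 24)).strftime("%H") for level in range(1, count + 1)]
```

For the question's 12-hour window, hours_back(datetime(2012, 3, 8, 11), 12) yields the hours 10 back through 00 and then 23.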
