Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Fill in missing dates from SUM'd results

Author  Topic 

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-07-25 : 03:04:30
I have a query that performs a SUM on a table variable to show the number of user accounts created each month:


SELECT
CAST(YEAR(t.dateCreated) as nvarchar) + '-' + CAST(RIGHT('0' + RTRIM(MONTH(t.dateCreated)), 2) as nvarchar) as colName, -- needs to be 'YYYY-MM' so that ASP.NET can sort as a string
COUNT(*) AS totals
FROM
@tbl_results t
GROUP BY
DATENAME(MONTH, t.dateCreated) + ' ' + CAST(YEAR(t.dateCreated) AS nvarchar), MONTH(t.dateCreated), YEAR(t.dateCreated)


The results look like this:


colName totals
2011-10 524
2011-11 544
2011-12 234
2012-01 102
2012-03 678


Because no users were created in February 2012 (2012-02) that month gets missed of. How could I expand the query please to provide a result like this?


colName totals
2011-10 524
2011-11 544
2011-12 234
2012-01 102
2012-02 0
2012-03 678

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 03:28:36
[code];WITH cteMonths(FromMonth, ToMonth)
AS (
SELECT MIN(DateCreated),
MAX(DateCreated)
FROM @Tbl_Results

UNION ALL

SELECT DATEADD(MONTH, 1, FromMonth),
ToMonth
FROM cteMonths
WHERE DATEDIFF(MONTH, FromMonth, ToMonth) >= 0
)
SELECT CONVERT(CHAR(7), m.FromMonth, 120) AS ColName,
COUNT(t.DateCreated) AS Totals
FROM cteMonths AS m
LEFT JOIN @Tbl_Results AS t
GROUP BY CONVERT(CHAR(7), m.FromMonth, 120)
ORDER BY CONVERT(CHAR(7), m.FromMonth, 120)[/code]

N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2012-07-25 : 03:29:28
You need a table with all the wanted dates then left join your @tbl_results and each time a value is NULL because there is nothing to join you can display zero as totals.


No, you're never too old to Yak'n'Roll if you're too young to die.

Go to Top of Page

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-07-25 : 03:57:08
quote:
You need a table with all the wanted dates then left join your @tbl_results and each time a value is NULL because there is nothing to join you can display zero as totals.


Okay I think I can manage that using the earliest and latest dates from the results. I'll have a go...
Thanks.

quote:
;WITH cteMonths(FromMonth, ToMonth)
AS (
SELECT MIN(DateCreated),
MAX(DateCreated)
FROM @Tbl_Results

UNION ALL

SELECT DATEADD(MONTH, 1, FromMonth),
ToMonth
FROM cteMonths
WHERE DATEDIFF(MONTH, FromMonth, ToMonth) >= 0
)
SELECT CONVERT(CHAR(7), m.FromMonth, 120) AS ColName,
COUNT(t.DateCreated) AS Totals
FROM cteMonths AS m
LEFT JOIN @Tbl_Results AS t
GROUP BY CONVERT(CHAR(7), m.FromMonth, 120)
ORDER BY CONVERT(CHAR(7), m.FromMonth, 120)


Hi SwePeso, unfortunately this comes back with Incorrect syntax near the keyword 'GROUP'.
Go to Top of Page

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2012-07-25 : 04:26:14
quote:
Originally posted by SwePeso

;WITH cteMonths(FromMonth, ToMonth)
AS (
SELECT MIN(DateCreated),
MAX(DateCreated)
FROM @Tbl_Results

UNION ALL

SELECT DATEADD(MONTH, 1, FromMonth),
ToMonth
FROM cteMonths
WHERE DATEDIFF(MONTH, FromMonth, ToMonth) >= 0
)
SELECT CONVERT(CHAR(7), m.FromMonth, 120) AS ColName,
COUNT(t.DateCreated) AS Totals
FROM cteMonths AS m
LEFT JOIN @Tbl_Results AS t ON dateadd(d,datediff(d,0,t.DateCreated),0) = dateadd(d,datediff(d,0,m.FromMonth),0)
GROUP BY CONVERT(CHAR(7), m.FromMonth, 120)
ORDER BY CONVERT(CHAR(7), m.FromMonth, 120)


N 56°04'39.26"
E 12°55'05.63"



I think the red part will fix it - give it a try.


No, you're never too old to Yak'n'Roll if you're too young to die.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 04:33:45
This will perform faster
;WITH cteMonths(FromMonth, ToMonth, Display)
AS (
SELECT MIN(DateCreated),
MAX(DateCreated),
CONVERT(CHAR(7), MIN(DateCreated), 120)
FROM @Tbl_Results

UNION ALL

SELECT DATEADD(MONTH, 1, FromMonth),
ToMonth,
CONVERT(CHAR(7), DATEADD(MONTH, 1, FromMonth), 120)
FROM cteMonths
WHERE DATEDIFF(MONTH, FromMonth, ToMonth) >= 0
)
SELECT m.Display AS ColName,
ISNULL(t.Items, 0) AS Totals
FROM cteMonths AS m
LEFT JOIN (
SELECT CONVERT(CHAR(7), DateCreated, 120) AS Display,
COUNT(*) AS Items
FROM @Tbl_Results
GROUP BY CONVERT(CHAR(7), DateCreated, 120)
) AS t ON t.Display = m.Display
ORDER BY m.Display



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-07-25 : 04:56:50
Wow. Nice work SwePeso!
Works like a charm. Now the job of figuring out what you've done there... I don't suppose you could enlighten me with a few comments please?!?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 05:20:30
First I create a recursive cte that will return all months between (and including) the earliest and latest month.
The recursrive CTE has three columns; the "iterative" part (FromMonth), the stop part (ToMonth) and the preformatted part (Display).

I use this recursive CTE to outer join the dataset in the table variable. To speed up things I preaggregate the data in the table variable so that I do not have to do that many joins. Now I only join once per month.

Since I do a outer join, those months not present in the table variable will return a NULL values, which I convert to 0 with the ISNULL function.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 05:25:36
This will perform even faster
;WITH cteMonths(theMonth, stopMonth)
AS (
SELECT DATEDIFF(MONTH, '19000101', MIN(DateCreated)) AS theMonth,
DATEDIFF(MONTH, '19000101', MAX(DateCreated)) AS stopMonth
FROM @Tbl_Results

UNION ALL

SELECT theMonth + 1 AS theMonth,
stopMonth
FROM cteMonths
WHERE theMonth < stopMonth
)
SELECT CONVERT(CHAR(7), DATEADD(MONTH, m.theMonth, '19000101'), 120) AS ColName,
ISNULL(t.Items, 0) AS Totals
FROM cteMonths AS m
LEFT JOIN (
SELECT DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth,
COUNT(*) AS Items
FROM @Tbl_Results
GROUP BY DATEDIFF(MONTH, '19000101', DateCreated)
) AS t ON t.theMonth = m.theMonth
ORDER BY m.theMonth



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-07-25 : 05:36:05
Wow again. Thank you - that is impressive.
While you're rocking and rolling, I might as well add in an extra challenge...!

Previously I also had an extra column of results that provided a running total alongside the monthly totals, like this:

colName totals runningTotals
2006-04 2 2
2006-05 0 2
2006-06 0 2
2006-07 0 2
2006-08 5 7
2006-09 0 7
2006-10 3 10

This was done using CROSS APPLY. Looking at your new logic, I can't see how this could be applied, because it seems that statement that applies the COUNT would need to be put into another CTE. Could you please show me how that would be done to achieve the above, if it is indeed possible or the correct approach?

Thank you again.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 06:44:30
Try to apply set-based thinking. That is; touch each row as few times as possible.
SET NOCOUNT ON;

CREATE TABLE #Data
(
colName CHAR(7) NOT NULL,
theMonth INT NOT NULL,
totals INT NOT NULL
);

CREATE UNIQUE INDEX UX_Data ON #Data (theMonth) WITH (IGNORE_DUP_KEY = ON);

WITH cteSource(theMonth, totals)
AS (
SELECT DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth,
COUNT(*) AS totals
FROM @Tbl_Results
GROUP BY DATEDIFF(MONTH, '19000101', DateCreated)
)
INSERT #Data
(
colName,
theMonth,
totals
)
SELECT CONVERT(CHAR(7), DATEADD(MONTH, theMonth, '19000101'), 120) AS colName,
theMonth,
totals
FROM cteSource;

WITH cteMonths(theMonth, stopMonth)
AS (
SELECT MIN(theMonth) AS theMonth,
MAX(theMonth) AS stopMonth
FROM #Data

UNION ALL

SELECT theMonth + 1 AS theMonth,
stopMonth
FROM cteMonths
WHERE theMonth < stopMonth
)
INSERT #Data
(
colName,
theMonth,
totals
)
SELECT CONVERT(CHAR(7), DATEADD(MONTH, theMonth, '19000101'), 120) AS colName,
theMonth,
0 AS totals
FROM cteMonths;

SELECT d.colName,
d.totals,
f.runningTotals
FROM #Data AS d
CROSS APPLY (
SELECT SUM(x.totals)
FROM #Data AS x
WHERE x.theMonth <= d.theMonth
) AS f(runningTotals)
ORDER BY d.theMonth;

DROP TABLE #Data;
Please report back about the time this code takes compared to your original code.


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-07-25 : 07:33:28
Okay thanks very much. With the new logic, the execution time has risen to 9 seconds (previously 1 second). I used a table variable though instead of the temporary table, which means I couldn't use the index.
Because your previous method executed so well, I was wondering if #Data (or @tbl_data in my modified version) could be created to hold the initial counts, and then that gets re-used with CROSS APPLY to produce the running total? What do you think?
Go to Top of Page

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-07-25 : 07:38:21
Hmmm that statement about execution time might not be true. For some reason ALL my tests are taking 9 seconds.
I'll just do a reboot and try again...
Go to Top of Page

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-07-25 : 08:38:07
Not sure what that was about. Query times are back to normal now...

So I've tried out two versions of the query, running each one five times. The first query is my version based on all of your input, but breaking down the query into two parts to allow the results to be used within the CROSS APPLY statement:

DECLARE @tbl_data TABLE
(
colName nvarchar(50),
totals int
)
;WITH cteMonths(theMonth, stopMonth) AS
( -- this CTE gets the months list
SELECT
DATEDIFF(MONTH, '19000101', MIN(DateCreated)) AS theMonth,
DATEDIFF(MONTH, '19000101', MAX(DateCreated)) AS stopMonth
FROM
@tbl_Results
UNION ALL
SELECT
theMonth + 1 AS theMonth,
stopMonth
FROM
cteMonths
WHERE
(theMonth < stopMonth)
)
INSERT INTO @tbl_data
(
colName,
totals
)
SELECT
CONVERT(CHAR(7), DATEADD(MONTH, m.theMonth, '19000101'), 120) AS colName,
ISNULL(t.Items, 0) AS totals
FROM
cteMonths AS m LEFT JOIN
(
SELECT
DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth,
COUNT(*) AS Items
FROM
@tbl_Results
GROUP BY
DATEDIFF(MONTH, '19000101', DateCreated)) AS t ON t.theMonth = m.theMonth
ORDER BY
m.theMonth

;WITH cteFinal(colName, totals) AS
(
SELECT
colName,
totals
FROM
@tbl_data
)
SELECT
*
FROM
cteFinal c1 CROSS APPLY
(
SELECT
SUM(totals) as ct
FROM
cteFinal c2
WHERE
c2.colName <= c1.colName
) b
ORDER BY
c1.colName


Which results in this:


And the second version is your latest post, but using a table variable instead of temp table (which I appreciate loses the table index), included for reference:

DECLARE @tbl_data TABLE
(
colName CHAR(7) NOT NULL,
theMonth INT NOT NULL,
totals INT NOT NULL
);
WITH cteSource(theMonth, totals)
AS (
SELECT DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth,
COUNT(*) AS totals
FROM @Tbl_Results
GROUP BY DATEDIFF(MONTH, '19000101', DateCreated)
)
INSERT INTO @tbl_data
(
colName,
theMonth,
totals
)
SELECT CONVERT(CHAR(7), DATEADD(MONTH, theMonth, '19000101'), 120) AS colName,
theMonth,
totals
FROM cteSource;
WITH cteMonths(theMonth, stopMonth)
AS (
SELECT MIN(theMonth) AS theMonth,
MAX(theMonth) AS stopMonth
FROM @tbl_data

UNION ALL

SELECT theMonth + 1 AS theMonth,
stopMonth
FROM cteMonths
WHERE theMonth < stopMonth
)
INSERT @tbl_data
(
colName,
theMonth,
totals
)
SELECT CONVERT(CHAR(7), DATEADD(MONTH, theMonth, '19000101'), 120) AS colName,
theMonth,
0 AS totals
FROM cteMonths;

SELECT d.colName,
d.totals,
f.runningTotals
FROM @tbl_data AS d
CROSS APPLY (
SELECT SUM(x.totals)
FROM @tbl_data AS x
WHERE x.theMonth <= d.theMonth
) AS f(runningTotals)
ORDER BY d.theMonth


The results show this:


So... I guess there's not much in it, unless I've missed something obvious?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 08:48:21
Well... My code only has two inserts and one select, so there must be A LOT of other code going on.
I can't tell what is going on with those.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-07-25 : 09:22:41
Okay sorry.

The overall stored procedure loads a list of users and displays their details. Based on parameter choices, the report displays a list of individuals, or groups them (in the case above, group by the date they were created (month). Before either of the two approaches above are performed, the stored procedure does the following:

1. Check the access level of the user executing the query
2. Loads the access levels of the users to look for (e.g. admins only, or all access levels)
3. Loads all users that the administrator has permission to view
4. Takes those users and joins other tables to find more details (e.g. department, site, customer name)
5. Based on parameter choice, displays the results

I'm not sure how much more help you are able to offer without seeing all the code. I've added in the Execution Plans for info if that helps.

First attempt using my two-part approach:
http://www.4shared.com/get/rCN4x6Tn/001.html

Second attempt with your final method:
http://www.4shared.com/get/zXmDPdhI/002.html

I really appreciate your help.

Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 09:24:36
Table variables cannot utilize parallellism other than SELECT FROM...



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 09:36:28
[code]-- Prevent unwanted resultsets back to client
SET NOCOUNT ON;

-- Prepare staging table
CREATE TABLE #Data
(
theMonth INT PRIMARY KEY CLUSTERED,
totals INT NOT NULL
);

-- Populate staging table with all months
WITH cteMonths(theMonth, stopMonth)
AS (
-- This anchor query should have the same filters as your query which populate @Tbl_Results table
SELECT DATEDIFF(MONTH, '19000101', MIN(DateCreated)) AS theMonth,
DATEDIFF(MONTH, '19000101', MAX(DateCreated)) AS stopMonth
FROM {Your schema name here}.{Your source table name here}

UNION ALL

SELECT theMonth + 1 AS theMonth,
stopMonth
FROM cteMonths
WHERE theMonth < stopMonth
)
INSERT #Data
(
theMonth,
totals
)
SELECT theMonth,
0 AS totals
FROM cteMonths
ORDER BY theMonth;

-- Update staging table with total count for each month
UPDATE d
SET d.totals = w.totals
FROM #Data AS d
INNER JOIN (
-- This aggregated query should have the same filters as your query which populate @Tbl_Results table
SELECT DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth,
COUNT(*) AS totals
FROM {Your schema name here}.{Your source table name here}
GROUP BY DATEDIFF(MONTH, '19000101', DateCreated)
) AS w ON w.theMonth = d.theMonth;

-- Display the final result including a running total
SELECT CONVERT(CHAR(7), DATEADD(MONTH, d.theMonth, '19000101'), 120) AS colName,
d.totals,
f.runningTotals
FROM #Data AS d
CROSS APPLY (
SELECT SUM(x.totals)
FROM #Data AS x
WHERE x.theMonth <= d.theMonth
) AS f(runningTotals)
ORDER BY d.theMonth;

-- Clean up
DROP TABLE #Data;[/code]


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-07-25 : 10:02:51
Not sure why, but on running this, the average Total Execution Time in Client Statistics is showing in seconds, rather than MS. Anyway the execution time is about the same as before.
http://www.4shared.com/get/-vEhdGHO/003.html
I guess that's about as efficient as its going to get?!?
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 10:09:00
I am running the same query above on a 100 000 000 row table (100 million rows).
The complete code takes 6 seconds.
-- Prevent unwanted resultsets back to client
SET NOCOUNT ON;

/*
SELECT COUNT(*)
FROM Sales.factSalesDetail
-- 100 760 501 row(s) affected
*/

-- Prepare staging table
CREATE TABLE #Data
(
theMonth INT PRIMARY KEY CLUSTERED,
totals INT NOT NULL
);

-- Populate staging table with all months
WITH cteMonths(theMonth, stopMonth)
AS (
SELECT DATEDIFF(MONTH, '19000101', MIN(ssdLastKnownDate)) AS theMonth,
DATEDIFF(MONTH, '19000101', MAX(ssdLastKnownDate)) AS stopMonth
FROM Sales.factSalesDetail

UNION ALL

SELECT theMonth + 1 AS theMonth,
stopMonth
FROM cteMonths
WHERE theMonth < stopMonth
)
INSERT #Data
(
theMonth,
totals
)
SELECT theMonth,
0 AS totals
FROM cteMonths
ORDER BY theMonth
OPTION (MAXRECURSION 0);

-- Update staging table with total count for each month
UPDATE d
SET d.totals = w.totals
FROM #Data AS d
INNER JOIN (
-- This aggregated query should have the same filters as your query which populate @Tbl_Results table
SELECT DATEDIFF(MONTH, '19000101', ssdLastKnownDate) AS theMonth,
COUNT(*) AS totals
FROM Sales.factSalesDetail
GROUP BY DATEDIFF(MONTH, '19000101', ssdLastKnownDate)
) AS w ON w.theMonth = d.theMonth;

-- Display the final result including a running total
SELECT CONVERT(CHAR(7), DATEADD(MONTH, d.theMonth, '19000101'), 120) AS colName,
d.totals,
f.runningTotals
FROM #Data AS d
CROSS APPLY (
SELECT SUM(x.totals)
FROM #Data AS x
WHERE x.theMonth <= d.theMonth
) AS f(runningTotals)
ORDER BY d.theMonth;

-- Clean up
DROP TABLE #Data;




N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-07-25 : 10:10:57
Here is the statistics for the above query

Client Execution Time 16:10:05 16:09:57 16:09:35
Query Profile Statistics
Number of INSERT, DELETE and UPDATE statements 2 2 2 2.0000
Rows affected by INSERT, DELETE, or UPDATE statements 0 0 0 0.0000
Number of SELECT statements 1 1 1 1.0000
Rows returned by SELECT statements 0 0 0 0.0000
Number of transactions 2 2 2 2.0000
Network Statistics
Number of server roundtrips 1 1 1 1.0000
TDS packets sent from client 1 1 1 1.0000
TDS packets received from server 7 7 7 7.0000
Bytes sent from client 3266 3266 3266 3266.0000
Bytes received from server 25895 25895 25895 25895.0000
Time Statistics
Client processing time 109 109 110 109.3333
Total execution time 6535 6457 6442 6478.0000
Wait time on server replies 6426 6348 6332 6368.6670



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page
    Next Page

- Advertisement -