Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2012-07-25 : 03:04:30
|
I have a query that performs a SUM on a table variable to show the number of user accounts created each month:SELECT CAST(YEAR(t.dateCreated) as nvarchar) + '-' + CAST(RIGHT('0' + RTRIM(MONTH(t.dateCreated)), 2) as nvarchar) as colName, -- needs to be 'YYYY-MM' so that ASP.NET can sort as a string COUNT(*) AS totalsFROM @tbl_results tGROUP BY DATENAME(MONTH, t.dateCreated) + ' ' + CAST(YEAR(t.dateCreated) AS nvarchar), MONTH(t.dateCreated), YEAR(t.dateCreated) The results look like this:colName totals2011-10 5242011-11 5442011-12 2342012-01 1022012-03 678 Because no users were created in February 2012 (2012-02) that month gets missed of. How could I expand the query please to provide a result like this?colName totals2011-10 5242011-11 5442011-12 2342012-01 1022012-02 02012-03 678 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 03:28:36
|
[code];WITH cteMonths(FromMonth, ToMonth)AS ( SELECT MIN(DateCreated), MAX(DateCreated) FROM @Tbl_Results UNION ALL SELECT DATEADD(MONTH, 1, FromMonth), ToMonth FROM cteMonths WHERE DATEDIFF(MONTH, FromMonth, ToMonth) >= 0)SELECT CONVERT(CHAR(7), m.FromMonth, 120) AS ColName, COUNT(t.DateCreated) AS TotalsFROM cteMonths AS mLEFT JOIN @Tbl_Results AS tGROUP BY CONVERT(CHAR(7), m.FromMonth, 120)ORDER BY CONVERT(CHAR(7), m.FromMonth, 120)[/code] N 56°04'39.26"E 12°55'05.63" |
|
|
webfred
Master Smack Fu Yak Hacker
8781 Posts |
Posted - 2012-07-25 : 03:29:28
|
You need a table with all the wanted dates then left join your @tbl_results and each time a value is NULL because there is nothing to join you can display zero as totals. No, you're never too old to Yak'n'Roll if you're too young to die. |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2012-07-25 : 03:57:08
|
quote: You need a table with all the wanted dates then left join your @tbl_results and each time a value is NULL because there is nothing to join you can display zero as totals.
Okay I think I can manage that using the earliest and latest dates from the results. I'll have a go...Thanks.quote: ;WITH cteMonths(FromMonth, ToMonth)AS ( SELECT MIN(DateCreated), MAX(DateCreated) FROM @Tbl_Results UNION ALL SELECT DATEADD(MONTH, 1, FromMonth), ToMonth FROM cteMonths WHERE DATEDIFF(MONTH, FromMonth, ToMonth) >= 0)SELECT CONVERT(CHAR(7), m.FromMonth, 120) AS ColName, COUNT(t.DateCreated) AS TotalsFROM cteMonths AS mLEFT JOIN @Tbl_Results AS tGROUP BY CONVERT(CHAR(7), m.FromMonth, 120)ORDER BY CONVERT(CHAR(7), m.FromMonth, 120)
Hi SwePeso, unfortunately this comes back with Incorrect syntax near the keyword 'GROUP'. |
|
|
webfred
Master Smack Fu Yak Hacker
8781 Posts |
Posted - 2012-07-25 : 04:26:14
|
quote: Originally posted by SwePeso
;WITH cteMonths(FromMonth, ToMonth)AS ( SELECT MIN(DateCreated), MAX(DateCreated) FROM @Tbl_Results UNION ALL SELECT DATEADD(MONTH, 1, FromMonth), ToMonth FROM cteMonths WHERE DATEDIFF(MONTH, FromMonth, ToMonth) >= 0)SELECT CONVERT(CHAR(7), m.FromMonth, 120) AS ColName, COUNT(t.DateCreated) AS TotalsFROM cteMonths AS mLEFT JOIN @Tbl_Results AS t ON dateadd(d,datediff(d,0,t.DateCreated),0) = dateadd(d,datediff(d,0,m.FromMonth),0)GROUP BY CONVERT(CHAR(7), m.FromMonth, 120)ORDER BY CONVERT(CHAR(7), m.FromMonth, 120) N 56°04'39.26"E 12°55'05.63"
I think the red part will fix it - give it a try. No, you're never too old to Yak'n'Roll if you're too young to die. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 04:33:45
|
This will perform faster;WITH cteMonths(FromMonth, ToMonth, Display)AS ( SELECT MIN(DateCreated), MAX(DateCreated), CONVERT(CHAR(7), MIN(DateCreated), 120) FROM @Tbl_Results UNION ALL SELECT DATEADD(MONTH, 1, FromMonth), ToMonth, CONVERT(CHAR(7), DATEADD(MONTH, 1, FromMonth), 120) FROM cteMonths WHERE DATEDIFF(MONTH, FromMonth, ToMonth) >= 0)SELECT m.Display AS ColName, ISNULL(t.Items, 0) AS TotalsFROM cteMonths AS mLEFT JOIN ( SELECT CONVERT(CHAR(7), DateCreated, 120) AS Display, COUNT(*) AS Items FROM @Tbl_Results GROUP BY CONVERT(CHAR(7), DateCreated, 120) ) AS t ON t.Display = m.DisplayORDER BY m.Display N 56°04'39.26"E 12°55'05.63" |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2012-07-25 : 04:56:50
|
Wow. Nice work SwePeso!Works like a charm. Now the job of figuring out what you've done there... I don't suppose you could enlighten me with a few comments please?!? |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 05:20:30
|
First I create a recursive cte that will return all months between (and including) the earliest and latest month.The recursrive CTE has three columns; the "iterative" part (FromMonth), the stop part (ToMonth) and the preformatted part (Display).I use this recursive CTE to outer join the dataset in the table variable. To speed up things I preaggregate the data in the table variable so that I do not have to do that many joins. Now I only join once per month.Since I do a outer join, those months not present in the table variable will return a NULL values, which I convert to 0 with the ISNULL function. N 56°04'39.26"E 12°55'05.63" |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 05:25:36
|
This will perform even faster;WITH cteMonths(theMonth, stopMonth)AS ( SELECT DATEDIFF(MONTH, '19000101', MIN(DateCreated)) AS theMonth, DATEDIFF(MONTH, '19000101', MAX(DateCreated)) AS stopMonth FROM @Tbl_Results UNION ALL SELECT theMonth + 1 AS theMonth, stopMonth FROM cteMonths WHERE theMonth < stopMonth)SELECT CONVERT(CHAR(7), DATEADD(MONTH, m.theMonth, '19000101'), 120) AS ColName, ISNULL(t.Items, 0) AS TotalsFROM cteMonths AS mLEFT JOIN ( SELECT DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth, COUNT(*) AS Items FROM @Tbl_Results GROUP BY DATEDIFF(MONTH, '19000101', DateCreated) ) AS t ON t.theMonth = m.theMonthORDER BY m.theMonth N 56°04'39.26"E 12°55'05.63" |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2012-07-25 : 05:36:05
|
Wow again. Thank you - that is impressive.While you're rocking and rolling, I might as well add in an extra challenge...!Previously I also had an extra column of results that provided a running total alongside the monthly totals, like this:colName totals runningTotals2006-04 2 22006-05 0 22006-06 0 22006-07 0 22006-08 5 72006-09 0 72006-10 3 10 This was done using CROSS APPLY. Looking at your new logic, I can't see how this could be applied, because it seems that statement that applies the COUNT would need to be put into another CTE. Could you please show me how that would be done to achieve the above, if it is indeed possible or the correct approach? Thank you again. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 06:44:30
|
Try to apply set-based thinking. That is; touch each row as few times as possible.SET NOCOUNT ON;CREATE TABLE #Data ( colName CHAR(7) NOT NULL, theMonth INT NOT NULL, totals INT NOT NULL );CREATE UNIQUE INDEX UX_Data ON #Data (theMonth) WITH (IGNORE_DUP_KEY = ON);WITH cteSource(theMonth, totals)AS ( SELECT DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth, COUNT(*) AS totals FROM @Tbl_Results GROUP BY DATEDIFF(MONTH, '19000101', DateCreated))INSERT #Data ( colName, theMonth, totals )SELECT CONVERT(CHAR(7), DATEADD(MONTH, theMonth, '19000101'), 120) AS colName, theMonth, totalsFROM cteSource;WITH cteMonths(theMonth, stopMonth)AS ( SELECT MIN(theMonth) AS theMonth, MAX(theMonth) AS stopMonth FROM #Data UNION ALL SELECT theMonth + 1 AS theMonth, stopMonth FROM cteMonths WHERE theMonth < stopMonth)INSERT #Data ( colName, theMonth, totals )SELECT CONVERT(CHAR(7), DATEADD(MONTH, theMonth, '19000101'), 120) AS colName, theMonth, 0 AS totalsFROM cteMonths;SELECT d.colName, d.totals, f.runningTotalsFROM #Data AS dCROSS APPLY ( SELECT SUM(x.totals) FROM #Data AS x WHERE x.theMonth <= d.theMonth ) AS f(runningTotals)ORDER BY d.theMonth;DROP TABLE #Data; Please report back about the time this code takes compared to your original code. N 56°04'39.26"E 12°55'05.63" |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2012-07-25 : 07:33:28
|
Okay thanks very much. With the new logic, the execution time has risen to 9 seconds (previously 1 second). I used a table variable though instead of the temporary table, which means I couldn't use the index.Because your previous method executed so well, I was wondering if #Data (or @tbl_data in my modified version) could be created to hold the initial counts, and then that gets re-used with CROSS APPLY to produce the running total? What do you think? |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2012-07-25 : 07:38:21
|
Hmmm that statement about execution time might not be true. For some reason ALL my tests are taking 9 seconds.I'll just do a reboot and try again... |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2012-07-25 : 08:38:07
|
Not sure what that was about. Query times are back to normal now...So I've tried out two versions of the query, running each one five times. The first query is my version based on all of your input, but breaking down the query into two parts to allow the results to be used within the CROSS APPLY statement:DECLARE @tbl_data TABLE ( colName nvarchar(50), totals int ) ;WITH cteMonths(theMonth, stopMonth) AS ( -- this CTE gets the months list SELECT DATEDIFF(MONTH, '19000101', MIN(DateCreated)) AS theMonth, DATEDIFF(MONTH, '19000101', MAX(DateCreated)) AS stopMonth FROM @tbl_Results UNION ALL SELECT theMonth + 1 AS theMonth, stopMonth FROM cteMonths WHERE (theMonth < stopMonth) )INSERT INTO @tbl_data ( colName, totals )SELECT CONVERT(CHAR(7), DATEADD(MONTH, m.theMonth, '19000101'), 120) AS colName, ISNULL(t.Items, 0) AS totalsFROM cteMonths AS m LEFT JOIN ( SELECT DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth, COUNT(*) AS Items FROM @tbl_Results GROUP BY DATEDIFF(MONTH, '19000101', DateCreated)) AS t ON t.theMonth = m.theMonthORDER BY m.theMonth;WITH cteFinal(colName, totals) AS ( SELECT colName, totals FROM @tbl_data )SELECT *FROM cteFinal c1 CROSS APPLY ( SELECT SUM(totals) as ct FROM cteFinal c2 WHERE c2.colName <= c1.colName ) bORDER BY c1.colName Which results in this:And the second version is your latest post, but using a table variable instead of temp table (which I appreciate loses the table index), included for reference:DECLARE @tbl_data TABLE ( colName CHAR(7) NOT NULL, theMonth INT NOT NULL, totals INT NOT NULL );WITH cteSource(theMonth, totals)AS ( SELECT DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth, COUNT(*) AS totals FROM @Tbl_Results GROUP BY DATEDIFF(MONTH, '19000101', DateCreated))INSERT INTO @tbl_data ( colName, theMonth, totals )SELECT CONVERT(CHAR(7), DATEADD(MONTH, theMonth, '19000101'), 120) AS colName, theMonth, totalsFROM cteSource;WITH cteMonths(theMonth, stopMonth)AS ( SELECT MIN(theMonth) AS theMonth, MAX(theMonth) AS stopMonth FROM @tbl_data UNION ALL SELECT theMonth + 1 AS theMonth, stopMonth FROM cteMonths WHERE theMonth < stopMonth)INSERT @tbl_data ( colName, theMonth, totals )SELECT CONVERT(CHAR(7), DATEADD(MONTH, theMonth, '19000101'), 120) AS colName, theMonth, 0 AS totalsFROM cteMonths;SELECT d.colName, d.totals, f.runningTotalsFROM @tbl_data AS dCROSS APPLY ( SELECT SUM(x.totals) FROM @tbl_data AS x WHERE x.theMonth <= d.theMonth ) AS f(runningTotals)ORDER BY d.theMonth The results show this:So... I guess there's not much in it, unless I've missed something obvious? |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 08:48:21
|
Well... My code only has two inserts and one select, so there must be A LOT of other code going on.I can't tell what is going on with those. N 56°04'39.26"E 12°55'05.63" |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2012-07-25 : 09:22:41
|
Okay sorry.The overall stored procedure loads a list of users and displays their details. Based on parameter choices, the report displays a list of individuals, or groups them (in the case above, group by the date they were created (month). Before either of the two approaches above are performed, the stored procedure does the following:1. Check the access level of the user executing the query2. Loads the access levels of the users to look for (e.g. admins only, or all access levels)3. Loads all users that the administrator has permission to view4. Takes those users and joins other tables to find more details (e.g. department, site, customer name)5. Based on parameter choice, displays the resultsI'm not sure how much more help you are able to offer without seeing all the code. I've added in the Execution Plans for info if that helps.First attempt using my two-part approach:http://www.4shared.com/get/rCN4x6Tn/001.htmlSecond attempt with your final method:http://www.4shared.com/get/zXmDPdhI/002.htmlI really appreciate your help. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 09:24:36
|
Table variables cannot utilize parallellism other than SELECT FROM... N 56°04'39.26"E 12°55'05.63" |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 09:36:28
|
[code]-- Prevent unwanted resultsets back to clientSET NOCOUNT ON;-- Prepare staging tableCREATE TABLE #Data ( theMonth INT PRIMARY KEY CLUSTERED, totals INT NOT NULL );-- Populate staging table with all monthsWITH cteMonths(theMonth, stopMonth)AS ( -- This anchor query should have the same filters as your query which populate @Tbl_Results table SELECT DATEDIFF(MONTH, '19000101', MIN(DateCreated)) AS theMonth, DATEDIFF(MONTH, '19000101', MAX(DateCreated)) AS stopMonth FROM {Your schema name here}.{Your source table name here} UNION ALL SELECT theMonth + 1 AS theMonth, stopMonth FROM cteMonths WHERE theMonth < stopMonth)INSERT #Data ( theMonth, totals )SELECT theMonth, 0 AS totalsFROM cteMonthsORDER BY theMonth;-- Update staging table with total count for each monthUPDATE dSET d.totals = w.totalsFROM #Data AS dINNER JOIN ( -- This aggregated query should have the same filters as your query which populate @Tbl_Results table SELECT DATEDIFF(MONTH, '19000101', DateCreated) AS theMonth, COUNT(*) AS totals FROM {Your schema name here}.{Your source table name here} GROUP BY DATEDIFF(MONTH, '19000101', DateCreated) ) AS w ON w.theMonth = d.theMonth;-- Display the final result including a running totalSELECT CONVERT(CHAR(7), DATEADD(MONTH, d.theMonth, '19000101'), 120) AS colName, d.totals, f.runningTotalsFROM #Data AS dCROSS APPLY ( SELECT SUM(x.totals) FROM #Data AS x WHERE x.theMonth <= d.theMonth ) AS f(runningTotals)ORDER BY d.theMonth;-- Clean upDROP TABLE #Data;[/code] N 56°04'39.26"E 12°55'05.63" |
|
|
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2012-07-25 : 10:02:51
|
Not sure why, but on running this, the average Total Execution Time in Client Statistics is showing in seconds, rather than MS. Anyway the execution time is about the same as before.http://www.4shared.com/get/-vEhdGHO/003.htmlI guess that's about as efficient as its going to get?!? |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 10:09:00
|
I am running the same query above on a 100 000 000 row table (100 million rows).The complete code takes 6 seconds.-- Prevent unwanted resultsets back to clientSET NOCOUNT ON;/*SELECT COUNT(*)FROM Sales.factSalesDetail-- 100 760 501 row(s) affected*/-- Prepare staging tableCREATE TABLE #Data ( theMonth INT PRIMARY KEY CLUSTERED, totals INT NOT NULL );-- Populate staging table with all monthsWITH cteMonths(theMonth, stopMonth)AS ( SELECT DATEDIFF(MONTH, '19000101', MIN(ssdLastKnownDate)) AS theMonth, DATEDIFF(MONTH, '19000101', MAX(ssdLastKnownDate)) AS stopMonth FROM Sales.factSalesDetail UNION ALL SELECT theMonth + 1 AS theMonth, stopMonth FROM cteMonths WHERE theMonth < stopMonth)INSERT #Data ( theMonth, totals )SELECT theMonth, 0 AS totalsFROM cteMonthsORDER BY theMonthOPTION (MAXRECURSION 0);-- Update staging table with total count for each monthUPDATE dSET d.totals = w.totalsFROM #Data AS dINNER JOIN ( -- This aggregated query should have the same filters as your query which populate @Tbl_Results table SELECT DATEDIFF(MONTH, '19000101', ssdLastKnownDate) AS theMonth, COUNT(*) AS totals FROM Sales.factSalesDetail GROUP BY DATEDIFF(MONTH, '19000101', ssdLastKnownDate) ) AS w ON w.theMonth = d.theMonth;-- Display the final result including a running totalSELECT CONVERT(CHAR(7), DATEADD(MONTH, d.theMonth, '19000101'), 120) AS colName, d.totals, f.runningTotalsFROM #Data AS dCROSS APPLY ( SELECT SUM(x.totals) FROM #Data AS x WHERE x.theMonth <= d.theMonth ) AS f(runningTotals)ORDER BY d.theMonth;-- Clean upDROP TABLE #Data; N 56°04'39.26"E 12°55'05.63" |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2012-07-25 : 10:10:57
|
Here is the statistics for the above queryClient Execution Time 16:10:05 16:09:57 16:09:35 Query Profile Statistics Number of INSERT, DELETE and UPDATE statements 2 2 2 2.0000 Rows affected by INSERT, DELETE, or UPDATE statements 0 0 0 0.0000 Number of SELECT statements 1 1 1 1.0000 Rows returned by SELECT statements 0 0 0 0.0000 Number of transactions 2 2 2 2.0000Network Statistics Number of server roundtrips 1 1 1 1.0000 TDS packets sent from client 1 1 1 1.0000 TDS packets received from server 7 7 7 7.0000 Bytes sent from client 3266 3266 3266 3266.0000 Bytes received from server 25895 25895 25895 25895.0000Time Statistics Client processing time 109 109 110 109.3333 Total execution time 6535 6457 6442 6478.0000 Wait time on server replies 6426 6348 6332 6368.6670 N 56°04'39.26"E 12°55'05.63" |
|
|
Next Page
|
|
|
|
|