Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Count Grouping By Date Range

Author  Topic 

GregDDDD
Posting Yak Master

120 Posts

Posted - 2012-06-27 : 15:08:18
I have two tables. One tracks people as the change categories. You are given the date they changed and the person_id. So in the example table, person_id was in category_1 from 7/1/2011 to 7/12/2011 and then on 7/13/2011 they changed to category_2. Then on 7/26/2011 they changed back to category_1 and remained there for the rest of the month. In the second table I have the same person_id with a list of activities in the month. I need to count the activities in the month for the person, grouping by category. The category is not stored with the activities. I’m not concerned with the type of activity.

In the example, person_id 10002 would have 6 activities while they were in category_1 and 3 activities while they were in category_2. I’m unsure how to deal with the dates like this. If I had the start and end date on a row for the category I could do it. And no, this is not homework.


Thanks,

Greg


declare @tbl_dates TABLE (
[person_id] [int] NOT NULL,
[category_name] varchar(30) NOT NULL,
[start_date] [datetime] NULL
)

declare @tbl_activities TABLE (
[activity_id] [int] NOT NULL,
[person_id] [int] NOT NULL,
[activity_date] [datetime] NULL
)


insert into @tbl_dates
values
(1002, 'category_1', '7/1/2011')

insert into @tbl_dates
values
(1002, 'category_2', '7/13/2011')

insert into @tbl_dates
values
(1002, 'category_1', '7/26/2011')

insert into @tbl_activities
values
(1, 1002, '7/5/2011')

insert into @tbl_activities
values
(23, 1002, '7/6/2011')

insert into @tbl_activities
values
(23, 1002, '7/6/2011')

insert into @tbl_activities
values
(2, 1002, '7/6/2011')
insert into @tbl_activities
values
(2, 1002, '7/13/2011')
insert into @tbl_activities
values
(8, 1002, '7/13/2011')

insert into @tbl_activities
values
(2, 1002, '7/21/2011')

insert into @tbl_activities
values
(5, 1002, '7/28/2011')

insert into @tbl_activities
values
(8, 1002, '7/29/2011')

select * from @tbl_dates
select * from @tbl_activities

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2012-06-27 : 15:52:50
I think this'll work:
SELECT
activites.person_id,
dates.category_name,
COUNT(*) AS row_count
FROM
@tbl_activities AS activites
INNER JOIN
(
SELECT
A.person_id,
A.category_name,
a.start_date,
COALESCE(end_date, DATEADD(MONTH, DATEDIFF(MONTH, 0, A.start_date) + 1, 0) - 1) end_date
FROM
@tbl_dates AS A
OUTER APPLY
(
SELECT MIN(start_date) AS end_date
FROM @tbl_dates
WHERE A.start_date < start_date
) AS B
) AS dates
ON activites.activity_date >= dates.start_date
and activites.activity_date < dates.end_date
GROUP BY
activites.person_id,
dates.category_name
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2012-06-27 : 16:30:16
And a different approach. I think it should have fewer reads:

-------------------------
--CTEs
;with
d1 as (
select person_id
, category_name
, [start_date]
, row_number() over (partition by person_id order by [start_date]) rn
from @tbl_dates
)
,d2 as (
select coalesce(a.person_id, b.person_id) person_id
,coalesce(a.[category_name], b.[category_name]) category_name
,coalesce(a.[start_date], convert(datetime, 0)) [start_date]
,coalesce(b.[start_date], convert(datetime, 99999)) [end_date]
from d1 a
full outer join d1 b
on b.person_id = a.person_id
and b.rn = a.rn + 1
)
-------------------------
--select out results
select d.person_id
,d.category_name
,count(a.activity_id) activity_count
from d2 d
join @tbl_activities a
on a.person_id = d.person_id
and a.activity_date >= d.[start_date]
and a.activity_date < d.[end_date]
group by d.person_id
,d.category_name

OUTPUT:
person_id category_name activity_count
----------- ------------------------------ --------------
1002 category_1 6
1002 category_2 3


Be One with the Optimizer
TG
Go to Top of Page

GregDDDD
Posting Yak Master

120 Posts

Posted - 2012-06-27 : 16:31:44
Man, I wish I knew this stuff better. I get the feeling I write a lot more code than I need to. I think you are very close, but it is difficult for me to debug. I'm not familiar with 'OUTER APPLY'. When I added another person_id and more activities for that person_id, the numbers didn't work. below is a new script with the additional values.

Greg

insert into @tbl_activities
values
(2, 1002, '7/13/2011')
insert into @tbl_activities
values
(8, 1002, '7/13/2011')
insert into @tbl_activities
values
(2, 1002, '7/21/2011')
Go to Top of Page

GregDDDD
Posting Yak Master

120 Posts

Posted - 2012-06-27 : 16:39:46
Sorry, I must not have copied the entire script. Regardless TG's SQL seems to have worked. I added more values and it broke them out correctly. Very impressive. I will need to learn about 'coalesce'.

Thank you,

Greg
Go to Top of Page

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2012-06-27 : 16:45:47
What do expect to see as output? Those new dats all fall between 2011-07-13 and 2011-07-26 so they'd get associated with category_2. Unless you want to chage the date prediate from >= and < to > and <= ??
Go to Top of Page

GregDDDD
Posting Yak Master

120 Posts

Posted - 2012-06-27 : 16:57:28
Sorry, those new dates were an incomplete script. Below is what I meant to post the second time.

Greg

declare @tbl_dates TABLE (
[person_id] [int] NOT NULL,
[category_name] varchar(30) NOT NULL,
[start_date] [datetime] NULL
)

declare @tbl_activities TABLE (
[activity_id] [int] NOT NULL,
[person_id] [int] NOT NULL,
[activity_date] [datetime] NULL
)


insert into @tbl_dates
values
(1002, 'category_1', '7/1/2011')

insert into @tbl_dates
values
(1002, 'category_2', '7/13/2011')

insert into @tbl_dates
values
(1002, 'category_1', '7/26/2011')

insert into @tbl_dates
values
(1003, 'category_1', '7/1/2011')


insert into @tbl_activities
values
(1, 1002, '7/5/2011')

insert into @tbl_activities
values
(23, 1002, '7/6/2011')

insert into @tbl_activities
values
(23, 1002, '7/6/2011')

insert into @tbl_activities
values
(2, 1002, '7/6/2011')
insert into @tbl_activities
values
(2, 1002, '7/13/2011')
insert into @tbl_activities
values
(8, 1002, '7/13/2011')
insert into @tbl_activities
values
(2, 1002, '7/21/2011')

insert into @tbl_activities
values
(5, 1002, '7/28/2011')

insert into @tbl_activities
values
(8, 1002, '7/29/2011')
insert into @tbl_activities
values
(8, 1002, '7/29/2011')

insert into @tbl_activities
values
(2, 1003, '7/13/2011')
insert into @tbl_activities
values
(8, 1003, '7/13/2011')
insert into @tbl_activities
values
(2, 1003, '7/21/2011')
insert into @tbl_activities
values
(2, 1003, '7/31/2011')
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2012-06-27 : 17:18:08
Based on the lastest data sample here are the results for both solutions - including some IO stats:

TG:
person_id category_name activity_count
----------- ------------------------------ --------------
1002 category_1 7
1003 category_1 4
1002 category_2 3

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0,
Table '#77CAB889'. Scan count 2, logical reads 2, physical reads 0,
Table '#78BEDCC2'. Scan count 1, logical reads 1, physical reads 0,



Lamprey:
person_id category_name row_count
----------- ------------------------------ -----------
1002 category_1 11
1002 category_2 3
1003 category_2 3

Table '#77CAB889'. Scan count 43, logical reads 56, physical reads 0
Table '#78BEDCC2'. Scan count 1, logical reads 1, physical reads 0



Be One with the Optimizer
TG
Go to Top of Page
   

- Advertisement -