
 Best way to find anomalous data?


aiken
Aged Yak Warrior

525 Posts

Posted - 2002-07-22 : 02:10:43
I have a couple of different needs to spot out-of-the-ordinary data in tables. One example is a basic aggregated web log in table form; it looks something like this:

create table log_web
(i int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
added smalldatetime NOT NULL DEFAULT (getdate()),
pagename varchar(20) NOT NULL,
times int NOT NULL)


...this table gets updated once a day with aggregate totals from a daily page log, the idea being to track page usage over time.

Now, I know this shouldn't be too hard, and it seems like basic statistics. But for some reason it escapes me.

How can I, say, find pages whose total views ("times" in my schema) have increased or decreased by 20% or more in this calendar month as compared to last calendar month? Or, more simply, today over yesterday?

It seems overkill to bring OLAP / analysis services into this. There's got to be a way to do this with a simple query, right?

Thanks
-b




MuffinMan
Posting Yak Master

107 Posts

Posted - 2002-07-22 : 13:03:15
This will do a daily comparison. You can easily modify it to do weekly, monthly, quarterly, etc.


set nocount on

create table #log_web
(i int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
added smalldatetime NOT NULL DEFAULT (getdate()),
pagename varchar(20) NOT NULL,
times int NOT NULL)


insert #log_web (added , pagename, times) values ('2002-07-20' , 'www.sqlteam.com', 125)
insert #log_web (added , pagename, times) values ('2002-07-20' , 'www.yahoo.com', 40)
insert #log_web (added , pagename, times) values ('2002-07-20' , 'www.cnn.com', 55)


insert #log_web (added , pagename, times) values ('2002-07-21' , 'www.sqlteam.com', 195)
insert #log_web (added , pagename, times) values ('2002-07-21' , 'www.yahoo.com', 31)
insert #log_web (added , pagename, times) values ('2002-07-21' , 'www.cnn.com', 60)


-- show all records with the ratio of each day's views to the previous day's
print '*** All records ***'
select b.pagename , a.times as Prev_Times, b.times
, Pct_Chg = Convert(numeric(8,2), (b.times / Convert(numeric(8,2), a.times)))
from #log_web b
inner join #log_web a
on b.added = DateAdd(d, 1, a.added) and b.pagename = a.pagename


-- show only those rows whose ratio falls outside 0.80 to 1.20 (a 20%+ change)
print ''
print '*** Outlier records ***'
select b.pagename , a.times as Prev_Times, b.times
, Pct_Chg = Convert(numeric(8,2), (b.times / Convert(numeric(8,2), a.times)))
from #log_web b
inner join #log_web a
on b.added = DateAdd(d, 1, a.added) and b.pagename = a.pagename
where not ( (b.times / Convert(numeric(8,2), a.times)) between .80 and 1.20)

-- drop table #log_web

-- here is the output:

*** All records ***
pagename Prev_Times times Pct_Chg
-------------------- ----------- ----------- ----------
www.sqlteam.com 125 195 1.56
www.yahoo.com 40 31 .78
www.cnn.com 55 60 1.09


*** Outlier records ***
pagename Prev_Times times Pct_Chg
-------------------- ----------- ----------- ----------
www.sqlteam.com 125 195 1.56
www.yahoo.com 40 31 .78
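
For the calendar-month comparison in the original question, the same self-join idea works on monthly aggregates. Here's a sketch along those lines (untested against real data; the derived-table aliases and the month_times column name are my own), using the DATEADD/DATEDIFF trick to truncate a date to the first of its month so months can be joined:

```sql
-- Month-over-month version: sum each page's views by calendar month,
-- join each month to the prior month, and keep only 20%+ changes.
select cur.pagename,
       prev.month_times as Prev_Times,
       cur.month_times as times,
       Pct_Chg = convert(numeric(8,2),
                 cur.month_times / convert(numeric(8,2), prev.month_times))
from
  (select pagename,
          -- dateadd/datediff truncates 'added' to the first of its month
          dateadd(m, datediff(m, 0, added), 0) as mo,
          sum(times) as month_times
   from #log_web
   group by pagename, dateadd(m, datediff(m, 0, added), 0)) cur
inner join
  (select pagename,
          dateadd(m, datediff(m, 0, added), 0) as mo,
          sum(times) as month_times
   from #log_web
   group by pagename, dateadd(m, datediff(m, 0, added), 0)) prev
  on cur.mo = dateadd(m, 1, prev.mo)
 and cur.pagename = prev.pagename
where not (cur.month_times / convert(numeric(8,2), prev.month_times)
           between .80 and 1.20)
```

The derived tables stand in for the CTEs you'd use on a newer version; on SQL Server 2000 this is the usual shape. Swap `m` for `wk` or `qq` in the DATEADD/DATEDIFF pair to get weekly or quarterly buckets.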



aiken
Aged Yak Warrior

525 Posts

Posted - 2002-07-22 : 18:58:34
Thanks very much -- that's just what I needed; I just couldn't get my head around it for some reason.

Cheers
-b
