Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 QUERY re-design help

Author  Topic 

jaroot
Starting Member

46 Posts

Posted - 2005-11-03 : 18:10:30
I need to figure out how to narrow down a large data set into a smaller data set based on a number of business rules.
It's a mail tracking system with millions of records(seeds a.k.a- mail pieces)
Heres a sample of the data:

SET NOCOUNT ON
CREATE TABLE SEEDS1(VERSIONID int, SEEDID int, SEEDSTATUSID int, SEEDDATE smalldatetime)
CREATE TABLE SEEDS2(VERSIONID int, SEEDID int, SEEDSTATUSID int, SEEDDATE smalldatetime)
GO

INSERT INTO SEEDS1(VERSIONID, SEEDID, SEEDSTATUSID, SEEDDATE)
SELECT 1, 11, 1, '1/1/2002' UNION ALL
SELECT 1, 11, 2, '1/2/2002' UNION ALL
SELECT 2, 21, 1, '1/1/2003' UNION ALL
SELECT 2, 21, 2, '1/3/2003' UNION ALL
SELECT 2, 31, 2, '1/1/2003' UNION ALL
SELECT 2, 41, 3, '1/1/2003' UNION ALL
SELECT 3, 51, 2, '2/1/2002' UNION ALL
SELECT 3, 51, 2, '2/1/2002' UNION ALL
SELECT 3, 51, 4, '2/6/2002' UNION ALL
SELECT 3, 61, 5, '1/1/2002' UNION ALL
SELECT 4, 71, 5, '2/2/2003'
GO

/*I need to figure out the most efficent way to get the following results:*/
INSERT INTO SEEDS2(VERSIONID, SEEDID, SEEDSTATUSID, SEEDDATE)
SELECT 1, 11, 2, '1/2/2002' UNION ALL
SELECT 2, 21, 2, '1/3/2003' UNION ALL
SELECT 3, 51, 4, '2/6/2002' UNION ALL
SELECT 4, 71, 5, '2/2/2003'
/*Basically the max date from each subset of VERSIONS/SEEDID*/
SELECT * FROM SEEDS1
SELECT * FROM SEEDS2

SET NOCOUNT OFF
DROP TABLE SEEDS1
DROP TABLE SEEDS2
GO


Here is the query I'm using. I'm actually using a cursor in front of this to loop over a list of versions (approx 4000), and each version contains 1-10000 seeds within it.

INSERT INTO SEEDS2(VERSIONID, SEEDID, SEEDSTATUSID, SEEDDATE)
SELECT SE.VERSIONID, SE.SEEDID, SE.SEEDSTATUSID, MAX(SE.SEEDDATE) AS SEEDDATE
FROM (
SELECT DISTINCT @VERSIONID AS VERSIONID, S1.SEEDID,
CASE
WHEN S1.SEEDSTATUSID = 2 THEN 4
WHEN S1.SEEDSTATUSID = 4 THEN 5
WHEN S1.SEEDSTATUSID IN (1,3) THEN 3
WHEN S1.SEEDSTATUSID = 5 THEN 1
END AS SEEDSTATUSID, S1.SEEDDATE

FROM SEEDS1 S1
WHERE S1.VERSIONID = @VERSIONID AND
((S1.SEEDSTATUSID = 2)
OR (S1.SEEDSTATUSID = 4 AND S1.SEEDID NOT IN( SELECT SE1.SEEDID
FROM SEEDS1 SE1
WHERE SE1.VERSIONID = @VERSIONID AND SE1.SEEDSTATUSID = 2)
OR (S1.SEEDSTATUSID IN (1,3) AND S1.SEEDID NOT IN( SELECT SE2.SEEDID
FROM SEEDS1 SE2
WHERE SE2.VERSIONID = @VERSIONID AND SE2.SEEDSTATUSID IN (2,4))
OR (S1.SEEDSTATUSID = 5 AND S1.SEEDID NOT IN(SELECT SE3.SEEDID
FROM SEEDS1 SE3
WHERE SE3.VERSIONID = @VERSIONID AND SE3.SEEDSTATUSID IN (1,2,3,4)))))
)

) SE
GROUP BY SE.SEEDID, SE.VERSIONID, SE.SEEDSTATUSID


I'm pretty sure the SUB queries are the reason my process is so slow, but I'm not quite sure how to re-write the SQL to do the same thing, just quicker.

Any help would be GREATLY appreciated!

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2005-11-03 : 19:57:49
How about what results you hope to get from your sample data, and a brief description of how to derive those results. Trying to figure out your SQL code is pretty difficult. The most important part of solving any problem is stating it very clearly and concisely in regular old English.
Go to Top of Page

jaroot
Starting Member

46 Posts

Posted - 2005-11-04 : 10:41:59
Ya sorry.. I'm horrible at trying to explain problems.

I'm trying to basically narrow down a large result set, into a smaller one.
I'm dealing with ~1.5million records, which I need to narrow down to
about 500,000.

To do that, I need to grab the MAX date from each subset of data because
the records that exist within the current larger result set store
multiple duplicate records (with different dates & statuses).

So with the two sample data sets I listed above, that shows an example
of how I need to narrow down the data. I'm grabbing all of the records
based on the statuses, then I'm narrowing it down by treating it
as a derived table, and taking the MAX date from those records.

My problem is that with so many records in the SEEDS1 table
my script runs exremely slow. For just 1 VERSIONID that has only 16
distinct SEEDID's, it takes approx 7 seconds to run. I have approx 4,000
versions to deal with, with anywhere from 1 to 10,000 seeds.

Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2005-11-04 : 10:59:26
[code]
CREATE INDEX myIndex99 ON SEEDS1(VERSIONID, SEEDDATE)

SELECT *
FROM SEEDS1 o
WHERE EXISTS (SELECT *
FROM SEEDS1 i
WHERE o.VERSIONID = i.VERSIONID
GROUP BY i.VERSIONID
HAVING MAX(i.SEEDDATE) = o.SEEDDATE)
[/code]


Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx
Go to Top of Page
   

- Advertisement -