QUERY re-design help - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

QUERY re-design help

Author

Topic

jaroot
Starting Member

46 Posts

Posted - 2005-11-03 : 18:10:30

I need to figure out how to narrow down a large data set into a smaller data set based on a number of business rules.
It's a mail tracking system with millions of records(seeds a.k.a- mail pieces)
Heres a sample of the data:


SET NOCOUNT ON
CREATE TABLE SEEDS1(VERSIONID int, SEEDID int, SEEDSTATUSID int, SEEDDATE smalldatetime)
CREATE TABLE SEEDS2(VERSIONID int, SEEDID int, SEEDSTATUSID int, SEEDDATE smalldatetime)
GO

INSERT INTO SEEDS1(VERSIONID, SEEDID, SEEDSTATUSID, SEEDDATE)
SELECT 1,             11,     1,	'1/1/2002' UNION ALL
SELECT 1,             11,     2,	'1/2/2002' UNION ALL
SELECT 2,             21,     1,	'1/1/2003' UNION ALL
SELECT 2,             21,     2,	'1/3/2003' UNION ALL
SELECT 2,             31,     2,	'1/1/2003' UNION ALL
SELECT 2,             41,     3,	'1/1/2003' UNION ALL
SELECT 3,             51,     2,	'2/1/2002' UNION ALL
SELECT 3,             51,     2,	'2/1/2002' UNION ALL
SELECT 3,             51,     4,	'2/6/2002' UNION ALL
SELECT 3,             61,     5,	'1/1/2002' UNION ALL
SELECT 4,             71,     5,	'2/2/2003'
GO

/*I need to figure out the most efficent way to get the following results:*/
INSERT INTO SEEDS2(VERSIONID, SEEDID, SEEDSTATUSID, SEEDDATE)
SELECT 1,             11,     2,	'1/2/2002' UNION ALL
SELECT 2,             21,     2,	'1/3/2003' UNION ALL
SELECT 3,             51,     4,	'2/6/2002' UNION ALL
SELECT 4,             71,     5,	'2/2/2003'
/*Basically the max date from each subset of VERSIONS/SEEDID*/
SELECT * FROM SEEDS1
SELECT * FROM SEEDS2

SET NOCOUNT OFF
DROP TABLE SEEDS1
DROP TABLE SEEDS2
GO

Here is the query I'm using. I'm actually using a cursor in front of this to loop over a list of versions (approx 4000), and each version contains 1-10000 seeds within it.


INSERT INTO SEEDS2(VERSIONID, SEEDID, SEEDSTATUSID, SEEDDATE)
SELECT SE.VERSIONID, SE.SEEDID, SE.SEEDSTATUSID, MAX(SE.SEEDDATE) AS SEEDDATE
FROM (
		SELECT DISTINCT @VERSIONID AS VERSIONID, S1.SEEDID, 
		CASE
			WHEN S1.SEEDSTATUSID = 2 THEN 4
			WHEN S1.SEEDSTATUSID = 4 THEN 5
			WHEN S1.SEEDSTATUSID IN (1,3) THEN 3
			WHEN S1.SEEDSTATUSID = 5 THEN 1
		END AS SEEDSTATUSID, S1.SEEDDATE
		
		FROM SEEDS1 S1
		WHERE S1.VERSIONID = @VERSIONID AND 
		((S1.SEEDSTATUSID = 2) 
					OR (S1.SEEDSTATUSID = 4 AND S1.SEEDID NOT IN( SELECT SE1.SEEDID
							FROM SEEDS1 SE1
							WHERE SE1.VERSIONID = @VERSIONID AND SE1.SEEDSTATUSID = 2)
					OR (S1.SEEDSTATUSID IN (1,3) AND S1.SEEDID NOT IN( SELECT SE2.SEEDID
							FROM SEEDS1 SE2
							WHERE SE2.VERSIONID = @VERSIONID AND SE2.SEEDSTATUSID IN (2,4))
					OR (S1.SEEDSTATUSID = 5 AND S1.SEEDID NOT IN(SELECT SE3.SEEDID
							FROM SEEDS1 SE3
							WHERE SE3.VERSIONID = @VERSIONID AND SE3.SEEDSTATUSID IN (1,2,3,4)))))
		)

) SE
GROUP BY  SE.SEEDID, SE.VERSIONID, SE.SEEDSTATUSID

I'm pretty sure the SUB queries are the reason my process is so slow, but I'm not quite sure how to re-write the SQL to do the same thing, just quicker.

Any help would be GREATLY appreciated!

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2005-11-03 : 19:57:49

How about what results you hope to get from your sample data, and a brief description of how to derive those results. Trying to figure out your SQL code is pretty difficult. The most important part of solving any problem is stating it very clearly and concisely in regular old English.

jaroot
Starting Member

46 Posts

Posted - 2005-11-04 : 10:41:59

Ya sorry.. I'm horrible at trying to explain problems.

I'm trying to basically narrow down a large result set, into a smaller one.
I'm dealing with ~1.5million records, which I need to narrow down to
about 500,000.

To do that, I need to grab the MAX date from each subset of data because
the records that exist within the current larger result set store
multiple duplicate records (with different dates & statuses).

So with the two sample data sets I listed above, that shows an example
of how I need to narrow down the data. I'm grabbing all of the records
based on the statuses, then I'm narrowing it down by treating it
as a derived table, and taking the MAX date from those records.

My problem is that with so many records in the SEEDS1 table
my script runs exremely slow. For just 1 VERSIONID that has only 16
distinct SEEDID's, it takes approx 7 seconds to run. I have approx 4,000
versions to deal with, with anywhere from 1 to 10,000 seeds.

X002548
Not Just a Number

15586 Posts

Posted - 2005-11-04 : 10:59:26

[code]
CREATE INDEX myIndex99 ON SEEDS1(VERSIONID, SEEDDATE)

SELECT *
FROM SEEDS1 o
WHERE EXISTS (SELECT *
FROM SEEDS1 i
WHERE o.VERSIONID = i.VERSIONID
GROUP BY i.VERSIONID
HAVING MAX(i.SEEDDATE) = o.SEEDDATE)
[/code]

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources