I need to figure out how to narrow down a large data set into a smaller data set based on a number of business rules.It's a mail tracking system with millions of records(seeds a.k.a- mail pieces) Heres a sample of the data:SET NOCOUNT ONCREATE TABLE SEEDS1(VERSIONID int, SEEDID int, SEEDSTATUSID int, SEEDDATE smalldatetime)CREATE TABLE SEEDS2(VERSIONID int, SEEDID int, SEEDSTATUSID int, SEEDDATE smalldatetime)GOINSERT INTO SEEDS1(VERSIONID, SEEDID, SEEDSTATUSID, SEEDDATE)SELECT 1, 11, 1, '1/1/2002' UNION ALLSELECT 1, 11, 2, '1/2/2002' UNION ALLSELECT 2, 21, 1, '1/1/2003' UNION ALLSELECT 2, 21, 2, '1/3/2003' UNION ALLSELECT 2, 31, 2, '1/1/2003' UNION ALLSELECT 2, 41, 3, '1/1/2003' UNION ALLSELECT 3, 51, 2, '2/1/2002' UNION ALLSELECT 3, 51, 2, '2/1/2002' UNION ALLSELECT 3, 51, 4, '2/6/2002' UNION ALLSELECT 3, 61, 5, '1/1/2002' UNION ALLSELECT 4, 71, 5, '2/2/2003'GO/*I need to figure out the most efficent way to get the following results:*/INSERT INTO SEEDS2(VERSIONID, SEEDID, SEEDSTATUSID, SEEDDATE)SELECT 1, 11, 2, '1/2/2002' UNION ALLSELECT 2, 21, 2, '1/3/2003' UNION ALLSELECT 3, 51, 4, '2/6/2002' UNION ALLSELECT 4, 71, 5, '2/2/2003'/*Basically the max date from each subset of VERSIONS/SEEDID*/SELECT * FROM SEEDS1SELECT * FROM SEEDS2SET NOCOUNT OFFDROP TABLE SEEDS1DROP TABLE SEEDS2GO
Here is the query I'm using. I'm actually using a cursor in front of this to loop over a list of versions (approx 4000), and each version contains 1-10000 seeds within it.INSERT INTO SEEDS2(VERSIONID, SEEDID, SEEDSTATUSID, SEEDDATE)SELECT SE.VERSIONID, SE.SEEDID, SE.SEEDSTATUSID, MAX(SE.SEEDDATE) AS SEEDDATEFROM ( SELECT DISTINCT @VERSIONID AS VERSIONID, S1.SEEDID, CASE WHEN S1.SEEDSTATUSID = 2 THEN 4 WHEN S1.SEEDSTATUSID = 4 THEN 5 WHEN S1.SEEDSTATUSID IN (1,3) THEN 3 WHEN S1.SEEDSTATUSID = 5 THEN 1 END AS SEEDSTATUSID, S1.SEEDDATE FROM SEEDS1 S1 WHERE S1.VERSIONID = @VERSIONID AND ((S1.SEEDSTATUSID = 2) OR (S1.SEEDSTATUSID = 4 AND S1.SEEDID NOT IN( SELECT SE1.SEEDID FROM SEEDS1 SE1 WHERE SE1.VERSIONID = @VERSIONID AND SE1.SEEDSTATUSID = 2) OR (S1.SEEDSTATUSID IN (1,3) AND S1.SEEDID NOT IN( SELECT SE2.SEEDID FROM SEEDS1 SE2 WHERE SE2.VERSIONID = @VERSIONID AND SE2.SEEDSTATUSID IN (2,4)) OR (S1.SEEDSTATUSID = 5 AND S1.SEEDID NOT IN(SELECT SE3.SEEDID FROM SEEDS1 SE3 WHERE SE3.VERSIONID = @VERSIONID AND SE3.SEEDSTATUSID IN (1,2,3,4))))) )) SEGROUP BY SE.SEEDID, SE.VERSIONID, SE.SEEDSTATUSID
I'm pretty sure the SUB queries are the reason my process is so slow, but I'm not quite sure how to re-write the SQL to do the same thing, just quicker.Any help would be GREATLY appreciated!