Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Help with NOT IN query

Author  Topic 

benricho
Yak Posting Veteran

84 Posts

Posted - 2003-10-02 : 04:12:37
Hi, I am building a little system to help a client send out some newsletters to his subscribers. Everything was working well in testing, but when I imported his subscriber list I ran into problems, the query is timing out.

Basically I am inserting into a table that holds the ID of the email that he is sending, and the ID of the subscriber he wants to send it to. He has a couple of lists of subscribers, and a subscriber can be on more than one list but we want to make sure we only send it to them once, so we want to make sure that we don't insert subscribers who have an email address the same as the ones we have already added.

My query looks like this:

INSERT INTO SendEmailTo
SELECT
@emailID,
s.[ID]
FROM
Subscribers s
WHERE
s.listID = @listID
AND s.active = 1
AND s.emailAddress NOT IN (SELECT
s2.emailAddress
FROM
Subscribers s2
INNER JOIN SendEmailTo sset ON s2.[ID] = sset.subscriberID
WHERE
sset.emailID = @emailID)


So the problem is that the SendEmailTo table keeps growing and slowing the query down. In fact it always inserts 3785 records before the query times out.

Any ideas on how I can change my query/database structure so that this will work with lots of records?

By the way I don't think I can have the email address as the unique ID of the subscribers table because they need to be able to unsubscribe from some lists and not others, and I use the subscriber ID to track when a user opens the email.

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2003-10-02 : 07:02:37
SELECT distinct?

Jay White
{0}
Go to Top of Page

benricho
Yak Posting Veteran

84 Posts

Posted - 2003-10-02 : 07:46:33
SELECT distinct won't work because there can be more than one subscriber ID for the one email address.
Go to Top of Page

Amethystium
Aged Yak Warrior

701 Posts

Posted - 2003-10-02 : 07:57:30
Use NOT EXISTS instead.

__________________
Make love not war!
Go to Top of Page

benricho
Yak Posting Veteran

84 Posts

Posted - 2003-10-02 : 08:06:50
Thanks, I'll give that a go when I get back to work and let you know how it goes.
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2003-10-02 : 08:51:03
quote:
Originally posted by benricho

SELECT distinct won't work because there can be more than one subscriber ID for the one email address.


So if you have blah@domain.com with an id of 1 and 2, do you want to insert 2 rows into SendEmailTo or just 1? If just 1, which one? 1 or 2? If both, then they are distinct, right?

Jay White
{0}
Go to Top of Page

Amethystium
Aged Yak Warrior

701 Posts

Posted - 2003-10-02 : 08:57:48
I suggested using the EXISTS operator (well in your case you have to apply reverse logic) because SQL Server does not have to perform a row by row join. Instead it can look through the records until it finds the first match and stop right there. As soon as there is a match the value EXISTS returns is TRUE so there is no need to go further.

Many queries can be written using the IN keyword or the EXISTS operator but EXISTS is useful if performance is vital and in your case it sounds like it is.

Good luck.

__________________
Make love not war!
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2003-10-02 : 09:40:39
quote:
Originally posted by Amethystium
Many queries can be written using the IN keyword or the EXISTS operator but EXISTS is useful if performance is vital and in your case it sounds like it is.



Please don't forget this debate.

Jay White
{0}
Go to Top of Page

Amethystium
Aged Yak Warrior

701 Posts

Posted - 2003-10-02 : 11:18:57
No I haven't forgotten Page47!

But I think this question perfectly illustrates what I was trying to get at last time!


__________________
Make love not war!
Go to Top of Page

benricho
Yak Posting Veteran

84 Posts

Posted - 2003-10-02 : 19:31:32
OK, I have changed my query to look like this:

INSERT INTO SendEmailTo
SELECT
@emailID,
s.[ID]
FROM
Subscribers s
WHERE
s.listID = @listID
AND s.active = 1
AND NOT EXISTS (SELECT
s2.emailAddress
FROM
Subscribers s2
INNER JOIN SendEmailTo sset ON s2.[ID] = sset.subscriberID
WHERE
sset.emailID = @emailID
AND s2.emailAddress = s.emailAddress)


...but it is still timing out, and again it manages to enter 3785 records before it times out.

And Page47, I only want 1 of the rows to be inserted, as and for which one, it's a first in deal.
Go to Top of Page

benricho
Yak Posting Veteran

84 Posts

Posted - 2003-10-02 : 22:11:17
OK, I made a couple of changes that made the query run a lot faster. I changed the SendEmailTo table to include an email address then modified my query to the following:

INSERT INTO SendEmailTo
SELECT
@emailID,
s.[ID],
s.emailAddress
FROM
Subscribers s
WHERE
s.listID = @listID
AND s.active = 1
AND NOT EXISTS (SELECT
set.emailAddress
FROM
SendEmailTo sset
WHERE
set.emailAddress = s.emailAddress)


This seems to run a lot faster, but if anyone can pick any other ways to optimize the query I would be very interested.
Go to Top of Page

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2003-10-02 : 23:17:17
Is EmailAddress Indexed in both tables?

DavidM

"SQL-3 is an abomination.."
Go to Top of Page

benricho
Yak Posting Veteran

84 Posts

Posted - 2003-10-02 : 23:43:15
yep
Go to Top of Page
   

- Advertisement -