
 NET CHANGE


ValterBorges
Master Smack Fu Yak Hacker

1429 Posts

Posted - 2004-03-20 : 11:35:52
I'm trying to figure out a way to perform net change detection on large (50 million+ row) tables.

Especially, I'm trying to figure out whether a row has changed.

Currently I'm joining on the keys and checking every column in the WHERE clause with a <> a or b <> b. This can be quite slow.
Even worse is when the key is a combination of 10 columns and the file has 150 columns.

I've looked into using CHECKSUM or BINARY_CHECKSUM, but neither guarantees uniqueness. Using three checksums (one for the key, one for the entire row, and one for the sum of each column in the row) was unique in my tests, but since CHECKSUM and BINARY_CHECKSUM are not collision-free, it only takes the right combination of values to make the three-checksum approach incorrect as well.


-- Two different sets of values can have the same checksum
select checksum(7596,cast('20030112' as datetime),51767),
checksum(7596,cast('20030119' as datetime),54695)
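
For reference, the checksum comparison being described looks roughly like the sketch below; table_old, table_new, key1, and col1..col3 are placeholder names, not the actual schema. A differing checksum guarantees a change, but matching checksums can still hide one, so rows whose checksums match would still need the column-by-column check.

-- Hedged sketch: use per-row checksums to flag candidate changed rows.
-- Collisions (like the example above) mean equal checksums do not prove equality.
SELECT n.key1
FROM table_new n
JOIN table_old o ON o.key1 = n.key1
WHERE BINARY_CHECKSUM(n.col1, n.col2, n.col3)
   <> BINARY_CHECKSUM(o.col1, o.col2, o.col3)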

Does anyone have a fast and accurate way of performing net change?




ehorn
Master Smack Fu Yak Hacker

1632 Posts

Posted - 2004-03-20 : 13:50:26
Not sure if this can help:
SELECT t1.<keys>
FROM <table1> t1
JOIN <table2> t2
    ON  t2.<keys> = t1.<keys>
    AND (   t2.<column1> <> t1.<column1>
         OR t2.<column2> <> t1.<column2>
         OR t2.<columnN> <> t1.<columnN> )

ValterBorges
Master Smack Fu Yak Hacker

1429 Posts

Posted - 2004-03-20 : 14:19:59
Thanks ehorn, but did you read my post?

"Currently i'm joining on the keys and checking every column in the where clause with a<>a or b<>b. This can be quite slow."

Moving the check to the join clause will still be slow when you have 150 columns.

ehorn
Master Smack Fu Yak Hacker

1632 Posts

Posted - 2004-03-20 : 14:44:10
Yes Valter, I read your post, and I am sorry my suggestion was of no use to you. I am hopeful one of the SQL gurus here can provide you with more valuable input.

ValterBorges
Master Smack Fu Yak Hacker

1429 Posts

Posted - 2004-03-20 : 15:44:27
Thanks again ehorn

achobbs
Starting Member

11 Posts

Posted - 2004-03-20 : 17:17:44
Just thinking about it quickly, I think the following would work. I'm not sure about its performance, but you could try it.

Join all columns in a full outer join, i.e.:

SELECT DISTINCT t1.*
FROM table1 t1
FULL OUTER JOIN table2 t2
    ON  t1.col1 = t2.col1
    AND t1.col2 = t2.col2
    ...
    AND t1.colN = t2.colN
WHERE t1.KeyCol IS NULL
   OR t2.KeyCol IS NULL

Joining on all columns causes the join condition to fail wherever any column differs, and the IS NULL filters on t1.KeyCol and t2.KeyCol in the WHERE clause then return only the rows that did not join, giving you the differing rows.

I just ran it on a 100K-row table and it seemed to run fast, but I'm not sure about a 50M-row table.

Just a thought.


Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2004-03-20 : 17:36:29
Try using a cursor for this. With such huge data volumes, cursor operations may be cheaper than a join.

derrickleggett
Pointy Haired Yak DBA

4184 Posts

Posted - 2004-03-21 : 21:55:29
I doubt the cursor approach would be more efficient. SQL processes sets much more rapidly than it performs repeated looping operations, and this is especially true of cursors. If you decide to test a looping approach, at least use a WHILE loop rather than a cursor.

The two fastest methods should be either joining on the keys in the FROM clause and checking each column in the WHERE clause, or using a WHERE NOT EXISTS (SELECT * FROM compare_table ...) test.

In the execution plan the NOT EXISTS version basically performs the same operation as the large join, but it requires less code.
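
For reference, a rough sketch of the NOT EXISTS version; source_table, compare_table, key1, key2, and col1..colN are placeholder names, not from the original post:

-- Hedged sketch: rows in source_table with no exact match in compare_table.
-- If any columns are nullable, plain equality treats NULLs as non-matching,
-- so NULLs would need explicit handling (e.g. ISNULL wrappers).
SELECT s.key1, s.key2
FROM source_table s
WHERE NOT EXISTS (
    SELECT *
    FROM compare_table c
    WHERE c.key1 = s.key1
      AND c.key2 = s.key2
      AND c.col1 = s.col1
      AND c.col2 = s.col2
      -- ...and so on for the remaining columns...
      AND c.colN = s.colN
)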

MeanOldDBA
derrickleggett@hotmail.com

When life gives you a lemon, fire the DBA.

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2004-03-22 : 09:33:39
Derrick,

By the way, I doubt it too. My bet is approximately 20/80 in favor of the cursor (WHILE loop).

X002548
Not Just a Number

15586 Posts

Posted - 2004-03-22 : 14:04:57
Valter,

What about the UNION trick?

HAVING COUNT(*) = 1

And make that part of the predicate for the update....
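
Roughly, the UNION trick looks like the sketch below; table_old, table_new, key1, and col1/col2 are placeholder names. Rows that appear in only one of the two tables (inserts, deletes, or either side of a changed row) survive the HAVING filter, while identical rows appear twice and are filtered out. This assumes neither table contains duplicate rows.

-- Hedged sketch of the UNION / HAVING COUNT(*) = 1 comparison (placeholder names).
SELECT key1, col1, col2
FROM (
    SELECT key1, col1, col2 FROM table_old
    UNION ALL
    SELECT key1, col1, col2 FROM table_new
) AS u
GROUP BY key1, col1, col2
HAVING COUNT(*) = 1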



Brett

8-)

ValterBorges
Master Smack Fu Yak Hacker

1429 Posts

Posted - 2004-04-04 : 16:21:43
Sorry,

I lost you there; please elaborate.

derrickleggett
Pointy Haired Yak DBA

4184 Posts

Posted - 2004-04-04 : 16:50:32
When you do a UNION ALL, it keeps every row, assuming the rows in both sets are unique or that you want both recordsets in full. UNION, on the other hand, compares the rows and only adds rows from the second set that are not already there. If you add a COUNT(*) > 1 check on there, you can find out whether it added any rows.

Have you tried the NOT EXISTS approach though? I've found it to be faster at times. Also, the JOIN comparison does normally perform better than the WHERE comparison, although it's not a great performer either way.

MeanOldDBA
derrickleggett@hotmail.com

When life gives you a lemon, fire the DBA.

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2004-04-04 : 23:45:49
We do a fair bit of delta processing, and we usually go for a datetime column tacked onto the end of the table that gets updated via a trigger on updates. A second table holds the datetime value at which the last delta processing run was done, and this value is used to build the filter for the next delta. Rinse and repeat.... When the datetime column is indexed, the subsequent JOIN and NOT EXISTS expressions (for updates and inserts) perform very well.
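
A minimal sketch of that pattern, with placeholder names (big_table, id, LastModified, delta_watermark) rather than the actual schema described above:

-- Hedged sketch: keep a LastModified column current via an UPDATE trigger.
-- Assumes recursive triggers are off (the default), and all names are placeholders.
CREATE TRIGGER trg_big_table_touch ON big_table
FOR UPDATE
AS
    UPDATE t
    SET LastModified = GETDATE()
    FROM big_table t
    JOIN inserted i ON i.id = t.id
GO

-- A one-row table remembers when the last delta run finished:
-- CREATE TABLE delta_watermark (LastRun datetime NOT NULL)

DECLARE @since datetime, @now datetime
SELECT @since = LastRun FROM delta_watermark
SET @now = GETDATE()

-- Only rows touched since the last run; fast when LastModified is indexed.
SELECT t.*
FROM big_table t
WHERE t.LastModified > @since
  AND t.LastModified <= @now

UPDATE delta_watermark SET LastRun = @now
GO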

DavidM

"If you are not my family or friend, then I will discriminate against you"
   
