ValterBorges
Master Smack Fu Yak Hacker
1429 Posts
Posted - 2004-03-20 : 11:35:52

I'm trying to figure out a way to perform net change on large (50 million+ row) tables, especially how to tell whether a row has changed. Currently I'm joining on the keys and checking every column in the WHERE clause with a <> a or b <> b. This can be quite slow. Even worse is when the key is a combination of 10 columns and the table has 150 columns.

I've looked into using CHECKSUM or BINARY_CHECKSUM, but they don't guarantee uniqueness. Using three checksums (one for the key, one for the entire row, and one for the sum of each column in the row) happened to be unique in my tests, but since CHECKSUM/BINARY_CHECKSUM is not collision-free, it's only a matter of hitting the right combination of values to make the three-checksum approach incorrect:

-- Two different items can have the same checksum
SELECT CHECKSUM(7596, CAST('20030112' AS datetime), 51767),
       CHECKSUM(7596, CAST('20030119' AS datetime), 54695)

Does anyone have a fast and accurate way of performing net change?
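One alternative worth sketching, assuming SQL Server 2005 or later where HASHBYTES is available (it did not exist in SQL Server 2000, so it would not have helped at the time of this thread): hash a delimited concatenation of the columns instead of relying on CHECKSUM. All table and column names below are placeholders, not from the thread.

```sql
-- Sketch only: a collision-resistant row hash via HASHBYTES (SQL Server 2005+).
-- source_table, target_table, key_col, col_a, col_b are hypothetical names.
-- The '|' delimiter keeps 'ab'+'c' from colliding with 'a'+'bc'.
-- Nullable columns would additionally need COALESCE before concatenating.
SELECT s.key_col
FROM   source_table s
JOIN   target_table t
  ON   t.key_col = s.key_col
WHERE  HASHBYTES('SHA1', CAST(s.col_a AS varchar(30)) + '|' + CAST(s.col_b AS varchar(30)))
    <> HASHBYTES('SHA1', CAST(t.col_a AS varchar(30)) + '|' + CAST(t.col_b AS varchar(30)))
```

A SHA-1 collision is far less likely than a CHECKSUM collision, at the cost of the conversion and hashing work on every row.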
ehorn
Master Smack Fu Yak Hacker
1632 Posts
Posted - 2004-03-20 : 13:50:26

Not sure if this can help:

SELECT t1.<keys>
FROM <table1> t1
JOIN <table2> t2
  ON t2.<keys> = t1.<keys>
 AND (   t2.<column1> != t1.<column1>
      OR t2.<column2> != t1.<column2>
      OR t2.<columnN> != t1.<columnN>)
ValterBorges
Master Smack Fu Yak Hacker
1429 Posts
Posted - 2004-03-20 : 14:19:59

Thanks, ehorn, but have you read my post? "Currently I'm joining on the keys and checking every column in the WHERE clause with a <> a or b <> b. This can be quite slow." Moving the check into the JOIN clause will still be slow when you have 150 columns.
ehorn
Master Smack Fu Yak Hacker
1632 Posts
Posted - 2004-03-20 : 14:44:10

Yes, Valter, I read your post, and I am sorry my suggestion was of no use to you. Though I am hopeful one of the SQL gurus here can provide you with more valuable input.
ValterBorges
Master Smack Fu Yak Hacker
1429 Posts
Posted - 2004-03-20 : 15:44:27

Thanks again, ehorn.
achobbs
Starting Member
11 Posts
Posted - 2004-03-20 : 17:17:44

Just thinking about it quickly, I think the following would work. Not sure of the performance, but you could try joining all columns in a FULL OUTER JOIN:

SELECT DISTINCT t1.*
FROM table1 t1
FULL OUTER JOIN table2 t2
  ON  t1.col1 = t2.col1
  AND t1.col2 = t2.col2
  ...
  AND t1.colN = t2.colN
WHERE t1.KeyCol IS NULL OR t2.KeyCol IS NULL

Joining on all columns should cause the join condition to fail wherever the columns are not the same, and the IS NULL filters on t1.KeyCol and t2.KeyCol in the WHERE clause will return only the rows that did not join, thereby giving you the differing rows.

I just ran it on a 100K row table and it seemed to run fast, but I'm not sure about a 50 million row table. Just a thought.
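One caveat to the FULL OUTER JOIN approach, with a hedged sketch (the sentinel value -1 is my assumption, not from the post): if any compared column is nullable, NULL = NULL evaluates to UNKNOWN, so two rows that are identical except for shared NULLs would be reported as differences. Guarding nullable columns with COALESCE avoids that:

```sql
-- Sketch: COALESCE nullable columns to a sentinel so matching NULLs compare equal.
SELECT DISTINCT t1.*
FROM table1 t1
FULL OUTER JOIN table2 t2
  ON  t1.col1 = t2.col1                              -- col1 assumed NOT NULL
  AND COALESCE(t1.col2, -1) = COALESCE(t2.col2, -1)  -- -1 = assumed unused sentinel
WHERE t1.KeyCol IS NULL OR t2.KeyCol IS NULL
```

The sentinel must be a value that cannot legitimately occur in the column, otherwise real differences can be masked.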
Stoad
Freaky Yak Linguist
1983 Posts
Posted - 2004-03-20 : 17:36:29

Try using a cursor for the purpose. With such huge data volumes, cursor operations may be cheaper than a join.
derrickleggett
Pointy Haired Yak DBA
4184 Posts
Posted - 2004-03-21 : 21:55:29

I doubt the cursor approach would work more efficiently. SQL Server processes sets much more rapidly than it performs repeated looping operations, and this is especially true in a cursor. If you decide to test the looping approach, at least use a WHILE loop.

The two fastest methods should be checking each column by joining in the FROM clause and testing in the WHERE clause, or using an exists test: WHERE NOT EXISTS (SELECT * FROM compare_table). In the execution plan this basically performs the same operation as the large join, but it requires less code.

MeanOldDBA
derrickleggett@hotmail.com
When life gives you a lemon, fire the DBA.
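The NOT EXISTS form mentioned above might look like this sketch (table and column names are placeholders): rows in the source with no exactly matching counterpart in the comparison table are the net change.

```sql
-- Sketch: net-change rows = source rows with no identical row in compare_table.
SELECT s.*
FROM   source_table s
WHERE  NOT EXISTS (SELECT *
                   FROM   compare_table c
                   WHERE  c.key_col = s.key_col
                     AND  c.col_a   = s.col_a
                     AND  c.col_b   = s.col_b)
```

With 150 columns the correlated subquery still has to list every column, so it saves typing over the join form more than it saves work for the engine.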
Stoad
Freaky Yak Linguist
1983 Posts
Posted - 2004-03-22 : 09:33:39

Derrick, btw, I doubt it too. My bet is approx. 20/80 in favor of the cursor (WHILE loop).
X002548
Not Just a Number
15586 Posts
Posted - 2004-03-22 : 14:04:57

Valter,

What about the UNION trick? HAVING COUNT(*) = 1, and make that part of the predicate for the update....

Brett
8-)
ValterBorges
Master Smack Fu Yak Hacker
1429 Posts
Posted - 2004-04-04 : 16:21:43

Sorry, I lost you there; please elaborate.
derrickleggett
Pointy Haired Yak DBA
4184 Posts
Posted - 2004-04-04 : 16:50:32

When you do a UNION ALL, it assumes the rows in both sets are unique, or that you want both recordsets completely. UNION, on the other hand, looks at the rows and makes sure it only adds rows from the second set that are unique. If you add a COUNT(*) > 1 check on there, you can find out whether it added any rows.

Have you tried the NOT EXISTS approach, though? I've found it to be faster at times. Also, the JOIN comparison does normally perform better than the WHERE comparison, although it's not a great performer either way.

MeanOldDBA
derrickleggett@hotmail.com
When life gives you a lemon, fire the DBA.
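The "UNION trick" being discussed can be sketched as follows (placeholder names; this assumes neither table contains exact duplicate rows): UNION ALL the two tables and group by every column. A group with a count of 1 exists on only one side, so it represents an insert, delete, or change, while unchanged rows pair up with a count of 2.

```sql
-- Sketch of the UNION trick with HAVING COUNT(*) = 1.
SELECT key_col, col_a, col_b
FROM  (SELECT key_col, col_a, col_b FROM table1
       UNION ALL
       SELECT key_col, col_a, col_b FROM table2) AS u
GROUP BY key_col, col_a, col_b
HAVING COUNT(*) = 1
```

A changed row shows up twice in the result (old version and new version), so the key columns tell you which side each version came from if you also select a source marker.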
byrmol
Shed Building SQL Farmer
1591 Posts
Posted - 2004-04-04 : 23:45:49

We do a fair bit of delta processing, and we almost always go for a datetime column tacked on the end of the table that gets updated via a trigger on updates. A second table holds the datetime value at which the last delta processing was done, and this value is used to create the filter for the next delta. Rinse and repeat....

When the datetime column is indexed, the subsequent JOIN and NOT EXISTS expressions (for update and insert) perform very well.

DavidM
"If you are not my family or friend, then I will discriminate against you"
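A minimal sketch of the trigger-plus-watermark pattern DavidM describes (every name here is a placeholder, and recursive triggers are assumed off, which is the SQL Server default):

```sql
-- Sketch: stamp modified rows, then filter the next delta on a stored watermark.
CREATE TRIGGER trg_source_touch ON source_table
AFTER UPDATE
AS
    UPDATE s
    SET    last_modified = GETDATE()
    FROM   source_table s
    JOIN   inserted i ON i.key_col = s.key_col
GO

-- Delta pass: only rows touched since the last recorded run need examining.
SELECT s.*
FROM   source_table s
WHERE  s.last_modified > (SELECT last_run FROM delta_watermark)
```

After a successful delta pass, the job would update delta_watermark.last_run to the time the pass started, so nothing modified mid-run is skipped.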