timmy
Master Smack Fu Yak Hacker
1242 Posts
Posted - 2006-08-11 : 00:17:53
Hi all, I'm creating a script that uses BINARY_CHECKSUM on all columns in a table's primary key, but I'm getting duplicates:

SELECT key1, key2, key3, key4, key5,
       BINARY_CHECKSUM(key1, key2, key3, key4, key5) AS CheckSumVal
FROM myTable
ORDER BY BINARY_CHECKSUM(key1, key2, key3, key4, key5) ASC

The table in question has 7 columns in its primary key and is a mixture of Varchars and Smallints. The table has 2530 records, but when I do a DISTINCT query on the results of the above, I only get 2518 unique CheckSumVal values. I'm assuming that BINARY_CHECKSUM is not reliable enough, but thought I might ask...

Cheers,
Tim
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts
Posted - 2006-08-11 : 00:24:13
You said the table has 7 columns in the primary key, but you only included 5 of them in the BINARY_CHECKSUM. That could lead to duplicate checksums.

There is also a small, but real chance that 2 different keys can have the same checksum (about 1 chance in 4 billion). That is just the way checksums work.

CODO ERGO SUM
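For scale, a rough birthday-style estimate on the numbers in the first post: hashing 2,530 keys into the 2^32 possible checksum values gives an expected 2530 × 2529 / (2 × 2^32) ≈ 0.0007 colliding pairs, so losing a dozen values to duplicates is far too many to be random chance; the collisions are almost certainly structural in the data rather than bad luck.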
timmy
Master Smack Fu Yak Hacker
1242 Posts
Posted - 2006-08-11 : 01:00:30
Michael, the code was just to illustrate the point. I'm creating triggers that use the BINARY_CHECKSUM value as a means of iterating through the records in the deleted virtual table. But the following experiment is worrying:

SELECT COUNT(*) FROM myTable
-- returns 2494 records

SELECT DISTINCT BINARY_CHECKSUM(field1, field2, field3, ...) FROM myTable
-- returns 2493 records

And if I do the same thing with just the columns in the primary key:

SELECT field1, field2, field3, field4, field5, field6, field7 FROM myTable
-- returns 2494 records

SELECT DISTINCT BINARY_CHECKSUM(field1, field2, field3, field4, field5, field6, field7) FROM myTable
-- returns 2482 records
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts
Posted - 2006-08-11 : 07:01:42
Can you post an example of several of the primary keys that are returning the same BINARY_CHECKSUM?

CODO ERGO SUM
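A query along these lines would pull them out (a sketch against the hypothetical five-column myTable from the first post: group on the checksum, keep the groups with more than one row, then join back to list the keys):

[code]
-- List every key combination whose checksum is shared with another row
SELECT t.key1, t.key2, t.key3, t.key4, t.key5, c.CheckSumVal
FROM myTable AS t
JOIN (
    SELECT BINARY_CHECKSUM(key1, key2, key3, key4, key5) AS CheckSumVal
    FROM myTable
    GROUP BY BINARY_CHECKSUM(key1, key2, key3, key4, key5)
    HAVING COUNT(*) > 1
) AS c
  ON BINARY_CHECKSUM(t.key1, t.key2, t.key3, t.key4, t.key5) = c.CheckSumVal
ORDER BY c.CheckSumVal;
[/code]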
Arnold Fribble
Yak-finder General
1961 Posts
Posted - 2006-08-11 : 08:58:47
[code]
-- Both rows come back with the same checksum: 4294967296 is 2^32,
-- and the bigint is presumably folded into BINARY_CHECKSUM's 32-bit result.
SELECT a, BINARY_CHECKSUM(*)
FROM (
    SELECT CAST(1 AS bigint) AS a
    UNION ALL
    SELECT CAST(4294967296 AS bigint)
) AS A
[/code]
robvolk
Most Valuable Yak
15732 Posts
Posted - 2006-08-11 : 09:29:43
quote: I'm creating triggers that use the BINARY_CHECKSUM value as a means of iterating through the records in the deleted virtual table
What's the rationale for doing that? Can you post the code of your trigger and explain what you're trying to do? BINARY_CHECKSUM is not intended to uniquely identify a combination of values; it's meant to be a fast way to check whether a value or values have changed.
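That intended use looks something like this (a sketch; the snapshot table and column names are hypothetical):

[code]
-- Find rows whose audited columns changed since a snapshot was taken,
-- comparing one checksum per row instead of every column individually.
SELECT t.key1
FROM myTable AS t
JOIN myTableSnapshot AS s
  ON s.key1 = t.key1
WHERE BINARY_CHECKSUM(t.col1, t.col2, t.col3)
   <> BINARY_CHECKSUM(s.col1, s.col2, s.col3);
[/code]

A differing checksum guarantees the row changed; only a matching checksum is inconclusive, which is why the function suits change detection but not unique identification.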
timmy
Master Smack Fu Yak Hacker
1242 Posts
Posted - 2006-08-13 : 18:03:32
Thanks all for your input. I'll try and explain the situation a bit better. I'm building a series of triggers. They will be automatically generated by a script during a db upgrade, but that's not the issue. Inside the trigger I have a WHILE loop that iterates through all values in the 'inserted' and 'deleted' virtual tables. The WHILE loop is there instead of using cursors/temp tables/table variables. The basic code goes something like this:

-- (assumes @keys starts below any possible checksum value)
WHILE EXISTS (SELECT NULL FROM inserted WHERE BINARY_CHECKSUM(keyfields...) > @keys)
BEGIN
    SELECT TOP 1 @keys = BINARY_CHECKSUM(keyfields...)
    FROM inserted
    WHERE BINARY_CHECKSUM(keyfields...) > @keys
    ORDER BY BINARY_CHECKSUM(keyfields...) ASC

    PRINT 'Adding records for keys = ' + CONVERT(VARCHAR, @keys)

    -- Execute stored proc to insert audit parent record
    -- INSERT statements to insert primary key values
    -- INSERT statements to insert values
END

So I'm basically using BINARY_CHECKSUM to iterate through all records in my virtual tables. Ideally I'd use a single key for this, but some of my tables have primary keys of up to 8 columns. I suppose I could just concatenate all the values into a delimited string and use that for comparison, but it would get nasty with 8 key columns of varying data types. Or I could just go back to using table variables for this, but I'm going for efficiency over anything else at the moment. The fact that these triggers are auto-generated alleviates some of the pressure to write nicely maintainable code (i.e. it doesn't matter how nasty the code looks, it won't get that much attention anyway).

Any suggestions, as always, would be appreciated.

Cheers,
Tim
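For reference, the delimited-string alternative mentioned above might look like the following sketch (column types, lengths, and the delimiter are guesses; numeric key parts need zero-padding so the string sorts in the same order as the underlying values, and the delimiter must never occur in the data):

[code]
-- Hypothetical concatenated iteration key for a varchar + smallint PK
-- (the zero-padding here ignores negative smallints for simplicity)
SELECT key1 + '|' + RIGHT('000000' + CONVERT(varchar(6), key2), 6) AS IterKey
FROM inserted;
[/code]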
eyechart
Master Smack Fu Yak Hacker
3575 Posts
Posted - 2006-08-13 : 18:10:15
Why not just throw in an identity column instead? Keep the PK on your 7 columns, put a UK on the identity, and you are done.

-ec
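In DDL terms that suggestion would be something like this sketch (table and constraint names are hypothetical):

[code]
-- Add a surrogate identity alongside the existing composite PK,
-- with a unique constraint so it can serve as an alternate key.
ALTER TABLE myTable ADD RowID int IDENTITY(1, 1) NOT NULL;
ALTER TABLE myTable ADD CONSTRAINT UQ_myTable_RowID UNIQUE (RowID);
[/code]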
nr
SQLTeam MVY
12543 Posts
Posted - 2006-08-13 : 18:10:20
Insert the PK columns into a temp table with an identity and use the identity for the loop.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
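Inside the trigger that might look like this sketch (the key columns and types are hypothetical):

[code]
-- Copy the virtual table's PK values into a temp table keyed by an
-- identity, then step through on the identity instead of a checksum.
DECLARE @i int, @n int;

CREATE TABLE #keys (
    rn   int IDENTITY(1, 1) PRIMARY KEY,
    key1 varchar(30),
    key2 smallint
    -- ...remaining PK columns
);

INSERT INTO #keys (key1, key2)
SELECT key1, key2 FROM inserted;

SELECT @i = 1, @n = @@ROWCOUNT;

WHILE @i <= @n
BEGIN
    -- audit work for the row WHERE rn = @i goes here
    SET @i = @i + 1;
END
[/code]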
eyechart
Master Smack Fu Yak Hacker
3575 Posts
Posted - 2006-08-13 : 18:11:16
quote: Originally posted by nr Insert the PK columns into a temp table with an identity and use the identity for the loop.
Even better idea.

-ec
timmy
Master Smack Fu Yak Hacker
1242 Posts
Posted - 2006-08-13 : 19:37:44
I wanted to keep away from temp tables and the like if possible (mainly from a performance perspective). Or wouldn't it make much difference? The triggers will mainly be fired when one row is updated/inserted/deleted, because that's the way the app is designed. I don't really want to get into temp tables etc. for the odd data change that is multi-row.

The existing triggers on these tables used temp tables, system tables (sysobjects etc.), and dynamic sql. Part of the reason for the re-design is to get rid of these sorts of dependencies. But if temp tables don't represent a performance hit I might consider them.
nr
SQLTeam MVY
12543 Posts
Posted - 2006-08-13 : 19:44:36
Do you need to process it row by row?

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
timmy
Master Smack Fu Yak Hacker
1242 Posts
Posted - 2006-08-13 : 19:55:51
Unfortunately yes. We have one audit table (or 3 to be precise...), so we basically need to write a row for each column that's changed. |
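For a single-row update, writing one audit row per changed column might look like this sketch (the audit table, @auditId, and the column names are all hypothetical; a generated trigger would repeat the block for each audited column):

[code]
-- One detail row per changed column, NULL transitions included
INSERT INTO AuditDetail (AuditID, ColumnName, OldValue, NewValue)
SELECT @auditId, 'col1',
       CONVERT(varchar(255), d.col1),
       CONVERT(varchar(255), i.col1)
FROM inserted AS i
JOIN deleted AS d
  ON i.key1 = d.key1
WHERE i.col1 <> d.col1
   OR (i.col1 IS NULL AND d.col1 IS NOT NULL)
   OR (i.col1 IS NOT NULL AND d.col1 IS NULL);
[/code]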
nr
SQLTeam MVY
12543 Posts
timmy
Master Smack Fu Yak Hacker
1242 Posts
Posted - 2006-08-13 : 21:33:30
I did consider this option, but I didn't really like the concatenated key bit. And the I/S view references and dynamic sql would be a bit of a performance hit that I really don't need. |
nr
SQLTeam MVY
12543 Posts
Posted - 2006-08-13 : 22:07:29
If you have generic audit trail tables then you have to concatenate the PK columns, unless you have key columns available up to the maximum number of PK columns in your database. The dynamic sql is only there because this is a single trigger for all tables; if you are generating the triggers for each table then it doesn't need dynamic sql.

This generates a trigger from the table definition - you would probably want a combination of both:
http://www.nigelrivett.net/SQLTriggers/GenerateTriggerForAudit.html

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
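One reading of the no-concatenation alternative is a generic key table with one column per possible PK position (a sketch; the names are hypothetical, and sql_variant is used here to absorb the mixed key data types):

[code]
-- Wide enough for the largest composite PK in the database (8 here);
-- unused positions simply stay NULL.
CREATE TABLE AuditKey (
    AuditID   int NOT NULL,
    KeyValue1 sql_variant NULL,
    KeyValue2 sql_variant NULL,
    KeyValue3 sql_variant NULL,
    KeyValue4 sql_variant NULL,
    KeyValue5 sql_variant NULL,
    KeyValue6 sql_variant NULL,
    KeyValue7 sql_variant NULL,
    KeyValue8 sql_variant NULL
);
[/code]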
blindman
Master Smack Fu Yak Hacker
2365 Posts
Posted - 2006-08-14 : 00:19:47
quote: Originally posted by Michael Valentine Jones There is also a small, but real chance that 2 different keys can have the same checksum (about 1 chance in 4 billion).
1 in 4 billion? That depends completely upon the nature of the data. Transpositions will lead to identical checksum values, and transpositions would seem to be a real possibility where a composite key is being used. |
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts
Posted - 2006-08-14 : 07:48:07
quote: Originally posted by blindman
quote: Originally posted by Michael Valentine Jones There is also a small, but real chance that 2 different keys can have the same checksum (about 1 chance in 4 billion).
1 in 4 billion? That depends completely upon the nature of the data. Transpositions will lead to identical checksum values, and transpositions would seem to be a real possibility where a composite key is being used.
Do you have any example of that? I tried some simple examples and didn't see it:

select a  = binary_checksum('a', 'b')
select b  = binary_checksum('b', 'a')
select a1 = binary_checksum(1, 2)
select a2 = binary_checksum(2, 1)

Results:

a
-----------
1650

b
-----------
1601

a1
-----------
18

a2
-----------
33

CODO ERGO SUM
blindman
Master Smack Fu Yak Hacker
2365 Posts
Posted - 2006-08-14 : 10:54:49
I stand corrected. I have run into some cases where binary_checksum has returned duplicate values, though. When I get back to my office I'll see if I can dig them up.
blindman
Master Smack Fu Yak Hacker
2365 Posts
Posted - 2006-08-14 : 11:01:42
Here's an example of two simple strings returning the same binary_checksum value:

select binary_checksum('bQ')
select binary_checksum('aa')

I just think 1 in 4 billion is too general a statement to make.
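Collisions like that pair can be hunted mechanically. This sketch (SQL Server 2005 syntax; sys.all_objects is just a convenient row source) enumerates every two-character printable-ASCII string and keeps the checksum values shared by more than one of them:

[code]
WITH n AS (
    SELECT ROW_NUMBER() OVER (ORDER BY object_id) AS i
    FROM sys.all_objects
),
chars AS (
    -- CHAR(32) through CHAR(126): the printable ASCII characters
    SELECT CHAR(31 + i) AS c FROM n WHERE i <= 95
)
SELECT BINARY_CHECKSUM(a.c + b.c) AS cs, COUNT(*) AS colliding_strings
FROM chars AS a CROSS JOIN chars AS b
GROUP BY BINARY_CHECKSUM(a.c + b.c)
HAVING COUNT(*) > 1;
[/code]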
eyechart
Master Smack Fu Yak Hacker
3575 Posts
Posted - 2006-08-14 : 11:03:07
quote: Originally posted by blindman I stand corrected. I have run into some cases where binary_checksum has returned duplicate values, though. When I get back to my office I'll see if I can dig them up.
This is the example that I have seen showing binary_checksum returning dupe values:

SELECT BINARY_CHECKSUM('ABA'), BINARY_CHECKSUM('ACQ')

----------- -----------
17761       17761

-ec