yosiasz
Master Smack Fu Yak Hacker
1635 Posts
Posted - 2012-07-27 : 11:49:56

Greetings, I have read that with C# code you can read an image file into a byte array and save that data in SQL Server. Could I then use that byte array to detect duplicate images so that we don't ingest dups? Thanks

<><><><><><><><><><><><><><><><><>
If you don't have the passion to help people, you have no passion

Bustaz Kool
Master Smack Fu Yak Hacker
1834 Posts
Posted - 2012-07-27 : 18:48:37

To enforce uniqueness with a unique constraint or index, you would be limited to a maximum key length of 900 bytes. The IMAGE datatype would not be allowed at all, since it is a BLOB. If your images were small (likely? perhaps not), you could do this with a different datatype, such as varbinary, but inserting any value longer than 900 bytes would fail.

=================================================
Show me a sane man and I will cure him for you. -Carl Jung, psychiatrist (1875-1961)
 |
LoztInSpace
Aged Yak Warrior
940 Posts
Posted - 2012-07-28 : 02:35:00

You can calculate a checksum or hash of the image and do a quick and easy comparison on that instead. You still need to cater for clashes (two different images producing the same value), but statistically that is unlikely if you choose the right algorithm.
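A minimal sketch of the hash-then-compare idea, written in Python rather than the C# mentioned above. It assumes SHA-256 as "the right algorithm" and uses an in-memory dict in place of a table (`ImageStore` is a hypothetical name). The hash narrows the search to a handful of candidates, and a full byte comparison caters for clashes:

```python
import hashlib


class ImageStore:
    """Hypothetical in-memory stand-in for an image table, keyed by hash."""

    def __init__(self):
        # SHA-256 digest (assumed algorithm) -> distinct images with that digest
        self._by_hash = {}

    def add(self, data):
        """Store data and return True, or return False if it is a duplicate."""
        digest = hashlib.sha256(data).digest()
        candidates = self._by_hash.setdefault(digest, [])
        # Equal hashes are a strong hint, not proof: compare the bytes to
        # rule out the (very unlikely) case of two images clashing.
        if any(existing == data for existing in candidates):
            return False
        candidates.append(data)
        return True


if __name__ == "__main__":
    store = ImageStore()
    print(store.add(b"first image bytes"))  # True: new image
    print(store.add(b"first image bytes"))  # False: duplicate
```

Keeping the full bytes in memory is only for illustration; in practice you would store the digest in a column and fetch the candidate images for the byte comparison only when a digest matches.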
 |
yosiasz
Master Smack Fu Yak Hacker
1635 Posts
Posted - 2012-07-30 : 17:19:37

Thank you for your feedback. Maybe SQL Server is not the place for this duplicate verification, since the images we ingest can vary in size from small, simple pictures to large, high-end ones.
 |
robvolk
Most Valuable Yak
15732 Posts
Posted - 2012-07-30 : 17:38:13

Yeah, definitely use a hashing utility on the image files. Some native ports are available here:
http://getgnuwin32.sourceforge.net/
http://unxutils.sourceforge.net/
MD5 and SHA are available. I used them to look for duplicate files: I imported the file name, size and hash values into a table and then looked for duplicates based on size and hash.
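The same size-plus-hash approach can be sketched in Python with the standard `hashlib` module standing in for the GnuWin32/UnxUtils command-line tools (a swapped-in technique, not what was used above). It records the size and SHA-256 of every file under a folder and reports the groups that match on both. `file_sha256` and `find_duplicates` are hypothetical names:

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large images are never fully loaded."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root that share both file size and hash."""
    # (size, hash) -> paths, mirroring a table of name, size and hash values
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            key = (os.path.getsize(path), file_sha256(path))
            groups[key].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Size is free to read from the file system, so grouping by it first means two files are only reported when both the cheap check and the hash agree.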
 |
yosiasz
Master Smack Fu Yak Hacker
1635 Posts
Posted - 2012-07-31 : 11:59:45

Awesome, thanks Rob. Maybe I can plug this hashing utility into my SSIS package and save the file name, size and hash value. On every ingest I will verify that the image does not already exist; if it does, I will flag it and have a human check whether it really is a duplicate and, if so, handle it accordingly. We have metadata associated with each image, and it could be that the metadata is also messed up.
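The ingest-time check could look roughly like this sketch. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server and the SSIS package (an assumption for illustration only); the `image_catalog` table, its columns, and both functions are hypothetical. Rather than rejecting a match, it records the file with a `needs_review` flag so a person can confirm the duplicate and check the metadata:

```python
import hashlib
import sqlite3


def open_catalog(path=":memory:"):
    """Create the hypothetical catalog table (SQLite stands in for SQL Server)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS image_catalog (
               file_name    TEXT    NOT NULL,
               file_size    INTEGER NOT NULL,
               sha256       TEXT    NOT NULL,
               needs_review INTEGER NOT NULL DEFAULT 0
           )"""
    )
    # Non-unique index: a match is flagged for a human, not rejected outright.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS ix_size_hash ON image_catalog (file_size, sha256)"
    )
    return conn


def ingest(conn, file_name, data):
    """Record an incoming image; return True if it was flagged as a possible dup."""
    size = len(data)
    digest = hashlib.sha256(data).hexdigest()
    already_seen = conn.execute(
        "SELECT 1 FROM image_catalog WHERE file_size = ? AND sha256 = ? LIMIT 1",
        (size, digest),
    ).fetchone() is not None
    conn.execute(
        "INSERT INTO image_catalog (file_name, file_size, sha256, needs_review) "
        "VALUES (?, ?, ?, ?)",
        (file_name, size, digest, int(already_seen)),
    )
    conn.commit()
    return already_seen
```

In SQL Server, a 64-character hex digest (or a 32-byte binary one) in an indexed column stays well under the 900-byte key limit discussed earlier, because the image bytes themselves never need to be part of the key.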
 |