 SQL Server 2008 Forums
 Transact-SQL (2008)
 using byte array to look for duplicate images

Author  Topic 

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2012-07-27 : 11:49:56
Greetings,

I have read that, with C# code, you can read an image file into a byte array and save that data in SQL Server. Could I then use that byte array to detect duplicate images, so that we don't ingest duplicates?

Thanks

<><><><><><><><><><><><><><><><><>
If you don't have the passion to help people, you have no passion

Bustaz Kool
Master Smack Fu Yak Hacker

1834 Posts

Posted - 2012-07-27 : 18:48:37
To enforce uniqueness on the raw image data you would be limited to 900 bytes, the maximum size of an index key. The IMAGE data type would not be allowed, since it is a BLOB. If your images were small (likely? perhaps not), you could do this with a different data type, such as varbinary(900), but larger data would fail.

=================================================
Show me a sane man and I will cure him for you. -Carl Jung, psychiatrist (1875-1961)

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2012-07-28 : 02:35:00
You can calculate a checksum or hash of the image and do a quick & easy compare of that. You still need to cater for clashes but statistically that's unlikely if you choose the right algorithm.
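A minimal sketch of the hash-then-compare idea, written in Python for illustration rather than C#. SHA-256 is an assumption (the thread only says "the right algorithm"). Its digest is always 32 bytes, so it fits easily within the 900-byte index key limit mentioned earlier, whatever the image size. DuplicateChecker is a hypothetical in-memory stand-in for a table. A digest match is treated only as a candidate, and the full bytes are compared to handle the rare collision.

```python
import hashlib


def image_digest(data: bytes) -> bytes:
    """Fixed-size SHA-256 digest (always 32 bytes) of the raw image bytes."""
    return hashlib.sha256(data).digest()


class DuplicateChecker:
    """Hypothetical in-memory stand-in for a table indexed on the image digest."""

    def __init__(self) -> None:
        # digest -> list of stored images that produced that digest
        self._by_digest = {}

    def add(self, data: bytes) -> bool:
        """Store the image unless an identical one exists; True means it was stored."""
        candidates = self._by_digest.setdefault(image_digest(data), [])
        # Equal digests only make a duplicate very likely; comparing the full
        # bytes ensures a hash collision never rejects a genuinely different image.
        if any(existing == data for existing in candidates):
            return False
        candidates.append(data)
        return True
```

Usage: `checker.add(open("photo.jpg", "rb").read())` returns False the second time the same file is offered. Holding the full bytes in memory is only for illustration. In a database you would index the small digest column and fetch the full image only when digests match.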

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2012-07-30 : 17:19:37
Thank you for your feedback. Maybe SQL Server is not the place for this duplicate check, since the images we ingest can vary in size from small, simple images to large, high-end pictures.


robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-07-30 : 17:38:13
Yeah, definitely use a hashing utility on the image files. Some native Win32 ports of the GNU/Unix utilities are available here:

http://getgnuwin32.sourceforge.net/
http://unxutils.sourceforge.net/

MD5 and SHA are available. I used them to look for duplicate files. I imported the file name, size and hash values into a table and then looked for duplicates based on size and hash.
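The approach described above (load name, size and hash into a table, then look for rows sharing size and hash) can be sketched as follows. This is a Python illustration, not Rob's actual scripts. It uses Python's built-in hashlib.md5 instead of the command-line utilities and an in-memory SQLite table as a stand-in for the SQL Server table. The function names md5_of_file and find_duplicates are hypothetical.

```python
import hashlib
import sqlite3
from contextlib import closing
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks so large images don't fill memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Return groups of file names that share the same size and MD5 hash."""
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE files (name TEXT, size INTEGER, hash TEXT)")
        conn.executemany(
            "INSERT INTO files VALUES (?, ?, ?)",
            [(str(p), p.stat().st_size, md5_of_file(p)) for p in paths],
        )
        # Same idea as the forum post: duplicates are rows sharing size AND hash.
        dup_keys = conn.execute(
            "SELECT size, hash FROM files GROUP BY size, hash HAVING COUNT(*) > 1"
        ).fetchall()
        groups = []
        for size, digest in dup_keys:
            rows = conn.execute(
                "SELECT name FROM files WHERE size = ? AND hash = ? ORDER BY name",
                (size, digest),
            )
            groups.append([name for (name,) in rows])
    return sorted(groups)
```

Checking size as well as hash is cheap and means two files are reported together only if they agree on both.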

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2012-07-31 : 11:59:45
Awesome, thanks Rob. Maybe I can plug this hashing utility into my SSIS package and save the file name, size and hash value. On every ingest I will check whether the image already exists. If it does, I will flag it and have a human check whether it really is a duplicate, and if so handle it accordingly. We have metadata associated with each image, and it could be that the metadata is also wrong.
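A rough sketch of the flag-for-review workflow, in Python for illustration. The SSIS wiring is not shown, and ImageCatalog and IngestResult are hypothetical names. SHA-256 is an assumed choice of hash. A match on (size, hash) does not reject the file. It records the pair in a review queue so a person can decide, since the metadata attached to either image might be the part that is wrong.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class IngestResult:
    file_name: str
    digest: str
    flagged: bool  # True means: possible duplicate, route to a human reviewer


@dataclass
class ImageCatalog:
    """Hypothetical stand-in for the table of (file name, size, hash) rows."""

    seen: dict = field(default_factory=dict)  # (size, digest) -> first file name
    review_queue: list = field(default_factory=list)  # (new file, existing file)

    def ingest(self, file_name: str, data: bytes) -> IngestResult:
        digest = hashlib.sha256(data).hexdigest()
        key = (len(data), digest)
        if key in self.seen:
            # Don't auto-reject: queue the pair so a person can compare the
            # images and their metadata before deciding how to handle it.
            self.review_queue.append((file_name, self.seen[key]))
            return IngestResult(file_name, digest, flagged=True)
        self.seen[key] = file_name
        return IngestResult(file_name, digest, flagged=False)
```

Keeping the first file name per key means the reviewer always sees which existing image the new file appears to duplicate.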
