Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Detecting duplicates... BLOBs.

Author  Topic 

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-09-16 : 17:54:46
Ok, I know all about the detecting duplicate row question and the FAQs and such.

But what I'm after is a way to, with reasonable performance, find duplicate images that are stored in image columns. I think the long-term solution is to store an MD5 hash when the column is created or altered, but given that I don't have that now... any ideas?

Thanks
-b

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2003-09-16 : 17:55:45
I do not have an answer to your question, but...

The long term idea should be to store the image on the file system and just store the path information in the database.

Tara
Go to Top of Page

rjpaulsen
Starting Member

9 Posts

Posted - 2003-09-17 : 14:51:36
Even with storing the images on the file system, you don't know which are duplicates (a.jpg may or may not be b.jpg -- even if size is the same).

Storing a checksum of the image as a seperate field (Regardless of where the image is stored) is the cleanest method.
Go to Top of Page

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2003-09-21 : 13:25:03
Great.

length
Is an integer that specifies the length of the substring
(the number of characters or bytes to return).

Why not to point out explicitly that length>8000 makes no sense?
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2003-09-21 : 19:26:22
Problem with an image is that they are just binary files and it is not easy to get a small checksum that will detect duplicates reliably. The checksum will be the file compressed.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2003-09-22 : 02:43:03
create table images (pk int, img image)

This finds duplicates in img field of images table:

declare @pk1 int, @pk2 int, @n int, @i int, @f int
declare abc cursor for
select t1.pk, t2.pk, datalength(t1.img)/8000
from images t1 inner join images t2 on
t1.pk<t2.pk and datalength(t1.img)=datalength(t2.img)
open abc
fetch next from abc into @pk1, @pk2, @n
while @@fetch_status=0
begin
set @f=0 set @i=0
while @i<@n+1
begin
if
(select substring(img,@i*8000+1,8000) from images where pk=@pk1)<>
(select substring(img,@i*8000+1,8000) from images where pk=@pk2)
begin set @f=1 break end
set @i=@i+1
end
if @f=0 select @pk1, @pk2
fetch next from abc into @pk1, @pk2, @n
end
close abc
deallocate abc
Go to Top of Page

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2003-09-22 : 03:12:42
>checksum that will detect duplicates reliably. The checksum will be the file compressed

Yes, the point is what probability/reliability is enough for someone's requirements.

BTW I populated my test table images by this:

Dim s As New ADODB.Stream, cn As New ADODB.Connection
Dim rs As New ADODB.Recordset

cn.CursorLocation = adUseClient

cn.Open "provider=sqloledb;data source=(local);initial catalog=myDB;" & _
"user id=sa;password=;"

rs.Open "images", cn, adOpenDynamic, adLockOptimistic

s.Mode = adModeReadWrite: s.Type = adTypeBinary: s.Open
s.LoadFromFile "e:\nu.dll": s.Position = 0

rs.AddNew
rs(0) = 1: rs(1) = s.Read(-1)
rs.Update

s.Close
Set s = Nothing
rs.Close
Set rs = Nothing
cn.Close
Set cn = Nothing
Go to Top of Page

Amethystium
Aged Yak Warrior

701 Posts

Posted - 2003-09-22 : 04:47:46
You're a step ahead of the rest Vit. I don't think they know what you're talking about!

Well played though.... well played!

__________________
Make love not war!
Go to Top of Page

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2003-09-22 : 05:03:19
LOL.
I am sure I look like a mossy monster from the middle of '90s...
_________________________
Make not love nor war!!
Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-09-23 : 20:27:47
Thanks for the excellent advice -- that works great for my purposes.

Cheers
-b
Go to Top of Page
   

- Advertisement -