| Author |
Topic |
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-09-16 : 17:54:46
|
| Ok, I know all about the detecting duplicate row question and the FAQs and such.But what I'm after is a way to, with reasonable performance, find duplicate images that are stored in image columns. I think the long-term solution is to store an MD5 hash when the column is created or altered, but given that I don't have that now... any ideas?Thanks-b |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2003-09-16 : 17:55:45
|
| I do not have an answer to your question, but...The long term idea should be to store the image on the file system and just store the path information in the database.Tara |
 |
|
|
rjpaulsen
Starting Member
9 Posts |
Posted - 2003-09-17 : 14:51:36
|
| Even with storing the images on the file system, you don't know which are duplicates (a.jpg may or may not be b.jpg -- even if size is the same).Storing a checksum of the image as a seperate field (Regardless of where the image is stored) is the cleanest method. |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-09-21 : 13:25:03
|
| Great.lengthIs an integer that specifies the length of the substring(the number of characters or bytes to return).Why not to point out explicitly that length>8000 makes no sense? |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2003-09-21 : 19:26:22
|
| Problem with an image is that they are just binary files and it is not easy to get a small checksum that will detect duplicates reliably. The checksum will be the file compressed.==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-09-22 : 02:43:03
|
| create table images (pk int, img image)This finds duplicates in img field of images table:declare @pk1 int, @pk2 int, @n int, @i int, @f intdeclare abc cursor forselect t1.pk, t2.pk, datalength(t1.img)/8000from images t1 inner join images t2 ont1.pk<t2.pk and datalength(t1.img)=datalength(t2.img)open abcfetch next from abc into @pk1, @pk2, @nwhile @@fetch_status=0beginset @f=0 set @i=0while @i<@n+1beginif(select substring(img,@i*8000+1,8000) from images where pk=@pk1)<>(select substring(img,@i*8000+1,8000) from images where pk=@pk2)begin set @f=1 break endset @i=@i+1endif @f=0 select @pk1, @pk2fetch next from abc into @pk1, @pk2, @nendclose abcdeallocate abc |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-09-22 : 03:12:42
|
| >checksum that will detect duplicates reliably. The checksum will be the file compressedYes, the point is what probability/reliability is enough for someone's requirements.BTW I populated my test table images by this:Dim s As New ADODB.Stream, cn As New ADODB.ConnectionDim rs As New ADODB.Recordsetcn.CursorLocation = adUseClientcn.Open "provider=sqloledb;data source=(local);initial catalog=myDB;" & _"user id=sa;password=;"rs.Open "images", cn, adOpenDynamic, adLockOptimistics.Mode = adModeReadWrite: s.Type = adTypeBinary: s.Opens.LoadFromFile "e:\nu.dll": s.Position = 0rs.AddNewrs(0) = 1: rs(1) = s.Read(-1)rs.Updates.CloseSet s = Nothingrs.CloseSet rs = Nothingcn.CloseSet cn = Nothing |
 |
|
|
Amethystium
Aged Yak Warrior
701 Posts |
Posted - 2003-09-22 : 04:47:46
|
| You're a step ahead of the rest Vit. I don't think they know what you're talking about!Well played though.... well played!__________________Make love not war! |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-09-22 : 05:03:19
|
| LOL.I am sure I look like a mossy monster from the middle of '90s..._________________________Make not love nor war!! |
 |
|
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2003-09-23 : 20:27:47
|
| Thanks for the excellent advice -- that works great for my purposes.Cheers-b |
 |
|
|
|