Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 how find similar records in database

Author  Topic 

TomekK
Starting Member

3 Posts

Posted - 2005-07-29 : 10:43:37
hi
im new here

i want to create some appcation which helps me with my mp3 files database
now i have code which insert all files name from catalog to database
table has name Files
and columns are

ID ( file id - unique )
fileCreationTime ( time of file creation)
fileSize ( filesize)
fileName ( filename)
fileInsertedBy (forein key to user)
fileIndertDate ( date.now)

all of this information are inserting automatic from catalog which has mp3 files
and now
i want to find duplicate of files
but name of dulicate files are other...
forexample

fitalic - other.mp3
fitalic-other.mp3
fitalic_-_other.mp3
01fitalic-other.mp3
01fitalic - other.mp3
02_fitalic_-_other.mp3
fitalic_-_other.mp3

all names of this files are duplicated for me

i have Regular Expression which describe filename

[0-9]+( |_|)+[a-zA-Z]+( |_)+(-)(((_| )+[a-zA-Z]+)+)\.(mp3)


when files are inserting to database from catalog
i want to see duplicated file

my ideas are
check string after producer name
in this case (fitalic - other.mp3) it is "other.mp3"
if they are similar -> check producer name
if producer name & title are similar
it means that file is duplicated
other idea
split producer name and title
table will be
( ID
ProducerName
Title
)

i still don't know how to find duplicate.....
maby some one help me...
thanks in advance

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-07-30 : 11:02:29
splitting the data into producer and title will probably be the better solution, not to mention normalized. You can then use

select producer,title,... from mytable where
title in (select title from myTable
group by title where count(*)>1)

the result will show records which have occured more than once

HTH

--------------------
keeping it simple...
Go to Top of Page

TomekK
Starting Member

3 Posts

Posted - 2005-08-01 : 04:13:07
hi
your query will be good (i think) only when i split producer and title ?
when i use this for query bellow

select filename from myTable where filename in(select filename from myTable group by filename)

he gives me all records from table

and when i add "...where count(*)>1"
sql server gives me some error

thanks
Go to Top of Page

TomekK
Starting Member

3 Posts

Posted - 2005-08-01 : 04:22:51
and when i split producer name and title
and use this query

select producer,title from Files where title in(select title from Files group by title)
it's work but he gives me all records... not similar....

"where" still doesnt work
Go to Top of Page
   

- Advertisement -