Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
SQLTim
Starting Member
21 Posts |
Posted - 2006-04-10 : 18:37:54
|
| This has to do with a 2D matrix.The underlying table has two records for each cell.ItemID Attribute3 XL3 Red4 L4 Blue5 L5 Red6 L6 BlueThe last pair of records are a duplicate and should not exist. How would I write a select that would give me a list of the record pairs that are duplicated in this way?The result would be two records:ItemID46Any ideas?Thanks in advance!!Tim |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2006-04-10 : 19:12:45
|
Is there a 3rd column to identify what is the Attribute (Color or Size) ? KH |
 |
|
|
nathans
Aged Yak Warrior
938 Posts |
Posted - 2006-04-10 : 19:27:40
|
right, AttributeTypeID would be useful. Tim, take a look at this query and see if you can get at the info youre seeking.declare @Matrix table (ItemID int, AttributeTypeID int, AttributeValue varchar(10))insert into @Matrix select 3, 1, 'XL' union select 3, 2, 'Red' union select 4, 1, 'L' union select 4, 2, 'Blue' union select 5, 1, 'L' union select 5, 2, 'Red' union select 6, 1, 'L' union select 6, 2, 'Blue'select ItemID, max(case when AttributeTypeID = 1 then AttributeValue else NULL end) as 'ItemSize', max(case when AttributeTypeID = 2 then AttributeValue else NULL end) as 'ItemColor'from @Matrixgroup by ItemID Nathan Skerl |
 |
|
|
SQLTim
Starting Member
21 Posts |
Posted - 2006-04-10 : 19:28:08
|
| I tried to simplify this. The second column actually points to a template table that can have many combinations of attributes.Basically, not all 'XL's are the same.It is really the more than one pair issue I'm going after.I did try a solution where (heavily pseudo coded):select ItemID, (select top 1 attrib)+(select top 1 attrib desc) from myTable group by 1,2 having count(*)>1I used correlated queries to do it, and the results were close, but not exact.It seems like there is another way to do something like this.I'll post the query here but I fear it will distract from my main goal:select department,category,item, match from (Select department,category,item, (select top 1 t.value_name from i_attlnk a join i_tmplat t on a.tmplat_id = t.tmplat_id where i.invent_id = a.invent_id -- look at the top of the list and atname_id in (select top 1 atname_id from i_attlnk a join i_tmplat t on a.tmplat_id = t.tmplat_id where a.invent_id = i.invent_id order by atname_id))+(select top 1 t.value_name from i_attlnk a join i_tmplat t on a.tmplat_id = t.tmplat_id where i.invent_id = a.invent_id -- now look at the bottom of the list and atname_id in (select top 1 atname_id from i_attlnk a join i_tmplat t on a.tmplat_id = t.tmplat_id where a.invent_id = i.invent_id order by atname_id desc)) as match from i_items i) as qqgroup by department,category,item, matchhaving count(*)>1Thanks for looking at this!! |
 |
|
|
pootle_flump
1064 Posts |
Posted - 2006-04-10 : 19:44:14
|
Without attribute types:DECLARE @Matrix TABLE (ItemID Int, AttributeValue VarChar(10))INSERT INTO @Matrix SELECT 3, 'XL' UNION SELECT 3, 'Red' UNION SELECT 4, 'L' UNION SELECT 4, 'Blue' UNION SELECT 5, 'L' UNION SELECT 5, 'Red' UNION SELECT 6, 'L' UNION SELECT 6, 'Blue'SELECT A.ItemID AS DupesFROM (SELECT ItemID, MAX(AttributeValue) AS AttribA, MIN(AttributeValue) AS AttribB FROM @Matrix GROUP BY ItemID) A INNER JOIN (SELECT ItemID, MAX(AttributeValue) AS AttribA, MIN(AttributeValue) AS AttribB FROM @Matrix GROUP BY ItemID) B ON A.AttribA = B.AttribA AND A.AttribB = B.AttribBWHERE A.ItemID <> B.ItemID Dupes-----64(2 row(s) affected) Although your last post hints that this isn't what you are after afterall... |
 |
|
|
SQLTim
Starting Member
21 Posts |
Posted - 2006-04-10 : 20:30:26
|
| pootle_flump, your results would be precisely what I am looking for.Will let you know if it works out. |
 |
|
|
SQLTim
Starting Member
21 Posts |
Posted - 2006-04-10 : 20:43:54
|
| OK, let me try a version closer to the "truth"It is essentially the same as before:PriKey AttrID ItemID6706 6722 33546707 6721 33546708 6722 33556709 6718 33556710 6722 33566711 6718 33566712 6722 33576713 6720 335712956 7608 648112957 6722 6481Only the second and third columns really matter here. Can you see how 3355 and 3356 are the duplicate pairs of records?The Query:SELECT A.invent_id AS DupesFROM (SELECT invent_id, MAX(tmplat_id) AS AttribA, MIN(tmplat_id) AS AttribB FROM i_attlnk GROUP BY invent_id) A INNER JOIN (SELECT invent_id, MAX(tmplat_id) AS AttribA, MIN(tmplat_id) AS AttribB FROM i_attlnk GROUP BY invent_id) B ON A.AttribA = B.AttribA AND A.AttribB = B.AttribB join i_items i on i.invent_id=a.invent_idWHERE A.invent_id <> B.invent_idgave me 56 records. and not two. Ideas? |
 |
|
|
khtan
In (Som, Ni, Yak)
17689 Posts |
Posted - 2006-04-10 : 22:01:11
|
Try this codedeclare @table table( PriKey int, AttrID int, ItemID int)declare @DupAttr table( AttrID int)declare @DupItem table( ItemID int)insert into @tableselect 6706, 6722, 3354 union allselect 6707, 6721, 3354 union allselect 6708, 6722, 3355 union allselect 6709, 6718, 3355 union allselect 6710, 6722, 3356 union allselect 6711, 6718, 3356 union allselect 6712, 6722, 3357 union allselect 6713, 6720, 3357 union allselect 12956, 7608, 6481 union allselect 12957, 6722, 6481insert into @DupAttr select AttrID from @table group by AttrID having count(*) > 1insert into @DupItem select ItemID from @table t inner join @DupAttr d on t.AttrID = d.AttrID group by ItemID having count(*) > 1select t.*from @table t inner join @DupAttr a on t.AttrID = a.AttrID inner join @DupItem i on t.ItemID = i.ItemID KH |
 |
|
|
RyanRandall
Master Smack Fu Yak Hacker
1074 Posts |
Posted - 2006-04-11 : 05:58:10
|
Here's another possible way (if you can accept the tiny chance that it will return an incorrect result)...--datadeclare @table table( PriKey int, AttrID int, ItemID int)insert into @tableselect 6706, 6722, 3354 union allselect 6707, 6721, 3354 union allselect 6708, 6722, 3355 union allselect 6709, 6718, 3355 union allselect 6710, 6722, 3356 union allselect 6711, 6718, 3356 union allselect 1, 6722, 3358 union allselect 1, 6718, 3358 union allselect 1, 7608, 3358 union allselect 6712, 6722, 3357 union allselect 6713, 6720, 3357 union allselect 12956, 7608, 6481 union allselect 12957, 6722, 6481--calculationdeclare @SummaryTable table (ItemId int, AggregatedAttrID float)insert @SummaryTable select ItemId, checksum_agg(AttrID) + sum(log(AttrID + 0.5)) from @table group by ItemIdselect * from @SummaryTable where AggregatedAttrID in ( select AggregatedAttrID from @SummaryTable group by AggregatedAttrID having count(ItemId) > 1) It would be interesting to figure out how tiny the chance of incorrect results really is, and interesting to then figure out a way to set up a combination of aggregation functions to minimise that chance. Something for another time, methinks...Ryan Randallwww.monsoonmalabar.com London-based IT consultancy Solutions are easy. Understanding the problem, now, that's the hard part. |
 |
|
|
SQLTim
Starting Member
21 Posts |
Posted - 2006-04-11 : 09:44:38
|
| It looks like it is close. Was hoping I could adapt it to my situ.I'm going to need to expand what is going on here one more step.I've changed some field names around a bit to accomodate.There is an i_items table that holds one record for each item.The i_attlnk table holds the matrix linking information.So I'm looking for pair duplicates inside of any one item...in this case, any one Item_Id.From previous examples:AttrID is now tmplat_IdItemID is now invent_idand Item_ID represents an item that has many colors and sizes.So a record in the i_items table uses two records in the i_attlnk table to tell it what color and size it is. I've discovered that some application is creating duplicate record pairs and need to reslove this quickly as many of our clients are doing their physical inventories now. What follows is an example of two items that each have the problem of duplicate record pairs. I've adapted your approach and it works for this record set, but it does not work when I turn it loose on the whole data set. I don't quite get what your approach is and would love to hear your thoughts re:checksum_agg(tmplat_Id) + sum(log(tmplat_Id + 0.5)It's facinating tact. I think if I understood it better I might be able to adapt it for this situ more correctly. Any ideas what I did wrong here?--datadeclare @i_items table( Item_Id int, Invent_Id int)insert into @i_items select 12454, 3354 union allselect 12454, 3355 union allselect 12454, 3356 union allselect 12454, 3357 union allselect 12454, 6481 union allselect 3549 , 1692 union allselect 3549 , 1693 union allselect 3549 , 1694 union allselect 3549 , 1695declare @i_attlnk table( tmplat_Id int, invent_Id int)insert into @i_attlnkselect 6722, 3354 union allselect 6721, 3354 union allselect 6722, 3355 union allselect 6718, 3355 union allselect 6722, 3356 union allselect 6718, 3356 union allselect 6722, 3357 union allselect 6720, 3357 union allselect 7608, 6481 union allselect 6722, 6481 union allselect 6588, 1692 union allselect 6585, 1692 union allselect 6588, 1693 union allselect 6586, 1693 union allselect 6588, 1694 union allselect 6587, 1694 union allselect 6588, 1695 union allselect 6585, 1695--calculationdeclare @SummaryTable table (invent_Id int, AggregatedAttrID float)insert @SummaryTable select a.invent_Id, checksum_agg(tmplat_Id) + sum(log(tmplat_Id + 0.5)) from @i_attlnk ajoin @i_items i on i.invent_Id = a.invent_Idgroup by i.Item_Id,a.invent_Idselect * from @SummaryTable where AggregatedAttrID in ( select AggregatedAttrID from @SummaryTable group by AggregatedAttrID having count(invent_Id) > 1)invent_Id AggregatedAttrID ----------- ----------------------------------------------------- 1692 22.5857065254244061695 22.5857065254244063355 141.625835582458483356 141.62583558245848(4 row(s) affected)Thanks! |
 |
|
|
RyanRandall
Master Smack Fu Yak Hacker
1074 Posts |
Posted - 2006-04-11 : 10:26:59
|
Hi Tim,The idea of "checksum_agg(tmplat_Id) + sum(log(tmplat_Id + 0.5)" is just to quickly get some summarising number for each list of numbers. That's then obviously much easier to work with.The ideal method is to come up with something which will never give the same answer for any 2 different lists, and this is just my attempt - off the top of my head. I feel like it's something someone will have investigated at some point, but couldn't find anything from a quick google search.It's hard to help you without seeing a problem. Try to find or construct some data which breaks what you've got, and post that - it will then be much easier to help... Ryan Randallwww.monsoonmalabar.com London-based IT consultancy Solutions are easy. Understanding the problem, now, that's the hard part. |
 |
|
|
SQLTim
Starting Member
21 Posts |
Posted - 2006-04-11 : 10:53:33
|
| Hi Ryan,The post just above your last one does what you ask. It's a new data set that is pretty close to what I'm working with yet it keeps it simple. I think your idea would work well here because it would be working on duplicates within each item, not over all items.Because the set is small, the chances of random duplication is very small.In the above example there are two items really: 3549 and 12545. 3549 has 4 sub items and 12545 has 5.Each of those sub items represent a color size combination. Those are represented by two records in the i_attlnk table.The problem is that I'm finding more than one sub item that has the same attribute pair.I've modified your solution incorrectly to adapt to the new data set. I'm trying to figure out how I've gone astray. I've bolded my major changes...declare @SummaryTable table (Item_Id int, invent_Id int, AggregatedAttrID float)insert @SummaryTable select i.Item_Id, a.invent_Id, checksum_agg(tmplat_Id) + sum(log(tmplat_Id + 0.5)) from @i_attlnk ajoin @i_items i on i.invent_Id = a.invent_Idgroup by i.Item_Id,a.invent_Idselect * from @SummaryTable where AggregatedAttrID in ( select AggregatedAttrID from @SummaryTable group by Item_Id, AggregatedAttrID having count(invent_Id) > 1)yeilding:Item_Id invent_Id AggregatedAttrID ----------- ----------- ----------------------------------------------------- 3549 1692 22.5857065254244063549 1695 22.58570652542440612454 3355 141.6258355824584812454 3356 141.62583558245848Which is what I would be looking for. But it's still not quite right.How would I handle:count(invent_Id) > 1) ?Thank you for your help in this!Tim |
 |
|
|
SQLTim
Starting Member
21 Posts |
Posted - 2006-04-14 : 14:50:56
|
I've got most of the answer!Now I'm missing something I fear is basic.--datadeclare @i_items table(Item_Id int,Invent_Id int)insert into @i_items select 12454, 3354 union allselect 12454, 3355 union allselect 12454, 3356 union allselect 12454, 3357 union allselect 12454, 6481 union allselect 3549 , 1692 union allselect 3549 , 1693 union allselect 3549 , 1694 union allselect 3549 , 1695declare @i_attlnk table(tmplat_Id int,invent_Id int)insert into @i_attlnkselect 6722, 3354 union allselect 6721, 3354 union allselect 6722, 3355 union allselect 6718, 3355 union allselect 6722, 3356 union allselect 6718, 3356 union allselect 6722, 3357 union allselect 6720, 3357 union allselect 7608, 6481 union allselect 6722, 6481 union allselect 6588, 1692 union allselect 6585, 1692 union allselect 6588, 1693 union allselect 6586, 1693 union allselect 6588, 1694 union allselect 6587, 1694 union allselect 6588, 1695 union allselect 6585, 1695Select Item_Id, AttName, count(*)from (select Item_Id, a.invent_Id, CAST(Max(tmplat_Id) as char(10) )+ CAST(Min(tmplat_Id) as char(10)) as AttNamefrom @i_attlnk ajoin @i_items i on i.invent_Id = a.invent_Idgroup by i.Item_Id,a.invent_Id) as qgroup by Item_Id, AttNamehaving count(*)>1Item_Id AttName Count ----------- -------------------- ----------- 3549 6588 6585 212454 6722 6718 2 This is great!Now how would I get a listing of all of these items with their invent_id included?Ideally I'm looking for:Item_ID, invent_Idthat in this case should yeild 4 records.Any ideas?Many Thanks in advance! |
 |
|
|
|
|
|
|
|