Many to Many - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Many to Many

Author

Topic

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2004-02-25 : 17:59:08

Hi there,

I'm interested to know what people think of this database design.

Bascially, there's a table of products and a product can have 0 to many attributes. Examples of attributes are color and size of a shoe.

Furthermore, attibutes have 0 to many options. Examples of options are 2 and 4 for a shoe size or red and green for shoe color.

This is all quite straight forward. Three tables. Options reference attributes and attributes reference products and in a web page, a product detail has a drop down list for each attribute which the user selects from before adding the product to the cart.

However, a product can be priced by a combination of its attributes. Thus a red/small shoe might be $4 whilst a green/large shoe might be $6.

Therefore, there is a 'Combinations' table to hold the price of the combination.

I figure that there is a many-many relationship between the options table and the Combinations table as each option can appear in multiple combinations and each combination is comprised of multiple options, thus there is a join table between the two. This table has two columns, the pk of an option and the pk of a combination.

Now, the main issue is, how does one look up a combination given some options?

The proper way to do this, I figure, is that one does as many self joins on the join table as there are attributes and looks for the combination that contains all the option values.

Thus, for two attributes, where 67 identifies an option, say, 'red' and 23 identifies another option, say, 'small', the query would look lke this..

SELECT * FROM Combos c

INNER JOIN options_Combos oc1
ON c.ComboID = oc1.ComboID
AND oc1.optionID = 67

INNER JOIN options_Combos oc2
ON oc1.ComboID = oc2.ComboID
AND oc2.optionID = 23

Because the number of self joins required depends on the number of attributes a product has, I figure the query would need to be dynamic SQL and use an 'Execute' statement on the resultant string. I thought I could do it more efficiently.

So I added a column to the Combinations table called 'IDStr'. A typical value in this filed might be '67_23' .

Then, as long as the options are well ordered (which requires an extra field), one can just concatenate the options together and do a simple lookup in the combinations table for the idStr.

Like this... SELECT * FROM Combos WHERE idStr = '67_23'

However, it occurs to me that the 'idStr' field stops the table from being in 1st normal form.

Any thoughts?

I hope this is clear.

X.

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2004-02-25 : 18:27:39

x,

No it is not clear.
What is the structure of the Combinations table?
Can you give us your current schema (Just the Keys.. CREATE TABLE).
It could be just a division query..

DavidM

"SQL-3 is an abomination.."

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2004-02-25 : 21:37:17

OK, here are the tables and the important fields...

Products
--------
productID pk
name

Attributes
----------
attributeID int pk
productID int fk
name varchar

Options
-------
optionID int pk
attributeID int fk
name varchar

Options_Combinations
--------------------
optionID int fk
comboID int fk

Combinations
------------
comboID int pk
idStr varchar
price money

e.g.

products
--------
1, 'shoe'

attributes
----------
1, 1, 'color'
2, 1, 'size'

options
-------
1, 1, 'red'
2, 1, 'green'
3, 2, '2'
4, 2, '3'

Options_Combinations
--------------------
1,1
3,1
1,2
4,2
2,3
3,3
2,4
4,4

Combinations
------------
1, '1_3' ,£2 (red, 2)
2, '1_4' ,£3 (red, 3)
3, '2_3' ,£4 (green, 2)
4, '2_4' ,£5 (green, 3)

So at the browser, when viewing this shoe, the client has 2 drop downs. One contains colors and the other contains sizes.

So when the 'add to cart' button is pressed, you have two option IDs and you need to look up the correct combination to get the right price!

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2004-02-25 : 22:13:00

X,

You are after division...

Drop that horrible multivalue column from Combinations, read up on CSV parsing for the IN expression you are going to have to build and take a look at this query..


SELECT C.ComboID, MAX(Price) AS Price
FROM Combinations C
INNER JOIN Options_Combinations OC ON OC.ComboID = C.ComboID
WHERE OptionID IN (2,3)
GROUP BY C.ComboID
HAVING COUNT(*) = (SELECT COUNT(DISTINCT OptionID) FROM Options_Combinations WHERE OptionID IN (2,3))

This will allow you to have more than 2 options per combination if the case arises...

If you are only using 2 values then you can OR the IN....and forget the CSV stuff.....


SELECT C.ComboID, MAX(Price)
FROM Combinations C
INNER JOIN Options_Combinations OC ON OC.ComboID = C.ComboID
WHERE OptionID = 2 OR OptionID = 3
GROUP BY C.ComboID
HAVING COUNT(*) = 2

And here is a link for Relational Division the SQL way...
A Popular post this week...[url]http://www.dbazine.com/celko1.html[/url]

DavidM

"SQL-3 is an abomination.."

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2004-02-26 : 11:56:19

Thanks for your reply.

I think I see how that works.

However, given the amount of processing that goes into preparing the CSV - concatenating the values in the app, splitting them apart in the sp & putting them into a table and then the complexity of the final query, do you not think that the 'idStr' field is warranted in this case?

Why is it so horrible when its so much more efficient? Couldn't this be a case where denormalization is ok?

X.

ajthepoolman
Constraint Violating Yak Guru

384 Posts

Posted - 2004-02-26 : 12:09:21

I would only see the denomalization if you are going to say that a red/small shoe is a different product than a red/large shoe. But I think you are saying that a shoe is a shoe regardless of what color/size it is.

Aj

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-02-26 : 20:52:34

quote:
Why is it so horrible when its so much more efficient? Couldn't this be a case where denormalization is ok?

It isn't more efficient, not by a long shot.

By combining the attributes into a CSV string, you cannot index them indepdently of one another. If someone is looking for a red shoe, and you have none at all in the inventory, the optimizer would still have to table scan, and compare the CSV for every row to determine that there are none. It's about as slow an operation as you can do in SQL Server without using a cursor.

Keeping each attribute on a separate row lets the indexing work more effectively. As a single indexed column, it can instead seek for "red" and return almost immediately that there are none. This applies to any attribute, not just color. It will always be easier to find a single needle laying by itself than one buried in a haystack.

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2004-02-26 : 21:36:43

I don't understand the relavance of a user looking for a red shoe in this context.

When I say its more efficient, I mean given a set of option IDs which map to a unique combination, its easier to look up the combination by concatenating the option IDs and running the following...

SELECT price FROM Combinations WHERE idStr = '2_3'

than concatenating the option ID's, then breaking them apart in a sp and loading them into a table and then running a query like this...

SELECT C.ComboID, MAX(Price) AS Price
FROM Combinations C
INNER JOIN Options_Combinations OC ON OC.ComboID = C.ComboID
WHERE OptionID IN (2,3)
GROUP BY C.ComboID
HAVING COUNT(*) = (SELECT COUNT(DISTINCT OptionID) FROM Options_Combinations WHERE OptionID IN (2,3))

If idStr is indexed, then surely the first query is one of the quicker operations done in SQL Server?

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-02-27 : 08:20:17

What if the attributes get stored as 3_2? Same shoe, different order of attributes, but now the query returns nothing. You'd have to write logic into your application to ensure this doesn't happen, and it can still happen if updates are made outside of the app.

Storing them on separate rows, the second query will always work no matter what order the attributes are entered. That's the beauty of a relational data structure, order does not matter and has no meaning.

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2004-02-27 : 13:48:23

Hi there robvolk, yes I agree.

At the moment, I have an 'ordinal' column of type int in the attributes table.

Its not that difficult to maintain the idStr column because, in the app, its not possible to change the order of attributes once a product has been created. If you add a new attribute or delete an old one then the current combinations are rendered invalid anyway thus are deleted with a trigger and completely replaced.

But next time I'll probably try the proper relational method.

Speaking of order significant multivalue columns, in this article http://www.sqlteam.com/item.asp?ItemID=8866 , isn't the 'lineage' column order significant & multivalue? If so, why is it ok in this instance?

(I didn't know you were the author of this article when it first occured to me to reference it!)

Here's an example of one of the products at the site I'm talking about...

http://www.mutr.co.uk/prodDetail.aspx?prodID=531

robvolk
Most Valuable Yak

15732 Posts

Posted - 2004-02-28 : 16:05:33

Oooooh, colors.

While the lineage column is multivalue (sort of; in a way it really isn't) and order is signicant, it actually describes a hierarchy with precedence and dependence. Changing the order of the node values actually changes the hierarchy. It's not just a reordering the same attributes that describe an object. Still, it is a workaround and not necessarily the best way to store that kind of relationship (see below)

Whereas in the structure you have, it shouldn't make a difference to say "Green 4mm" or "4mm Green", it's the same combination. But storing them as a single combined attribute no longer makes them equivalent. Each attribute describes a different thing and is independent of the other. If you think about it, ordering attributes in a CSV is akin to numbering the columns of a table, instead of naming them. Moving a column around would require moving all of the data in it to its new column position. For that reason, having a trigger or some other mechanism that changes or maintains an order of attributes should be unnecessary, and IS unnecessary if they are stored separately.

There are other ways to model a heirarchy, I used the lineage method because the nested set model that Joe Celko uses was frustrating for the data I had to store. I COULD'VE made it work, but it was much harder. Nested sets are more "relational" in their structure and should be considered for storing hierarchical data.

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2004-03-01 : 11:28:48

Thanks for your reply. I think I see the disinction you're making.

However, the trigger I referred to is handy regardless of whether the 'idStr' column exists. For example, if there's a 'big/red' shoe combination and an admin adds a new attribute to the shoe, say, men's or women's, then you can no longer buy a 'big/red' shoe. You can buy a 'big/red/mens' shoe but the point is that the original set of combinations is invalid if you add or subtract a product attribute.

This just happens to be convenient if one is maintaining the 'idStr' column because there's no need to update it if a new attribute is added or removed because all the combinations need to be replaced anyway.

In fact, I have a suspicion that maybe these product combinations shouldn't be associated with a product using a many-many relationship because the combinations directly derive from the attributes. They are the combinations of the attributes and thus are dependent on them. Do you think there's a better way of modeling this?

Also, is there any chance of you writing part 2 of that article? Perhaps you could discuss when to use the method you used instead of the nested set solution?

There's another design issue I would welcome some feedback on from anyone who cares to comment....

Basically, a product can have different price styles. It can be given a single price (in which case the price is stored in the main product row), be priced by combination, as discussed in this thread, (if there are any combinations for the product) or it can be priced using price bands ( in other words, the price you pay depends on the quantity of the item you are buying).

In order to know the price style of a product, there's a field in the main product row called 'PriceStyleID' which is an integer which references a price style. Thus, along with the tables mentioned earlier in this thread, here are some more key tables...

Products
--------
productID int pk
PriceStyleID int fk
price money
name varchar

PriceBands
----------
priceBandID int pk
productID int fk
lowerBound int
price money

PriceStyles
-----------
PriceStyleID int pk
name varchar

ShoppingCart
------------
cartItemID int pk
productID int fk
comboID int fk
quantity int

So, if you have a product in a shopping cart and you want to look up the product's price then one might join the following tables and, based on the value of the 'priceStyleID' field of the product, look up the price in the correct place...

priceStyle = 1 - single price
priceStyle = 2 - priced by price bands
priceStyle = 3 - price by combination.....

SELECT
case
when p.priceStyle = 1 then p.price
when p.priceStyle = 2 then pb.price
else pC.price
end

FROM
ShoppingCart c

INNER JOIN
Products p ON p.productID = c.productID

LEFT JOIN
ProductCombos pC ON c.comboID = pC.productComboID

LEFT JOIN
productPriceBands pb ON
pb.productID= p.productID
and pb.lowerbound =
(SELECT
max(pb2.lowerbound)
FROM
productpricebands pb2
WHERE
pb2.productID= pb.productID AND
c.quantity >= pb2.lowerbound
)

WHERE
cartItemID = @cartItemID

Thus, what I am interested in is the appropriateness of the 'priceStyleID' field in the 'products' table being used to determine where to find the price of a product which is potentially spread across several tables.

Is there a better way of doing this?

X.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources