Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
X-Factor
Constraint Violating Yak Guru
392 Posts |
Posted - 2004-02-25 : 17:59:08
|
| Hi there,I'm interested to know what people think of this database design.Bascially, there's a table of products and a product can have 0 to many attributes. Examples of attributes are color and size of a shoe.Furthermore, attibutes have 0 to many options. Examples of options are 2 and 4 for a shoe size or red and green for shoe color.This is all quite straight forward. Three tables. Options reference attributes and attributes reference products and in a web page, a product detail has a drop down list for each attribute which the user selects from before adding the product to the cart.However, a product can be priced by a combination of its attributes. Thus a red/small shoe might be $4 whilst a green/large shoe might be $6.Therefore, there is a 'Combinations' table to hold the price of the combination.I figure that there is a many-many relationship between the options table and the Combinations table as each option can appear in multiple combinations and each combination is comprised of multiple options, thus there is a join table between the two. This table has two columns, the pk of an option and the pk of a combination.Now, the main issue is, how does one look up a combination given some options?The proper way to do this, I figure, is that one does as many self joins on the join table as there are attributes and looks for the combination that contains all the option values. Thus, for two attributes, where 67 identifies an option, say, 'red' and 23 identifies another option, say, 'small', the query would look lke this..SELECT * FROM Combos cINNER JOIN options_Combos oc1ON c.ComboID = oc1.ComboIDAND oc1.optionID = 67INNER JOIN options_Combos oc2ON oc1.ComboID = oc2.ComboIDAND oc2.optionID = 23Because the number of self joins required depends on the number of attributes a product has, I figure the query would need to be dynamic SQL and use an 'Execute' statement on the resultant string. I thought I could do it more efficiently.So I added a column to the Combinations table called 'IDStr'. A typical value in this filed might be '67_23' .Then, as long as the options are well ordered (which requires an extra field), one can just concatenate the options together and do a simple lookup in the combinations table for the idStr.Like this... SELECT * FROM Combos WHERE idStr = '67_23'However, it occurs to me that the 'idStr' field stops the table from being in 1st normal form.Any thoughts?I hope this is clear.X. |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2004-02-25 : 18:27:39
|
| x,No it is not clear.What is the structure of the Combinations table?Can you give us your current schema (Just the Keys.. CREATE TABLE).It could be just a division query..DavidM"SQL-3 is an abomination.." |
 |
|
|
X-Factor
Constraint Violating Yak Guru
392 Posts |
Posted - 2004-02-25 : 21:37:17
|
| OK, here are the tables and the important fields... Products--------productID pknameAttributes----------attributeID int pkproductID int fkname varcharOptions-------optionID int pkattributeID int fkname varcharOptions_Combinations--------------------optionID int fkcomboID int fkCombinations------------comboID int pkidStr varcharprice moneye.g.products--------1, 'shoe'attributes----------1, 1, 'color'2, 1, 'size'options-------1, 1, 'red'2, 1, 'green'3, 2, '2'4, 2, '3'Options_Combinations--------------------1,13,11,24,22,33,32,44,4 Combinations------------1, '1_3' ,£2 (red, 2)2, '1_4' ,£3 (red, 3)3, '2_3' ,£4 (green, 2)4, '2_4' ,£5 (green, 3)So at the browser, when viewing this shoe, the client has 2 drop downs. One contains colors and the other contains sizes. So when the 'add to cart' button is pressed, you have two option IDs and you need to look up the correct combination to get the right price! |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2004-02-25 : 22:13:00
|
X,You are after division...Drop that horrible multivalue column from Combinations, read up on CSV parsing for the IN expression you are going to have to build and take a look at this query..SELECT C.ComboID, MAX(Price) AS PriceFROM Combinations CINNER JOIN Options_Combinations OC ON OC.ComboID = C.ComboIDWHERE OptionID IN (2,3)GROUP BY C.ComboIDHAVING COUNT(*) = (SELECT COUNT(DISTINCT OptionID) FROM Options_Combinations WHERE OptionID IN (2,3)) This will allow you to have more than 2 options per combination if the case arises...If you are only using 2 values then you can OR the IN....and forget the CSV stuff.....SELECT C.ComboID, MAX(Price)FROM Combinations CINNER JOIN Options_Combinations OC ON OC.ComboID = C.ComboIDWHERE OptionID = 2 OR OptionID = 3GROUP BY C.ComboIDHAVING COUNT(*) = 2 And here is a link for Relational Division the SQL way...A Popular post this week...[url]http://www.dbazine.com/celko1.html[/url]DavidM"SQL-3 is an abomination.." |
 |
|
|
X-Factor
Constraint Violating Yak Guru
392 Posts |
Posted - 2004-02-26 : 11:56:19
|
| Thanks for your reply.I think I see how that works.However, given the amount of processing that goes into preparing the CSV - concatenating the values in the app, splitting them apart in the sp & putting them into a table and then the complexity of the final query, do you not think that the 'idStr' field is warranted in this case?Why is it so horrible when its so much more efficient? Couldn't this be a case where denormalization is ok?X. |
 |
|
|
ajthepoolman
Constraint Violating Yak Guru
384 Posts |
Posted - 2004-02-26 : 12:09:21
|
| I would only see the denomalization if you are going to say that a red/small shoe is a different product than a red/large shoe. But I think you are saying that a shoe is a shoe regardless of what color/size it is. Aj |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2004-02-26 : 20:52:34
|
quote: Why is it so horrible when its so much more efficient? Couldn't this be a case where denormalization is ok?
It isn't more efficient, not by a long shot.By combining the attributes into a CSV string, you cannot index them indepdently of one another. If someone is looking for a red shoe, and you have none at all in the inventory, the optimizer would still have to table scan, and compare the CSV for every row to determine that there are none. It's about as slow an operation as you can do in SQL Server without using a cursor. Keeping each attribute on a separate row lets the indexing work more effectively. As a single indexed column, it can instead seek for "red" and return almost immediately that there are none. This applies to any attribute, not just color. It will always be easier to find a single needle laying by itself than one buried in a haystack. |
 |
|
|
X-Factor
Constraint Violating Yak Guru
392 Posts |
Posted - 2004-02-26 : 21:36:43
|
| I don't understand the relavance of a user looking for a red shoe in this context.When I say its more efficient, I mean given a set of option IDs which map to a unique combination, its easier to look up the combination by concatenating the option IDs and running the following...SELECT price FROM Combinations WHERE idStr = '2_3'than concatenating the option ID's, then breaking them apart in a sp and loading them into a table and then running a query like this...SELECT C.ComboID, MAX(Price) AS PriceFROM Combinations CINNER JOIN Options_Combinations OC ON OC.ComboID = C.ComboIDWHERE OptionID IN (2,3)GROUP BY C.ComboIDHAVING COUNT(*) = (SELECT COUNT(DISTINCT OptionID) FROM Options_Combinations WHERE OptionID IN (2,3))If idStr is indexed, then surely the first query is one of the quicker operations done in SQL Server? |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2004-02-27 : 08:20:17
|
| What if the attributes get stored as 3_2? Same shoe, different order of attributes, but now the query returns nothing. You'd have to write logic into your application to ensure this doesn't happen, and it can still happen if updates are made outside of the app.Storing them on separate rows, the second query will always work no matter what order the attributes are entered. That's the beauty of a relational data structure, order does not matter and has no meaning. |
 |
|
|
X-Factor
Constraint Violating Yak Guru
392 Posts |
Posted - 2004-02-27 : 13:48:23
|
| Hi there robvolk, yes I agree.At the moment, I have an 'ordinal' column of type int in the attributes table.Its not that difficult to maintain the idStr column because, in the app, its not possible to change the order of attributes once a product has been created. If you add a new attribute or delete an old one then the current combinations are rendered invalid anyway thus are deleted with a trigger and completely replaced.But next time I'll probably try the proper relational method.Speaking of order significant multivalue columns, in this article http://www.sqlteam.com/item.asp?ItemID=8866 , isn't the 'lineage' column order significant & multivalue? If so, why is it ok in this instance?(I didn't know you were the author of this article when it first occured to me to reference it!)Here's an example of one of the products at the site I'm talking about...http://www.mutr.co.uk/prodDetail.aspx?prodID=531 |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2004-02-28 : 16:05:33
|
Oooooh, colors. While the lineage column is multivalue (sort of; in a way it really isn't) and order is signicant, it actually describes a hierarchy with precedence and dependence. Changing the order of the node values actually changes the hierarchy. It's not just a reordering the same attributes that describe an object. Still, it is a workaround and not necessarily the best way to store that kind of relationship (see below)Whereas in the structure you have, it shouldn't make a difference to say "Green 4mm" or "4mm Green", it's the same combination. But storing them as a single combined attribute no longer makes them equivalent. Each attribute describes a different thing and is independent of the other. If you think about it, ordering attributes in a CSV is akin to numbering the columns of a table, instead of naming them. Moving a column around would require moving all of the data in it to its new column position. For that reason, having a trigger or some other mechanism that changes or maintains an order of attributes should be unnecessary, and IS unnecessary if they are stored separately.There are other ways to model a heirarchy, I used the lineage method because the nested set model that Joe Celko uses was frustrating for the data I had to store. I COULD'VE made it work, but it was much harder. Nested sets are more "relational" in their structure and should be considered for storing hierarchical data. |
 |
|
|
X-Factor
Constraint Violating Yak Guru
392 Posts |
Posted - 2004-03-01 : 11:28:48
|
| Thanks for your reply. I think I see the disinction you're making.However, the trigger I referred to is handy regardless of whether the 'idStr' column exists. For example, if there's a 'big/red' shoe combination and an admin adds a new attribute to the shoe, say, men's or women's, then you can no longer buy a 'big/red' shoe. You can buy a 'big/red/mens' shoe but the point is that the original set of combinations is invalid if you add or subtract a product attribute. This just happens to be convenient if one is maintaining the 'idStr' column because there's no need to update it if a new attribute is added or removed because all the combinations need to be replaced anyway.In fact, I have a suspicion that maybe these product combinations shouldn't be associated with a product using a many-many relationship because the combinations directly derive from the attributes. They are the combinations of the attributes and thus are dependent on them. Do you think there's a better way of modeling this?Also, is there any chance of you writing part 2 of that article? Perhaps you could discuss when to use the method you used instead of the nested set solution?There's another design issue I would welcome some feedback on from anyone who cares to comment....Basically, a product can have different price styles. It can be given a single price (in which case the price is stored in the main product row), be priced by combination, as discussed in this thread, (if there are any combinations for the product) or it can be priced using price bands ( in other words, the price you pay depends on the quantity of the item you are buying).In order to know the price style of a product, there's a field in the main product row called 'PriceStyleID' which is an integer which references a price style. Thus, along with the tables mentioned earlier in this thread, here are some more key tables...Products--------productID int pkPriceStyleID int fkprice moneyname varcharPriceBands----------priceBandID int pkproductID int fklowerBound intprice moneyPriceStyles-----------PriceStyleID int pkname varcharShoppingCart------------cartItemID int pkproductID int fkcomboID int fkquantity intSo, if you have a product in a shopping cart and you want to look up the product's price then one might join the following tables and, based on the value of the 'priceStyleID' field of the product, look up the price in the correct place...priceStyle = 1 - single pricepriceStyle = 2 - priced by price bandspriceStyle = 3 - price by combination.....SELECT case when p.priceStyle = 1 then p.price when p.priceStyle = 2 then pb.price else pC.price endFROM ShoppingCart cINNER JOIN Products p ON p.productID = c.productIDLEFT JOIN ProductCombos pC ON c.comboID = pC.productComboIDLEFT JOIN productPriceBands pb ON pb.productID= p.productID and pb.lowerbound = (SELECT max(pb2.lowerbound) FROM productpricebands pb2 WHERE pb2.productID= pb.productID AND c.quantity >= pb2.lowerbound )WHERE cartItemID = @cartItemID Thus, what I am interested in is the appropriateness of the 'priceStyleID' field in the 'products' table being used to determine where to find the price of a product which is potentially spread across several tables.Is there a better way of doing this?X. |
 |
|
|
|
|
|
|
|