Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Sport2000
Starting Member
2 Posts |
Posted - 2011-09-27 : 10:44:50
|
Hi,Here's the structure of the db I'm using:* Table frs_ArticlesBDFournisseur which contains article main data.One row per article/color. Key is Refrefart (int). Size is over 220000 rows at the momemt. Only index is the PK (clusterred).* Table frs_ArticlesBDFournisseurTaille contains all the various sizes and barcode for the corresponding article/color from the previous table. Key is RefREFARTTAILLE (int). Size is over 1200000 rows at the moment. refrefart column also exists in this table to link with the previous table, altough there is no physical constraint linking the 2 tables. Indexes are PK (clustered), refrefart (non clustered) and refrefart,taille,codeean,codeeanfrs (non clustered).Here's what I want to do:I work at headquarters of franchise shops. I have to collect data from each selling point, match it with our own database (the one discribed above) to normalise the incoming data and complete it with more details. The best way to match an article is to use its barcode (when it exits in the shop data). We pack up to 2 barcodes with every article (callled CodeEan and CodeEanFrs from frs_ArticlesBDFournisseurTaille). I need to complete the information I have with the article size (called taille from frs_ArticlesBDFournisseurTaille), the article's season (called Saison from frs_ArticlesBDFournisseur) and the number of different season the article has been existing in (to know whether it is a one season product or a permanent product).Here's the challenge for the query at hand:There is no constant data in frs_ArticlesBDFournisseurTaille with which I can do a GROUP BY.I need to get the latest product data (in case there is more than one season) and the total number of seasons.Here's the query I've designed:SELECT M.*, ART_REF, A.ART_PSEUDO, ART_NOM, ART_FEDAS, ART_ANNEECREATION, COU_CODE, COU_NOM, TGF_NOM, MRK_IDREF, MRK_NOM, MRK_NOM_ORI, CBI_EAN, CBI_EANFRS, B.RefRefArt, B.Taille AS sp2kTaille, CASE when B.CodeEan = '' then null else B.CodeEan END AS sp2kEan, CASE when B.CodeEanFrs = '' then null else B.CodeEanFrs END AS sp2kEanFrs, B.Saison, Coalesce(B.Pos, 0) as PermanentFROM gin_Mouvements MLEFT JOIN gin_Articles A ON M.MOV_ARTID=A.ART_IDLEFT JOIN gin_Couleurs C ON M.MOV_COUID=C.COU_IDLEFT JOIN gin_Tailles T ON M.MOV_TGFID=T.TGF_IDLEFT JOIN gin_Marques N ON A.ART_MRKID=N.MRK_IDLEFT JOIN gin_Ean E ON (M.MOV_ARTID=E.CBI_ARTID AND M.MOV_COUID=E.CBI_COUID AND M.MOV_TGFID=E.CBI_TGFID)OUTER APPLY (SELECT TOP 1 T.RefRefArt, CodeEan, CodeEANFrs, Taille, Art.Saison, Rank() over (order by T.RefREFART ASC) as Pos FROM frs_ArticlesBDFournisseurTaille T LEFT JOIN frs_ArticlesBDFournisseur Art ON T.RefREFART=Art.RefREFART WHERE (CodeEAN = E.CBI_EAN OR CodeEANFrs = CBI_EANFRS OR CodeEANFrs = CBI_EAN OR CodeEAN = CBI_EANFRS) ORDER BY RefREFART DESC) BQuery explained:All the first part, with all the left join, is to gather the data, relatively small, from the shops. There is no performance issue here.I've used an outer apply to avoid having to do a left join and the ON conditions which I couldn't get to work to return at the same time the max(refrefart) and the total number of season; remember I can not make a group by.Performances:This query runs at a 20row/s at best (nearly 20s to return 500 rows). The data in both tables described at the beginning is bound to increase twice a year by about 10000 new products and 6 times more for the barcodes. I'm afraid the performances will be degrading even more soon.Advice ?:If anyone can see how to improve the query or rewrite the query differently or improve the db or else ... please let me know.Stats:Execution plan sum up: it is spending 57% on Nested Loops (Left Outer Join) and 40% on Index Scan (NonClustered)[sp2k_staging].[dbo].[frs_ArticlesBDFournisseurTaille].[art_ean] [T]. They are thick lines but they are mostly to the right of the diagram.The client statistics based on a TOP 500 run of the query:Client Execution Time 10:45:36 Query Profile Statistics Number of INSERT, DELETE and UPDATE statements 0 0.0000 Rows affected by INSERT, DELETE, or UPDATE statements 0 0.0000 Number of SELECT statements 2 2.0000 Rows returned by SELECT statements 501 501.0000 Number of transactions 0 0.0000Network Statistics Number of server roundtrips 3 3.0000 TDS packets sent from client 3 3.0000 TDS packets received from server 119 119.0000 Bytes sent from client 2452 2452.0000 Bytes received from server 478675 478675.0000Time Statistics Client processing time 16895 16895.0000 Total execution time 18158 18158.0000 Wait time on server replies 1263 1263.0000Thank you. |
|
Sport2000
Starting Member
2 Posts |
Posted - 2011-09-28 : 04:22:44
|
Hi,I've split into 2 indexes codeean INCLUDE (refrefart,taille) and codeeanfrs INCLUDE (refrefart,taille). It's working like wondersCheers |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-09-28 : 05:24:10
|
We make a large ecommerce product. We have huge optional variability - client can decide to individually position products within a group, differently for each Brand, or let them follow the product's default sequencing; Client can override the Product Name with a product-specific Meta Title, per Brand / Season / Whim!, or just let the system use the Product Name; they might use the standard product imagery, or override that on the basis of X, Y or Z ... and so on.We used to use a VIEW which LEFT JOIN'd all the tables together. It was fast enough, but was still a drain on CPU when the site was busy.So this year we have warehoused the data. We have an Sproc that will "freshen" the data in the cache table for a specific product, and we have Triggers on the various tables that fire the Sproc if something substantive changes. We run the Sproc on all products every 30 minutes and raise an alert if it finds any products to modify (i.e. some modification process has failed to forcfe the product to refresh in the cache table).So instead of the VIEW we now have a single cache table and no outer joins. Performance is very good So ... rather than trying to tune your query (as we did year-on-year with each new release, and as the size of the product table grew each Season) you might want to go straight to a cached "flat" table. |
|
|
|
|
|
|
|