Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Other SQL Server 2008 Topics
 Challenge tuning query on large tables

Author  Topic 

Sport2000
Starting Member

2 Posts

Posted - 2011-09-27 : 10:44:50
Hi,

Here's the structure of the db I'm using:
* Table frs_ArticlesBDFournisseur which contains article main data.One row per article/color. Key is Refrefart (int). Size is over 220000 rows at the momemt. Only index is the PK (clusterred).
* Table frs_ArticlesBDFournisseurTaille contains all the various sizes and barcode for the corresponding article/color from the previous table. Key is RefREFARTTAILLE (int). Size is over 1200000 rows at the moment. refrefart column also exists in this table to link with the previous table, altough there is no physical constraint linking the 2 tables. Indexes are PK (clustered), refrefart (non clustered) and refrefart,taille,codeean,codeeanfrs (non clustered).

Here's what I want to do:
I work at headquarters of franchise shops. I have to collect data from each selling point, match it with our own database (the one discribed above) to normalise the incoming data and complete it with more details. The best way to match an article is to use its barcode (when it exits in the shop data). We pack up to 2 barcodes with every article (callled CodeEan and CodeEanFrs from frs_ArticlesBDFournisseurTaille). I need to complete the information I have with the article size (called taille from frs_ArticlesBDFournisseurTaille), the article's season (called Saison from frs_ArticlesBDFournisseur) and the number of different season the article has been existing in (to know whether it is a one season product or a permanent product).

Here's the challenge for the query at hand:
There is no constant data in frs_ArticlesBDFournisseurTaille with which I can do a GROUP BY.
I need to get the latest product data (in case there is more than one season) and the total number of seasons.

Here's the query I've designed:
SELECT M.*, ART_REF, A.ART_PSEUDO, ART_NOM, ART_FEDAS, ART_ANNEECREATION, COU_CODE, COU_NOM, TGF_NOM, MRK_IDREF, MRK_NOM, MRK_NOM_ORI, CBI_EAN, CBI_EANFRS, B.RefRefArt, B.Taille AS sp2kTaille, CASE when B.CodeEan = '' then null else B.CodeEan END AS sp2kEan, CASE when B.CodeEanFrs = '' then null else B.CodeEanFrs END AS sp2kEanFrs, B.Saison, Coalesce(B.Pos, 0) as Permanent
FROM gin_Mouvements M
LEFT JOIN gin_Articles A ON M.MOV_ARTID=A.ART_ID
LEFT JOIN gin_Couleurs C ON M.MOV_COUID=C.COU_ID
LEFT JOIN gin_Tailles T ON M.MOV_TGFID=T.TGF_ID
LEFT JOIN gin_Marques N ON A.ART_MRKID=N.MRK_ID
LEFT JOIN gin_Ean E ON (M.MOV_ARTID=E.CBI_ARTID AND M.MOV_COUID=E.CBI_COUID AND M.MOV_TGFID=E.CBI_TGFID)
OUTER APPLY (SELECT TOP 1 T.RefRefArt, CodeEan, CodeEANFrs, Taille, Art.Saison, Rank() over (order by T.RefREFART ASC) as Pos
FROM frs_ArticlesBDFournisseurTaille T
LEFT JOIN frs_ArticlesBDFournisseur Art ON T.RefREFART=Art.RefREFART
WHERE (CodeEAN = E.CBI_EAN OR CodeEANFrs = CBI_EANFRS
OR CodeEANFrs = CBI_EAN OR CodeEAN = CBI_EANFRS)
ORDER BY RefREFART DESC) B

Query explained:
All the first part, with all the left join, is to gather the data, relatively small, from the shops. There is no performance issue here.
I've used an outer apply to avoid having to do a left join and the ON conditions which I couldn't get to work to return at the same time the max(refrefart) and the total number of season; remember I can not make a group by.

Performances:
This query runs at a 20row/s at best (nearly 20s to return 500 rows). The data in both tables described at the beginning is bound to increase twice a year by about 10000 new products and 6 times more for the barcodes. I'm afraid the performances will be degrading even more soon.

Advice ?:
If anyone can see how to improve the query or rewrite the query differently or improve the db or else ... please let me know.

Stats:
Execution plan sum up: it is spending 57% on Nested Loops (Left Outer Join) and 40% on Index Scan (NonClustered)[sp2k_staging].[dbo].[frs_ArticlesBDFournisseurTaille].[art_ean] [T]. They are thick lines but they are mostly to the right of the diagram.
The client statistics based on a TOP 500 run of the query:
Client Execution Time 10:45:36
Query Profile Statistics
Number of INSERT, DELETE and UPDATE statements 0 0.0000
Rows affected by INSERT, DELETE, or UPDATE statements 0 0.0000
Number of SELECT statements 2 2.0000
Rows returned by SELECT statements 501 501.0000
Number of transactions 0 0.0000
Network Statistics
Number of server roundtrips 3 3.0000
TDS packets sent from client 3 3.0000
TDS packets received from server 119 119.0000
Bytes sent from client 2452 2452.0000
Bytes received from server 478675 478675.0000
Time Statistics
Client processing time 16895 16895.0000
Total execution time 18158 18158.0000
Wait time on server replies 1263 1263.0000

Thank you.

Sport2000
Starting Member

2 Posts

Posted - 2011-09-28 : 04:22:44
Hi,
I've split into 2 indexes codeean INCLUDE (refrefart,taille) and codeeanfrs INCLUDE (refrefart,taille).
It's working like wonders
Cheers
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2011-09-28 : 05:24:10
We make a large ecommerce product. We have huge optional variability - client can decide to individually position products within a group, differently for each Brand, or let them follow the product's default sequencing; Client can override the Product Name with a product-specific Meta Title, per Brand / Season / Whim!, or just let the system use the Product Name; they might use the standard product imagery, or override that on the basis of X, Y or Z ... and so on.

We used to use a VIEW which LEFT JOIN'd all the tables together. It was fast enough, but was still a drain on CPU when the site was busy.

So this year we have warehoused the data. We have an Sproc that will "freshen" the data in the cache table for a specific product, and we have Triggers on the various tables that fire the Sproc if something substantive changes. We run the Sproc on all products every 30 minutes and raise an alert if it finds any products to modify (i.e. some modification process has failed to forcfe the product to refresh in the cache table).

So instead of the VIEW we now have a single cache table and no outer joins. Performance is very good

So ... rather than trying to tune your query (as we did year-on-year with each new release, and as the size of the product table grew each Season) you might want to go straight to a cached "flat" table.

Go to Top of Page
   

- Advertisement -