1fred
Posting Yak Master
158 Posts
Posted - 2010-10-13 : 16:34:37
I have a fact table that will hold up to 100 million rows per year. The information inserted into it can change over time, so the production environment receives new information for some rows. At the moment, we delete the rows in the fact table that were updated and insert the new data. Some fields of the table are calculated in the extract query using the rowcount() function ("number of time" in this case).

This is a sample of the data of the initial load in the fact:

businessid, Product, process, number of time, execution order, status
1, 1, 1, 1, 15, start
2, 1, 1, 2, 26, in progress
3, 1, 1, 3, 35, end
4, 1, 2, 1, 40, start
5, 2, 2, 1, 42, start
6, 2, 2, 2, 44, start

Then the next day we get all the data that was modified or new:

businessid, Product, process, number of time, execution order, status
1, 1, 1, 2, 17, end
2, 1, 1, 3, 27, in progress
4, 1, 2, 1, 40, end

The final result in the fact should be this:

1, 1, 1, 1, 15, start
1, 1, 1, 2, 17, end          -- new
2, 1, 1, 3, 27, in progress  -- updated
3, 1, 1, 3, 35, end
4, 1, 2, 1, 40, end          -- updated
5, 2, 2, 1, 42, start
6, 2, 2, 2, 44, start

There are two things they want me to change, or at least two areas they have identified where performance can be improved. They want to get rid of the delete and ideally only do inserts, and they want to avoid calculating the "number of time" over too much data, or find the best practice for doing it. In the extraction query at the moment, I need to select the unchanged data as well in order to calculate the number of times a process for a product has been run... unless there is an easier way.

Ideally, we should find a logic that only inserts into the fact table, without keeping historical data. The fact table is then used to populate a cube, and SSRS reports use it too, so the logic to find the most up-to-date data has to be cost efficient. Some of our clients are using the Standard Edition of SQL 2005 and do not want to move to Enterprise yet... So my challenge is to decrease the loading time by a factor of 4.
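One possible direction, sketched under assumptions (all table and column names below are invented for illustration, and the exact business key of a fact row is a guess): SQL Server 2005 has no MERGE statement, so the delete-and-reload could be replaced with an UPDATE for matched rows plus an INSERT ... WHERE NOT EXISTS for new ones, driven from a staging table that holds only the rows received that day. The "number of time" could then be recomputed with ROW_NUMBER() (available in SQL 2005) restricted to the product/process groups that were actually touched, instead of over the whole fact.

```sql
-- Assumed objects: dbo.StageProcess holds only today's new/changed rows,
-- dbo.FactProcess is the fact table. Key columns are an assumption.
BEGIN TRANSACTION;

-- 1) Update fact rows whose business key already exists in staging.
UPDATE f
SET    f.ExecutionOrder = s.ExecutionOrder,
       f.Status         = s.Status
FROM   dbo.FactProcess  AS f
JOIN   dbo.StageProcess AS s
  ON   s.BusinessId   = f.BusinessId
 AND   s.ProductId    = f.ProductId
 AND   s.ProcessId    = f.ProcessId
 AND   s.NumberOfTime = f.NumberOfTime;   -- assumed business key

-- 2) Insert rows that are genuinely new.
INSERT INTO dbo.FactProcess
       (BusinessId, ProductId, ProcessId, NumberOfTime, ExecutionOrder, Status)
SELECT  s.BusinessId, s.ProductId, s.ProcessId,
        s.NumberOfTime, s.ExecutionOrder, s.Status
FROM    dbo.StageProcess AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.FactProcess AS f
                  WHERE f.BusinessId   = s.BusinessId
                    AND f.ProductId    = s.ProductId
                    AND f.ProcessId    = s.ProcessId
                    AND f.NumberOfTime = s.NumberOfTime);

-- 3) Recompute "number of time" only for product/process pairs touched
--    today, rather than scanning the entire fact table.
WITH Renumbered AS (
    SELECT f.NumberOfTime,
           ROW_NUMBER() OVER (PARTITION BY f.ProductId, f.ProcessId
                              ORDER BY f.ExecutionOrder) AS NewNumberOfTime
    FROM   dbo.FactProcess AS f
    WHERE  EXISTS (SELECT 1
                   FROM dbo.StageProcess AS s
                   WHERE s.ProductId = f.ProductId
                     AND s.ProcessId = f.ProcessId)
)
UPDATE Renumbered
SET    NumberOfTime = NewNumberOfTime
WHERE  NumberOfTime <> NewNumberOfTime;

COMMIT TRANSACTION;
```

This is only a sketch of the pattern, not a drop-in solution: whether NumberOfTime belongs in the matching key depends on your actual grain, and an index on the fact table covering (ProductId, ProcessId, BusinessId) would matter a great deal at 100 million rows per year.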