1fred
Posting Yak Master
158 Posts
Posted - 2010-10-13 : 16:34:37
I have a fact table that will hold up to 100 million rows per year. The information inserted into it can change over time, so the production environment receives new information for some rows. At the moment, we delete the rows in the fact table that were updated and insert the new data. Some fields of the table are calculated in the extract query using the rowcount() function ("number of time" in this case).

This is a sample of the data of the initial load in the fact:

businessid, Product, process, number of time, execution order, status
1, 1, 1, 1, 15, start
2, 1, 1, 2, 26, in progress
3, 1, 1, 3, 35, end
4, 1, 2, 1, 40, start
5, 2, 2, 1, 42, start
6, 2, 2, 2, 44, start

Then the next day we get all the data that was modified or new:

businessid, Product, process, number of time, execution order, status
1, 1, 1, 2, 17, end
2, 1, 1, 3, 27, in progress
4, 1, 2, 1, 40, end

The final result in the fact should be this:

1, 1, 1, 1, 15, start
1, 1, 1, 2, 17, end          -- new
2, 1, 1, 3, 27, in progress  -- updated
3, 1, 1, 3, 35, end
4, 1, 2, 1, 40, end          -- updated
5, 2, 2, 1, 42, start
6, 2, 2, 2, 44, start

There are two things they want me to change, or at least two areas they have identified where performance can be improved. They want to get rid of the delete and ideally only do inserts, and they want to avoid calculating the "number of time" over too much data, or find the best practice for doing it. In the extraction query at the moment, I need to select the unchanged data as well in order to calculate the number of times a process for a product has been run... unless there is an easier way.

Ideally, we should find a logic that only inserts into the fact table, without keeping historical data. The fact table is then used to populate a cube, and SSRS reports use it too, so the logic to find the most up-to-date data has to be cost efficient. Some of our clients are using the Standard Edition of SQL 2005 and do not want to move to Enterprise yet... So my challenge is to decrease the loading time by a factor of 4.
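One possible direction, sketched under assumptions (all table and column names below are invented for illustration, and the exact business key of a fact row is a guess): SQL Server 2005 has no MERGE statement, so the delete-and-reload could be replaced with an UPDATE for matched rows plus an INSERT ... WHERE NOT EXISTS for new ones, driven from a staging table that holds only the rows received that day. The "number of time" could then be recomputed with ROW_NUMBER() (available in SQL 2005) restricted to the product/process groups that were actually touched, instead of over the whole fact.

```sql
-- Assumed objects: dbo.StageProcess holds only today's new/changed rows,
-- dbo.FactProcess is the fact table. Key columns are an assumption.
BEGIN TRANSACTION;

-- 1) Update fact rows whose business key already exists in staging.
UPDATE f
SET    f.ExecutionOrder = s.ExecutionOrder,
       f.Status         = s.Status
FROM   dbo.FactProcess  AS f
JOIN   dbo.StageProcess AS s
  ON   s.BusinessId   = f.BusinessId
 AND   s.ProductId    = f.ProductId
 AND   s.ProcessId    = f.ProcessId
 AND   s.NumberOfTime = f.NumberOfTime;   -- assumed business key

-- 2) Insert rows that are genuinely new.
INSERT INTO dbo.FactProcess
       (BusinessId, ProductId, ProcessId, NumberOfTime, ExecutionOrder, Status)
SELECT  s.BusinessId, s.ProductId, s.ProcessId,
        s.NumberOfTime, s.ExecutionOrder, s.Status
FROM    dbo.StageProcess AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.FactProcess AS f
                  WHERE f.BusinessId   = s.BusinessId
                    AND f.ProductId    = s.ProductId
                    AND f.ProcessId    = s.ProcessId
                    AND f.NumberOfTime = s.NumberOfTime);

-- 3) Recompute "number of time" only for product/process pairs touched
--    today, rather than scanning the entire fact table.
WITH Renumbered AS (
    SELECT f.NumberOfTime,
           ROW_NUMBER() OVER (PARTITION BY f.ProductId, f.ProcessId
                              ORDER BY f.ExecutionOrder) AS NewNumberOfTime
    FROM   dbo.FactProcess AS f
    WHERE  EXISTS (SELECT 1
                   FROM dbo.StageProcess AS s
                   WHERE s.ProductId = f.ProductId
                     AND s.ProcessId = f.ProcessId)
)
UPDATE Renumbered
SET    NumberOfTime = NewNumberOfTime
WHERE  NumberOfTime <> NewNumberOfTime;

COMMIT TRANSACTION;
```

This is only a sketch of the pattern, not a drop-in solution: whether NumberOfTime belongs in the matching key depends on your actual grain, and an index on the fact table covering (ProductId, ProcessId, BusinessId) would matter a great deal at 100 million rows per year.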