| Author | Topic |
makimark
Starting Member
34 Posts |
Posted - 2003-03-20 : 02:40:52
|
| Hi

We are having to import (db to db) data from a live system to fact tables in a data warehouse. The amount of data is vast, 1 million+ records at a time.

The question is about efficiency - would it be better to truncate the fact tables first and copy in the new data, or to do a compare from a temp table and only copy in the new or changed data? The data is processed into cubes afterwards.

thanks
Mark |
|
|
chadmat
The Chadinator
1974 Posts |
Posted - 2003-03-20 : 11:54:36
|
| Neither. Add a timestamp to the data, then only import data that has a timestamp later than the last import.

-Chad |
 |
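
A minimal sketch of the incremental approach Chad describes, assuming the source carries a datetime column rather than the SQL Server timestamp datatype (which is a binary row version, not a date). The names here - t_source, last_modified, t_import_log - are hypothetical placeholders, not anything from the thread:

-- control table holding the high-water mark of the previous import
create table t_import_log (last_import datetime)
insert into t_import_log values ('19000101')

-- capture the cutoff first so rows arriving mid-run aren't skipped
DECLARE @cutoff datetime
SELECT @cutoff = MAX(last_modified) FROM t_source

-- import only rows changed since the last run
INSERT INTO t_fact (t_key, t_data)
SELECT s.t_key, s.t_data
FROM t_source s, t_import_log l
WHERE s.last_modified > l.last_import
  AND s.last_modified <= @cutoff

-- advance the high-water mark for the next run
UPDATE t_import_log SET last_import = @cutoff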
|
|
lee_h
Starting Member
36 Posts |
Posted - 2003-03-20 : 22:27:35
|
| Could you expand on this please? I've never used the timestamp datatype before, so I don't know how it works.

If I have the following 2 tables:

create table t_load (t_key integer, t_data varchar(5), t_stamp timestamp)
create table t_fact (t_key integer, t_data varchar(5), t_stamp timestamp)

On day 1, I load the following data into t_load (from a text file):

1, "aaa"
2, "bbb"

I then execute a query that inserts the new data into t_fact.

On day 2, I truncate t_load and load the new text file, which now contains the following data:

1, "aaa"
2, "bbb"
3, "ccc"

I then execute the query that inserts the new data (3, "ccc"). What should this query look like? Or have I missed something in the table creation? |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2003-03-20 : 23:43:00
|
| Why would you not just load the text file into t_fact as well? Nothing is faster for moving data than BCP...

DavidM

"SQL-3 is an abomination.." |
 |
|
|
lee_h
Starting Member
36 Posts |
Posted - 2003-03-20 : 23:48:37
|
| Because I only want to add the new row (3, "ccc") into the fact table.

Day 3 may only have the following data in it:

3, "ccc"
4, "ddd"
5, "eee"

My fact table at the end of day 3 should have (and only have) 5 rows in it.

I use bcp to pull in the text file. |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-03-20 : 23:58:35
|
| You can do that in one of two ways:

-- insert only the load rows with no match in the fact table
INSERT INTO t_fact
SELECT L.*
FROM t_load L
LEFT JOIN t_fact F ON L.t_key = F.t_key
WHERE F.t_key IS NULL

...or...

-- remove already-loaded rows from the staging table first,
-- then insert everything that remains
DELETE L
FROM t_load L
INNER JOIN t_fact F ON L.t_key = F.t_key

INSERT INTO t_fact
SELECT * FROM t_load |
 |
|
|
lee_h
Starting Member
36 Posts |
Posted - 2003-03-21 : 00:09:00
|
| Cheers robvolk, that's the way it works at the moment. The thing that prompted me was chadmat's reply to the original question. |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2003-03-21 : 02:18:15
|
| Lee,

Your initial question was about efficiency. It depends on the number of "new" rows that have to be inserted. If there are millions of "new" rows, then in my experience BCP OUT then BCP IN outperforms a SELECT..INSERT statement every time...

1) Truncate t_load
2) BCP into t_load from the text file
3) BCP out the "new" rows using Rob's SELECT query into a text file
4) BCP IN this "new" text file to t_fact

I recently had to de-sensitise a 15 million row Client table on a copy of our production DB. The requirement was easy enough: I simply had to change the Client's name to all X's. Originally I thought I'd just run an UPDATE statement... So I let the query rip and went to lunch... I came back an hour and a half later (Two for Tuesday at the local) and the damn thing was still going... I cancelled the query, and that took another 30 minutes. So I dropped the FKs, BCP'd OUT the table (3 minutes), truncated the table (3 seconds), BCP'd IN (6 minutes), then reapplied the FKs. All up, about 15 minutes.

DavidM

"SQL-3 is an abomination.." |
 |
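
A command-line sketch of those four steps, assuming a database named dw on server MYSERVER, a trusted connection (-T), and character-mode files (-c). The database, server, and file names are all placeholders:

REM step 1: truncate the staging table
osql -E -S MYSERVER -Q "TRUNCATE TABLE dw..t_load"

REM step 2: bulk load the daily text file into the staging table
bcp dw..t_load in daily_extract.txt -c -T -S MYSERVER

REM step 3: export only the rows not already in the fact table (Rob's query)
bcp "SELECT L.* FROM dw..t_load L LEFT JOIN dw..t_fact F ON L.t_key = F.t_key WHERE F.t_key IS NULL" queryout new_rows.txt -c -T -S MYSERVER

REM step 4: bulk load the new rows into the fact table
bcp dw..t_fact in new_rows.txt -c -T -S MYSERVER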
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2003-03-21 : 08:56:52
|
| David -- I hope the client names weren't the primary key of that table or foreign keys in others!!

- Jeff |
 |
|
|
|