
Forum: SQL Server 2000 > Import/Export (DTS) and Replication (2000)
Topic: Data Warehouse

makimark
Starting Member

34 Posts

Posted - 2003-03-20 : 02:40:52
Hi
We have to import (db to db) data from a live system into fact tables in a data warehouse. The amount of data is vast: 1 million+ records at a time.
The question is about efficiency: would it be better to truncate the fact tables first and copy in the new data, or to compare against a temp table and copy in only the new or changed data? The data is processed into cubes afterwards.

thanks

Mark

chadmat
The Chadinator

1974 Posts

Posted - 2003-03-20 : 11:54:36
Neither. Add a timestamp to the data, then only import data that has a timestamp later than the last import.
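One caveat on Chad's suggestion: in SQL Server the TIMESTAMP datatype is actually an auto-updated row-version counter (binary), not a date/time, so for "later than the last import" logic a plain DATETIME column maintained on the source is often easier to work with. A minimal sketch, assuming a hypothetical modified_at column on the source table and a one-row control table (source_table, modified_at and etl_control are all invented names):

```sql
-- Sketch only; source_table, modified_at and etl_control are invented names.
CREATE TABLE etl_control (last_import DATETIME NOT NULL)
INSERT INTO etl_control VALUES ('19000101')

-- Import only rows stamped after the previous high-water mark
INSERT INTO t_fact (t_key, t_data)
SELECT s.t_key, s.t_data
FROM source_table s, etl_control c
WHERE s.modified_at > c.last_import

-- Advance the mark to the newest row just imported
UPDATE etl_control
SET last_import = (SELECT MAX(modified_at) FROM source_table)
```

Using MAX(modified_at) rather than GETDATE() for the new mark avoids skipping rows committed between the two statements; running both statements in one transaction is still advisable.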

-Chad


lee_h
Starting Member

36 Posts

Posted - 2003-03-20 : 22:27:35
Could you expand on this, please? I've never used the timestamp datatype before, so I don't know how it works.

if I have the following 2 tables

create table t_load (t_key integer, t_data varchar(5), t_stamp timestamp)

create table t_fact (t_key integer, t_data varchar(5), t_stamp timestamp)


On day 1, I load the following data into t_load (from a text file)

1, "aaa"
2, "bbb"


I then execute a query that inserts the new data into t_fact

On day 2, I truncate t_load and load the new text file which now contains the following data

1, "aaa"
2, "bbb"
3, "ccc"

I then execute the query that inserts the new data (3, "ccc"). What should this query look like? Or have I missed something in the table creation?



byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2003-03-20 : 23:43:00
Why not just load the text file straight into t_fact as well? Nothing is faster for moving data than BCP...

DavidM

"SQL-3 is an abomination.."

lee_h
Starting Member

36 Posts

Posted - 2003-03-20 : 23:48:37
Because I only want to add the new row (3, "ccc") to the fact table.
Day 3 may only have the following data in it

3, "ccc"
4, "ddd"
5, "eee"


My fact table at the end of day 3 should have (and only have) 5 rows in it.

I use bcp to pull in the text file.



robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-03-20 : 23:58:35
You can do that in one of two ways. Note the explicit column lists: you can't insert an explicit value into a timestamp column, so it has to be left out and allowed to generate itself.

INSERT INTO t_fact (t_key, t_data)
SELECT L.t_key, L.t_data
FROM t_load L LEFT JOIN t_fact F ON L.t_key = F.t_key
WHERE F.t_key IS NULL


...or...

DELETE L FROM t_load L INNER JOIN t_fact F ON L.t_key = F.t_key
INSERT INTO t_fact (t_key, t_data)
SELECT t_key, t_data FROM t_load
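For concreteness, here is Lee's two-day example run end-to-end with the LEFT JOIN variant. The column lists skip the timestamp column, which SQL Server populates by itself:

```sql
CREATE TABLE t_load (t_key INTEGER, t_data VARCHAR(5), t_stamp TIMESTAMP)
CREATE TABLE t_fact (t_key INTEGER, t_data VARCHAR(5), t_stamp TIMESTAMP)

-- Day 1: load staging, then move the (all new) rows to the fact table
INSERT INTO t_load (t_key, t_data) VALUES (1, 'aaa')
INSERT INTO t_load (t_key, t_data) VALUES (2, 'bbb')

INSERT INTO t_fact (t_key, t_data)
SELECT L.t_key, L.t_data
FROM t_load L LEFT JOIN t_fact F ON L.t_key = F.t_key
WHERE F.t_key IS NULL

-- Day 2: reload staging from the new file
TRUNCATE TABLE t_load
INSERT INTO t_load (t_key, t_data) VALUES (1, 'aaa')
INSERT INTO t_load (t_key, t_data) VALUES (2, 'bbb')
INSERT INTO t_load (t_key, t_data) VALUES (3, 'ccc')

-- The same query now picks up only (3, 'ccc'); t_fact ends with 3 rows
INSERT INTO t_fact (t_key, t_data)
SELECT L.t_key, L.t_data
FROM t_load L LEFT JOIN t_fact F ON L.t_key = F.t_key
WHERE F.t_key IS NULL
```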



lee_h
Starting Member

36 Posts

Posted - 2003-03-21 : 00:09:00
Cheers robvolk, that's how it works at the moment. What prompted my question was chadmat's reply to the original post.


byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2003-03-21 : 02:18:15
Lee,

Your initial question was about efficiency. It depends on the number of "new" rows that have to be inserted.

If there are millions of "new" rows, then in my experience BCP OUT followed by BCP IN outperforms a SELECT..INSERT statement every time...

1) Truncate t_Load
2) BCP into t_Load from the text file
3) BCP out the "new" rows using Rob's SELECT query into a text file
4) BCP IN this "new" text file to t_fact
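Step 3 is the interesting one: the "new rows" query is just Rob's SELECT, fed to bcp's queryout mode. A sketch, with invented server/database/file names:

```sql
-- bcp runs the query below in queryout mode, e.g.:
--   bcp "query below" queryout newrows.txt -c -S MYSERVER -T
-- Steps 2 and 4 are plain loads in the style of:
--   bcp MyDW..t_load in day2.txt -c -S MYSERVER -T
-- (-T requests a trusted connection, -c character mode)
SELECT L.t_key, L.t_data
FROM t_load L
LEFT JOIN t_fact F ON L.t_key = F.t_key
WHERE F.t_key IS NULL
```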

I recently had to de-sensitise a 15-million-row Client table on a copy of our production DB. The requirement was easy enough: I simply had to change each client's name to all X's.

Originally I thought I'd just run an UPDATE statement... So I let the query rip and went to lunch... I came back an hour and a half later (Two-for-Tuesday at the local) and the damn thing was still going...

Cancelling the query took another 30 minutes. So I dropped the FKs, BCP'd the table out (3 minutes), truncated it (3 seconds), BCP'd it back in (6 minutes), and reapplied the FKs. All up, about 15 minutes.
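David doesn't say exactly where in the BCP OUT / BCP IN round trip the masking happened, but one approach that fits the timings is to fold the transformation into the export query itself, so no logged UPDATE ever runs. A hypothetical sketch (ClientID and ClientName are invented column names):

```sql
-- Export query for bcp queryout; each name comes out as a same-length
-- string of X's, so the reloaded table is de-sensitised without a
-- single UPDATE being logged.
SELECT ClientID,
       REPLICATE('X', LEN(ClientName)) AS ClientName
FROM Client
```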


DavidM

"SQL-3 is an abomination.."

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2003-03-21 : 08:56:52
David -- I hope the client names weren't the primary key of that table, or foreign keys in others!



- Jeff