| Author | Topic | 
                            
                                    | malachi151Posting Yak  Master
 
 
                                        152 Posts | 
                                            
                                            |  Posted - 2012-02-16 : 06:54:18 
 |  
                                            | So I'm looking into the best way to implement replication from our production environment to our internal data warehouse.I've never setup replication before, but for the life of me, having looked into this,the first question that jumps out is, why can't I just create a Job that runs T-SQL MERGE statements using linked servers?I'd want to start with a snapshot of course, but after that we are looking for a nightly update of the warehouse with the transactional data. Also, and this may be a factor as well, the production server is SQL 2005, but the data warehouse server is SQL 2008 R2. |  | 
       
                            
                       
                          
                            
                                    | robvolkMost Valuable Yak
 
 
                                    15732 Posts | 
                                        
                                          |  Posted - 2012-02-16 : 07:56:29 
 |  
                                          | quote:You can.  Be sure to watch the glaciers as they fly by while it runs.  The last things you want to do over linked servers are JOINs, MERGE, or UNION/INTERSECT/EXCEPT.  The underlying architecture of linked servers will convert these operations to cursors of the worst kind.  You would quite literally be better off copying all the production data over to the data warehouse server and processing the MERGE locally.The huge benefit of replication for your scenario is that it will only copy changes.  MERGE would have to evaluate all rows in both sets.If you don't want to deal with replication and its limitations, you can log ship your production database to your data warehouse server in standby mode, and then perform a MERGE.  It's still likely that replication will be more efficient overall, but it's another option.why can't I just create a Job that runs T-SQL MERGE statements using linked servers?
 
 |  
                                          |  |  | 
                            
                       
                          
                            
                                    | malachi151Posting Yak  Master
 
 
                                    152 Posts | 
                                        
                                          |  Posted - 2012-02-16 : 08:03:08 
 |  
                                          | quote:Thanks, this certainly helps get me pointed in the right direction.So essentially, MERGE should only be used between tables on the same instance, got it...Originally posted by robvolk
 
 quote:You can.  Be sure to watch the glaciers as they fly by while it runs.  The last things you want to do over linked servers are JOINs, MERGE, or UNION/INTERSECT/EXCEPT.  The underlying architecture of linked servers will convert these operations to cursors of the worst kind.  You would quite literally be better off copying all the production data over to the data warehouse server and processing the MERGE locally.The huge benefit of replication for your scenario is that it will only copy changes.  MERGE would have to evaluate all rows in both sets.If you don't want to deal with replication and its limitations, you can log ship your production database to your data warehouse server in standby mode, and then perform a MERGE.  It's still likely that replication will be more efficient overall, but it's another option.why can't I just create a Job that runs T-SQL MERGE statements using linked servers?
 
 
 |  
                                          |  |  | 
                            
                       
                          
                            
                                    | robvolkMost Valuable Yak
 
 
                                    15732 Posts | 
                                        
                                          |  Posted - 2012-02-16 : 08:11:55 
 |  
                                          | I should clarify something.  MERGE can be efficient if you have a datetime or identity column that can be used in the WHERE clause to limit the rows being processed.  You'd have to track the last time the warehouse was updated.  If you already have a trigger that updates a ModifiedDate column or similar, that would be the best option.  But if you don't then it's not worth adding just to perform a MERGE. |  
                                          |  |  | 
                            
                       
                          
                            
                                    | malachi151Posting Yak  Master
 
 
                                    152 Posts | 
                                        
                                          |  Posted - 2012-02-16 : 10:52:34 
 |  
                                          | quote:The publishing database is basically insert only, no updates, and the tables all have a sequential primary key, so a MERGE would be pretty simple. All it would have to do is INERT records WHEN NOT MATCHED BY TARGET, that's pretty much it...Originally posted by robvolk
 I should clarify something.  MERGE can be efficient if you have a datetime or identity column that can be used in the WHERE clause to limit the rows being processed.  You'd have to track the last time the warehouse was updated.  If you already have a trigger that updates a ModifiedDate column or similar, that would be the best option.  But if you don't then it's not worth adding just to perform a MERGE.
 
 |  
                                          |  |  | 
                            
                       
                          
                            
                                    | russellPyro-ma-ni-yak
 
 
                                    5072 Posts | 
                                        
                                          |  Posted - 2012-02-16 : 11:11:25 
 |  
                                          | Replication was invented for this and is going to almost certainly be the most efficient way. |  
                                          |  |  | 
                            
                       
                          
                            
                                    | malachi151Posting Yak  Master
 
 
                                    152 Posts | 
                                        
                                          |  Posted - 2012-02-16 : 11:42:21 
 |  
                                          | quote:Yeah, the only complicating factor here is that we have multiple separate databases with identical schema's (each client has their own database), and there is no unique id in the tables across databases, i.e. there are overlapping primary keys.We want to pull the data from each client into a single data warehouse, but in order to do that we will have to append a client id column to the data. I figured this would be easy using T-SQL MERGE, but I'm not sure if this will complicate things using Replication.I know you can replicate from multiple sources into a single source, but how does that work if there are overlapping primary keys?Originally posted by russell
 Replication was invented for this and is going to almost certainly be the most efficient way.
 
 |  
                                          |  |  | 
                            
                       
                          
                            
                                    | russellPyro-ma-ni-yak
 
 
                                    5072 Posts | 
                                        
                                          |  Posted - 2012-02-16 : 14:03:15 
 |  
                                          | You would need to manage the primary keys so they don't collide. Or add a column to each table indicating which server "owns" the data, to "uniqueify" it.But for a data warehouse, I'd say neither replication nor merge are the right solution.  Instead, why not a nightly ETL? |  
                                          |  |  | 
                            
                       
                          
                            
                                    | malachi151Posting Yak  Master
 
 
                                    152 Posts | 
                                        
                                          |  Posted - 2012-02-16 : 16:30:55 
 |  
                                          | quote:Looks like things are turning in the SSIS direction. I've only used SSIS for manually run ETL processes, running them through the BIDS, but I assume running them from SQL Agent isn't a big problem...Originally posted by russell
 You would need to manage the primary keys so they don't collide. Or add a column to each table indicating which server "owns" the data, to "uniqueify" it.But for a data warehouse, I'd say neither replication nor merge are the right solution.  Instead, why not a nightly ETL?
 
 |  
                                          |  |  | 
                            
                       
                          
                            
                                    | russellPyro-ma-ni-yak
 
 
                                    5072 Posts | 
                                        
                                          |  Posted - 2012-02-17 : 07:15:07 
 |  
                                          | Not a big problem at all.  In fact, that's the normal (meaning most common) way they are run. |  
                                          |  |  | 
                            
                            
                                |  |