Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Database Design and Application Architecture
 Any Partitioning gurus?

Author  Topic 

kdd21
Starting Member

11 Posts

Posted - 2010-02-25 : 19:07:08
I'm looking for any opinions on my questions here-- I've also posted this to TechNet. We're trying to deploy a warehouse application in a scenario where we'd like to minimize the impact on the application code to the extent possible (initially, at least), and try to do as much as possible in the configuring of SQL Server...

I'm wondering if SQL Server is capable of a couple of things. I may be confusing "filegroup" with "database" in this post, I'm a little fuzzy about the distinction at the moment. I'm more familiar with Oracle and it's terms "tablespace" and "datafile". If the application presumes its tables are in one "tablespace" in Oracle terms, that would be "filegroup" or "database" in SQL Server? At any rate, I hope my misunderstanding of the details won't confuse the questions too much...

Suppose we have servers A B and C.

Servers A and B are similarly set up-- with SQL Server SE, each with a file group by the name of FINANCIALS let's say, with identical schemas but different data. Let's say A & B are essentially financial data warehouses for separate sales offices. Both have the same table names, file group name and datafile names for the given file group. All the tables within the filegroup have the office name as a field.

What I'd like to do, is to, nightly, one at a time for A & B, take the FINANCIALS filegroup offline (detach), copy the datafiles for the filegroup to server C to a unique location, attach the filegroup there and/or possibly rename the filegroup (if necessary) with the desire to use all of the tables in the newly attached filegroup as staging tables to be switched into ultimately a FINANCIALS filegroup in server C with a matching schema of partitioned tables (partitioned by the Office name field).

The idea being, that servers A and B will remain the individual warehouses for each office, and can be loaded from the office servers in parallel with each other. Once they are loaded, their datafiles are transfered to server C to be assembled into a composite partitioned-table database of the combined data of both A and B.

So the questions are:

1- is this practical? Problems with filegroup or datafile names in the process (collisions, etc.)? All three servers have an application running on them that is expecting the "FINANCIALS" filegroup to exist with the standardized schema. Renaming tables, datafiles, etc. are not a problem, but can you "switch" a table from one filegroup into a table partition for a table that's in another filegroup? Or if not, can you attach datafiles to an existing filegroup (presumably after renaming the TABLES to avoid collisions?).

2- Is there a better way to accomplish this? It's important for performance reasons that the C database be combined and perferrably not over network connections-- linked servers are problematic because of the network connection-- and offloading the workload of C from servers A and B is desirable as well. It's also important that we minimize the impact on the application which has assumptions about the current schema layout and filegroup-- it's not designed to query its tables out of multiple filegroups at the moment, and I don't know what the impact would be if every table had to be qualified by its filegroup-- so that's something we'd like to avoid if possible.

3- It's also important that this solution be somewhat scaleable, in the real world environment we won't have just A & B, but 20 to 30 independent warehouse servers where we'd like to nightly aggregate the data to the "enterprise" server. Speed of transfer of the data is an issue, but distributed query solutions are a worse place to put the network connection. Part of the point of this solution is that the nightly import into A and B are time consuming and need to be done in parallel with each other, there's a scrub, BCP and denormalize scripts steps that we'd like to avoid doing on C, and also avoid export/import of the data from A and B to C as well if the datafiles can simply be copied and switched into partitioned tables...

Also, I presume we would be using SQL Server 2008 R2 Enterprise for the C server, but would this work if A and B were SQL Server 2005 SE? Or if there was a mix of versions? That's simply a question about the ability to attach datafiles across versions, which I would think is possible but am not sure.

Any thoughs would be appreciated...


--

Zync


--

Sync
   

- Advertisement -