rquigley1
Starting Member
6 Posts
Posted - 2004-12-02 : 11:21:25
I am building a system in which we are considering using merge replication to replicate data. I need to replicate several client databases into one large database. The problem is coming up with unique keys. Merge replication automatically inserts uniqueidentifiers. If I build all the client databases with uniqueidentifiers, then I'm adding 16 bytes of data for every row in every table that must be replicated! Is there a better way to do this?

I considered slicing up integer values, having each client database pertain to a particular range set, and using transactional replication. The problem there is coming up with an accurate range set. Adding additional clients means adding additional databases and more range sets for primary key values. Using uniqueidentifiers would solve the headache of maintaining these values and remove the possibility of running out of numbers.

Will indexes on uniqueidentifiers slow down queries greatly? We are looking at a lot of small databases, all using uniqueidentifiers. Is there another way to do this that avoids uniqueidentifiers and maintains relationships when data is replicated to the all-clients database?
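A minimal sketch of that range-carving idea, with made-up table names and slice sizes; the CHECK constraint is an extra safeguard, since IDENTITY alone enforces only the starting point, not the upper bound:

    -- Client A's database: slice 1 through 10,000,000
    CREATE TABLE dbo.Employee (
        EmployeeID   int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        EmployeeName varchar(100) NOT NULL,
        CONSTRAINT CK_Employee_Range CHECK (EmployeeID BETWEEN 1 AND 10000000)
    )

    -- Client B's database: slice 10,000,001 through 20,000,000
    CREATE TABLE dbo.Employee (
        EmployeeID   int IDENTITY(10000001, 1) NOT NULL PRIMARY KEY,
        EmployeeName varchar(100) NOT NULL,
        CONSTRAINT CK_Employee_Range CHECK (EmployeeID BETWEEN 10000001 AND 20000000)
    )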
robvolk
Most Valuable Yak
15732 Posts
Posted - 2004-12-02 : 11:40:08
Are you sure merge replication is the right way to go for this scenario? Regardless of the mechanics, are you sure that snapshot or transactional replication cannot provide the features you need? Indeed, what features do you need? Listing those first instead of jumping into the uniqueidentifier question might suggest another solution.

I'm not a replication wiz, but I cannot find anything in BOL that says merge replication REQUIRES uniqueidentifiers. I'm pretty sure it's been done without using them.
rquigley1
Starting Member
6 Posts
Posted - 2004-12-02 : 15:56:17
No, I am not sure that merge replication is the right way to go. In fact, I would rather use transactional replication or some other method.

"If the table already contains a column with the ROWGUIDCOL property that has a unique index or primary key constraint, SQL Server will use that column automatically as the row identifier for the publishing table. Otherwise, SQL Server adds a uniqueidentifier column, titled rowguid, which has the ROWGUIDCOL property and an index, to the publishing table. Adding the rowguid column increases the size of the publishing table. The rowguid column and the index are added to the publishing table the first time the Snapshot Agent executes for the publication."

It was my understanding that merge replication must maintain GUIDs. Isn't that what this means? The table being replicated must have a column with the ROWGUIDCOL property in the publisher, and if it doesn't, SQL Server creates one so it can replicate changes in either direction.
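Note that the quoted passage also implies a way to keep control of that column: if it already exists with the ROWGUIDCOL property and a unique index, the Snapshot Agent reuses it instead of adding its own. A minimal sketch, with illustrative names:

    -- Pre-create the ROWGUIDCOL column so the Snapshot Agent reuses it
    CREATE TABLE dbo.Employee (
        EmployeeID   int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        EmployeeName varchar(100) NOT NULL,
        rowguid      uniqueidentifier ROWGUIDCOL NOT NULL
                     CONSTRAINT DF_Employee_rowguid DEFAULT NEWID(),
        CONSTRAINT UQ_Employee_rowguid UNIQUE (rowguid)
    )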
rquigley1
Starting Member
6 Posts
Posted - 2004-12-02 : 16:08:46
The real problem is maintaining relationships and referential integrity in the replicated db. The system would have n client databases and one master database that contains metadata about the clients and databases. For read-only comparison purposes we would like to replicate all the data into one database. However, how do you get over the problem that all client dbs will have an employee table with non-unique primary keys?

Is the best way to carve out a slice of the int data type? Or is it a better idea to create a trigger that assigns unique values? For instance, a table in the meta database could store the last-used employee number and increment each new primary key upon insert, no matter which client db is being used (see the sketch further down the thread). Can you create such a trigger that references a table in another database? Can anyone please help me out? Am I going about this the wrong way? I have considered pretty much every option I can think of, including DTS.
monkeybite
Posting Yak Master
152 Posts
Posted - 2004-12-02 : 16:50:02
I've done exactly what you're trying to do: consolidate several disconnected databases (12 total, I think) into a single central db using merge replication. The central publishing DB was also subscribing to another published database. If you have many tables that need to be merged, then replication is probably the easiest way to go, especially if you're using SQL 2000.

I had several tables that used integer primary keys generated by IDENTITY. When you set up your publication and select articles (tables) to publish, you can specify what numeric range IDENTITY will use on subscriber databases. When the initial snapshot is applied to subscriber databases, the publisher (your central db) manages all of the identity ranges for these tables and makes sure that subscribers get assigned a unique range. You as the DBA specify how many numbers are in the range, and the publisher determines where the starting point is for each subscriber.

For example, you can specify that the range for a particular article is 10,000. The publisher will assign Subscriber A 10,000 to 19,999, Subscriber B 20,000 to 29,999, Subscriber C 30,000 to 39,999, and so on. There's also a threshold value, so when Subscriber A starts to get close to 19,999, it'll be assigned a new range, for example 60,000 to 69,999. Look at the topic "Managing Identity Values" in BOL.

And yes, every table that is published for merge replication will get a uniqueidentifier column if you don't already have one marked as ROWGUIDCOL.

If you are using SQL 7.0 to replicate, um, this will be much more complicated.

Hope this helps.

~ monkey
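For reference, the identity range settings described above are exposed as parameters of sp_addmergearticle in SQL 2000. A hedged sketch, trimmed to the range-related parameters (publication and table names are made up; check "Managing Identity Values" in BOL for the exact parameter list):

    EXEC sp_addmergearticle
        @publication         = N'CentralPub',  -- hypothetical publication
        @article             = N'Employee',    -- hypothetical article
        @source_object       = N'Employee',
        @auto_identity_range = N'true',        -- publisher manages ranges
        @pub_identity_range  = 10000,          -- range kept by the publisher
        @identity_range      = 10000,          -- range given to each subscriber
        @threshold           = 80              -- percent used before a new range is assigned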
robvolk
Most Valuable Yak
15732 Posts
Posted - 2004-12-02 : 16:58:25
Crap. I was thinking of replication under SQL 6.5, and merge wasn't available then.

As an alternative, if all you're wanting to do is import the client data into a central database for reporting purposes, and none of the clients needs to modify another client's data, then you don't need replication at all. Nor would you need to maintain referential integrity in the central database; the client databases can do that.

What I'd suggest is modifying the tables of the centralized DB to add a ClientCode column. This would be unique for each client. You'd also modify the primary key of each table so that it includes the ClientCode.

Once that's done, you can set up an import/export routine so that each client can send their data to you. You may want to have a separate staging database for the sole purpose of loading this data. This database would have the exact structure of a client database, but you'd always truncate all tables before a new load.

Once you get the client's data, import it into the staging DB. Once that's done and the data is verified, you would delete that client's data from the central reporting database, then import the staging data into it. You'd have to construct this piece to include the ClientCode for each client, but it's basically just an INSERT...SELECT. You'd repeat this process for each client as you receive their data.

The downside to this is that it would involve transmitting more data than replication would. You'd also be sending unchanged data each time. But you avoid all of the hassle and limitations of replication, and you can guarantee a clean and complete set of data from each client. Plus you'd never have a merge conflict and don't have to set up identity ranges or GUIDs. And once you get the routines set up, you can automate the process via scheduled jobs. You can use DTS for the data transfer, but you don't have to, and I think DTS will make the process more cumbersome. bcp and BULK INSERT would do the trick nicely.

Let me know if this sounds like a viable option or not. I admit it sounds complicated, but it's not all that difficult to implement. I've done it on a limited scale and it worked pretty well.
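A sketch of the per-client refresh step described above, assuming a staging database shaped like a client database and illustrative table names:

    -- Refresh one client's slice of the central reporting table
    DECLARE @ClientCode char(4)
    SET @ClientCode = 'ACME'   -- hypothetical client code

    BEGIN TRANSACTION

    -- throw away the client's previous snapshot
    DELETE Central.dbo.Employee
    WHERE ClientCode = @ClientCode

    -- reload it from staging, stamping each row with the ClientCode
    INSERT Central.dbo.Employee (ClientCode, EmployeeID, EmployeeName)
    SELECT @ClientCode, EmployeeID, EmployeeName
    FROM Staging.dbo.Employee

    COMMIT TRANSACTION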
monkeybite
Posting Yak Master
152 Posts
Posted - 2004-12-02 : 17:16:43
rob, rquigley1:

Only if the data to be merged is relatively simple would I go this route. Believe me, my experiences with replication on SQL 6.5 (awful!) and 7.0 (not-so-awful) initially turned me off to trying any repl in SQL 2000. However, in my opinion this was one of the most improved areas over the previous versions. If you do have SQL 2000, honestly I think it'll be a push as far as time-to-implement between using merge repl and creating your own simple replication.

In this case, I would use merge if changes can occur at subscriber(s) and publisher, or if changes can occur frequently anywhere in the topology. I would use snapshot only if changes occur solely at the publisher and they are relatively infrequent. Snapshot basically blows away a subscriber's table and replaces it with a copy from the publisher; no good if changes need to go both ways.

Merge conflicts only occur if subscribers/publishers can act upon the exact same data row. If you're merging new and updated records from various databases and they are unique to each database (i.e. have a location specified in the record, or the data is properly partitioned horizontally), you'll never have a conflict.

~ monkey
rquigley1
Starting Member
6 Posts
Posted - 2004-12-06 : 14:32:05
Thank you for your replies. I am using SQL Server 2000. I looked at SQL Server's method for managing identity values, and that is a possibility. I am looking at another solution, though. One of the major problems is that there is no set number of publishers; the topology will consist of many publishers being added at, let's say, any time. The other end is simple, though: there is only one subscriber.

In order to manage primary key values in articles that must be replicated, I was thinking of using insert triggers in all of the client dbs that call a stored procedure which maintains a key for each table in the database. After returning the available key to the trigger, the value will be incremented. The trigger will then use this key to insert the final row in the table. This method will make replication very simple because it plans for the keys ahead of time.

Keep in mind that there is no data right now. This is for a project that will be implemented in the near future. Do you see any shortcomings to this method? For instance, when do I have to worry about the trigger not firing? What is the best way to avoid duplicate keys when there are many identical triggers accessing the same information from many client dbs? Do you understand my solution?
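A hedged sketch of that central key dispenser (database, table, and procedure names are made up). The single UPDATE both increments and reads the counter, so the row lock serializes concurrent callers and two clients can never be handed the same key:

    -- In the meta database: one counter row per replicated table
    CREATE TABLE dbo.TableKeys (
        TableName sysname NOT NULL PRIMARY KEY,
        LastKey   int     NOT NULL
    )
    GO
    CREATE PROCEDURE dbo.usp_GetNextKey
        @TableName sysname,
        @NextKey   int OUTPUT
    AS
        -- increment and read in one atomic statement
        UPDATE dbo.TableKeys
        SET @NextKey = LastKey = LastKey + 1
        WHERE TableName = @TableName
    GO

    -- From a client database (e.g. inside the trigger):
    -- EXEC MetaDB.dbo.usp_GetNextKey @TableName = 'Employee', @NextKey = @key OUTPUT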
robvolk
Most Valuable Yak
15732 Posts
Posted - 2004-12-06 : 14:58:01
Well, insert triggers fire AFTER the row has already been inserted, unless you write an INSTEAD OF trigger to do it. Having stored procedures handle the insert is a good way to go, but the way you're describing it doesn't sound right. I have a feeling that putting replication on top of it would either be totally redundant or could interfere with the replication process.

Why not just use the GUID as your keys and drop identity altogether? You're gonna need them for merge replication anyway, and they'll make identities redundant. Disk space is cheap, and 16 bytes isn't much compared to the rest of the data in the row.
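A minimal sketch of that suggestion, with illustrative names; marking the GUID key as ROWGUIDCOL lets merge replication reuse it instead of adding a second uniqueidentifier column:

    -- GUID primary key doubling as the merge replication row identifier
    CREATE TABLE dbo.Employee (
        EmployeeID   uniqueidentifier ROWGUIDCOL NOT NULL
                     CONSTRAINT DF_Employee_EmployeeID DEFAULT NEWID()
                     CONSTRAINT PK_Employee PRIMARY KEY,
        EmployeeName varchar(100) NOT NULL
    )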