cross database joins. Bad? - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

cross database joins. Bad?

Author

Topic

chadbryant5
Starting Member

32 Posts

Posted - 2005-04-19 : 16:13:24

We are considering putting a portion of our database tables into their own database for various reasons. Our manager likes this approach, but we wanted to check on possible performance issues with joining tables from different databases. These would be on the same local SQL Server, but just isolated into their own databases.

We can and will run some tests on this to see if there would be any real issues in doing it this way, but I wanted to get some feedback to see if others out there had any experience with this and if there would be any real performance issues in doing the cross database joins.

Thanks in advance.

Chad

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2005-04-19 : 16:54:35

The real issues are not performance related.

One issue is database object security. If you are using an application role, you will not be able to access the other databases on the same connection. Also, you will have to grant users direct access to tables and views that are accessed by stored procs in other databases. Management of user and object security will become a real chore.

A big disadvantage is that you will not be able to create foreign key constraints between tables in different databases.

Another disadvantage is that in the event of a database recovery, it will be very difficult to recover all the tables to the same point in time, and you could end up with partial or missing transactions. Bacically, your application will fail the ACID test.

I would say that the disadvantages are big enough that you should have a really compelling reason that cannot be solved any other way; more than just "Our manager likes this approach". Can you explain the reasons why you are considering this?

CODO ERGO SUM

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-04-20 : 04:58:16

we're implementing this type of approach for one of our reporting system.

All sites dump their data into this server and the system picks up the data from different databases (with cross database joins)

we've also employed application roles, no direct table queries, no other user can access the server. It's a pretty big system which merits it to have a dedicated server

placing the objects in one database was a difficult one to manage (the files keep growing larger and larger and backup was difficult to manage due to it's size)

IMHO, use this approach with caution. As MVJ mentioned, the real threat is managing security

--------------------
keeping it simple...

chadbryant5
Starting Member

32 Posts

Posted - 2005-04-20 : 09:29:59

Thanks for the replies. I will have to look into the security issue more. I'm a C# developer, and I know enough about SQL Server (to be dangerous..smile), but I'll have to delve a little further into this one.

The other issues are not so much an issue as our app is like Jen's system ( a reporting system) which is accessed via the web and the data access layer in our C# code will access it via a domain account that has access to the databases.

I think the other database we want to put our data in that will be the main driver for our reports will be read-only. This is not an OLTP app, so some of the issues don't really apply I don't think.

Further, the data in the "supplementary database" will be kind of like snapshot data that can be rebuilt at will inhouse from another warehouse database, so the data is not important. It is important, but its just snapshot data isolated into its own little world and can be rebuilt if need be. If it got corrupted or something else happened, the worse case scenario is users can't access their reports for a short period while data is rebuilt.

I'll have to delve further to see what driving reasons are behind the separate database. All I know now is that my manager likes the isolation. We'll have 3 databases (if we go this route) and the first is used by another inhouse app to process and build data in this database. The other two are exclusive to our web app. I understand his desire to not have the internal app doing anything in the app that our users of the web app are using. One is a warehouse (data warehouse) and the other is data that supports the web app that uses the warehouse data.

If you could, would you elaborate a bit on the security issues? This is the biggest concern I see. Performance is a major concern as this is a reporting app that has some unusual performance requirements, but security is also important because the data is sensitive (healthcare related).

Here is basically the setup. We are looking at having 3 databases:

1. A warehouse database which has all the data that the web app will use although the web app will not report directly off this data. The users of the app will "define" containers for reports that will go get the data for their "container". Their container will contain the criteria for what data to go get (which hospitals, what date range, etc). Then they'll work with this container creating various reports within the container. The data comes in from various points and gets massaged and ends up in this warehouse in a form that the web app can use. Once the container is built (not a frequent thing to build) the web app will access this database and put the data into another "snapshot" database specific to the container being built. The snapshot database will have tables built on the fly that has data specific to that container so reports are faster rather than data being accessed all from the same HUGE tables. Contention is one issue we're trying to avoid here.

2. The web app database. This has all the user/profile info for the web app, the container database, reports within those containers and the definitions for those reports, licensing info to define what data in the warehouse can be seen for a given user to use for a container, etc.

3. The snapshot database. This is the data built from the warehouse data for each container. It will get rebuilt if the container definition changes. If something happens to this data, it can be easily rebuilt for the containers.

That's pretty much it. I hope that makes sense. I'm a little out of my league in the database arena, but learning fast. Any suggestions/cautions would be appreciated. I was primarily interested in the performance aspect of all of this because reports can be built to work with millions + rows and alot goes on in calculations that can't be done ahead of time for statistical reporting reasons. The reports can't take all day to run either, so this is a constant challenge for us and when my manager expressed his interest in keeping all this data in 3 separate databases, joining across databases raised a performance caution flag in my mind. Now I'm a little concerned about the security implications, although since this is a web application, I'm not sure that we'd have the same types of security issues if the users were internal and accessing it with their own individual windows domain accounts. All web users will login to the database and the security will be largely handled in C# code as to what data they can see and work with. They'll all be accessing the data via 1 common windows domain account....IOW the ADO.NET SQL Server client connection string will not use a specific SQL Server and it will not connect to SQL Server in the context of the user's account since they're an anonymous web user. The web app will be configured to access it in a trusted type connection using a predefined windows account.

One other question. I think with so many users hitting the snapshot database to run reports, this could maybe create some locking at the database level? If this is correct, then having our main reporting database locked (if there are locks at the database level) while users are running reports could affect the peformance of people doing things like logging in, creating new containers, etc. If we could just move the report data out of the main database then locks on that database while reports are being generated won't affect the main database for things like logging in, creating containers, etc. Any insight into this would be helpful. This is all theory in my mind and I'm not sure what is really getting locked while queries are being run. I know pages of data in the tables being queried, but could other aspects of the database be affected for large queries? If so, then moving these tables where large queries will occur to its own database seems wise as this would keep other non-report generation type events running smoother.

I hope all that makes sense.

Thanks for your input!

Chad

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2005-04-20 : 20:57:25

quote:
...If you could, would you elaborate a bit on the security issues? This is the biggest concern I see. Performance is a major concern as this is a reporting app that has some unusual performance requirements, but security is also important because the data is sensitive (healthcare related)...

The problem is that if you are accessing a table in a different database, the user accessing the table must have direct permission to the table, and it cannnot come via a stored procedure.

quote:
One other question. I think with so many users hitting the snapshot database to run reports, this could maybe create some locking at the database level? If this is correct, then having our main reporting database locked (if there are locks at the database level) while users are running reports could affect the peformance of people doing things like logging in, creating new containers, etc. If we could just move the report data out of the main database then locks on that database while reports are being generated won't affect the main database for things like logging in, creating containers, etc. Any insight into this would be helpful. This is all theory in my mind and I'm not sure what is really getting locked while queries are being run. I know pages of data in the tables being queried, but could other aspects of the database be affected for large queries? If so, then moving these tables where large queries will occur to its own database seems wise as this would keep other non-report generation type events running smoother.

This is not really the way locking works. A query will usually get a shared read lock at a row or table level, and it will do this whatever database it is in.

One thing you still need to be concerned about is referential integrity, because you cannot create a reference to a table in a different database. This is important in a reporting database as well as a OLTP database, because you still want to ensure you have good data.

CODO ERGO SUM

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-04-20 : 22:28:01

quote:

If you could, would you elaborate a bit on the security issues? This is the biggest concern I see. Performance is a major concern as this is a reporting app that has some unusual performance requirements, but security is also important because the data is sensitive (healthcare related).

on security, good that you're going to use windows authentication. if you're going to tighten it another notch higher, setting the approle will be great unless you have connection pooling (in which case the approle won't work)

direct query can be avoided by
a. creating views and using this view for the join instead of the direct table
b. creating functions that will return records pre-filtered as much as you can

In our case, the initial catalog for the system is to the database with more sensitive data, so if they do cross over to the other database, they won't be able to get to the primary database.

IMHO, the issue for security is this: if there is only one user to this server (your web server), that's good. But if you plan to have other users access one database and not allowing them to access the other DBs, that'll be a little tricky to handle

--------------------
keeping it simple...

chadbryant5
Starting Member

32 Posts

Posted - 2005-04-21 : 11:05:08

Thanks for all the responses. Our database for the reporting aspect of it will be denormalized a lot for optimization so it seems that possibly many of the "normal" things to worry about in database design won't apply as we'll be breaking many rules.

We will have to be very careful in our application logic as we build data in tables in both databases because they will have to match. Records in our snapshot database tables will have to have corresponding foreign keys in the main database tables. But we'll probably just have to handle the RI aspect of this in our code on our own.

Our manager just wants to keep things clean. The snapshot tables are likely to get flushed and rebuilt a lot as the user changes things...so having these volatile pieces in their own little world is desirable by our manager who is pushing for the separate database. However, not at the expense of performance.

Maybe not having RI across the databases would affect performance? I don't know. I guess this is what I'm mainly trying to find out and it sounds like this should not be a major issues although nobody has really addressed any potential performance issues. Security and data integrity are issues we have to take into account, but I think we know the issues there. What is unknown (and I guess we could run some tests) is what kind of performance issues are we possibly creating by having tables in another database. When a report in our app is generated, it will have to get data from both places and there will be a lot of joining between 3 or 4 main tables (the main one being a snapshot flushable table in the snapshot database and the others being less volatile ranking and grouping tables in the main database).

Thanks again!

Chad

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources