Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 The more databases the better (discussion)

Author  Topic 

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-06-20 : 14:30:37
I’m having fun with these ‘discussion’ threads.

Here’s today’s question de jour. What is your logical/ theoretical decision rule for creating a new database?

Well, right away, I think the opposite question is much easier to answer. You cannot persist referential integrity across databases (in SQL Server), so any entities that require such integrity should be in the same database. But what happens when you have a group of tables, a schema, that is not referentially integrated into the rest of your schema.

Consider an object audit. Let’s say you have a ‘Changes’ table. Additionally, suppoze that you have ‘Tables’ and ‘Columns’ tables to maintain data about what you auditing…


create table audit (
auditid int identity(1,1) not null primary key,
reftable varchar(50) not null constaint fk_audit_table refereneces tables(reftable),
reccolumn varchar(50) not null constraint fk_audit_table references columns(refcolumn),
recordkey varchar(50) not null,
changedate datetime not null constraint dft_audit_changedate default getdate(),
oldvalue varchar(200) not null,
newvalue varchar(200) not null)

 

(Hopefully from this you can infer the rest of the schema….) Where do you put this? A new database or in the database your are auditing? (ok, the quick ones out there will probably say, "don’t you need a userid in that table to track who made the change." ... Well, that would blow the example outta the water, yeah? My requirements are no userid and thus no referential integrity to the rest of the schema…)

Consider a more drastic example. You are building a system for a University. ‘Students’, ‘Classes’, ‘Enrollment’, ‘Professors’, etc etc. Lets say the Maintenance Crew needs a schema to track trouble tickets (leaky faucet, barf in the hallway and the like). Different database yet?

Now, granted, there are some things with regards to the physical implementation that would guide this decision. Maybe the maintenance crew database has its own backup/recovery plan (although this could be done with filegroups) or they get their own RAID (filegroups again) or maybe their own server . . . But put the physical things aside, as that makes the answer too this easy.

This came up recently. Programmers contested that it was ‘good object oriented design’ to split widgets out of the foobar database. My gut said NO. My initial argument was that this is relational and not an object oriented database and applying oo methodologies was like trying to put ketchup on sushi...well that fell on deaf ears. Luckily I was able to point to maintaining one backup/recovery scheme as a valid reason, but if I hadn’t I’da been up the creek.

So what do you think?


<O>

izaltsman
A custom title

1139 Posts

Posted - 2002-06-20 : 16:37:51
More databases = more administration headaches.
As you go down the path of creating multiple databases, you have to start worrying about having to manage multiple sets of db users, about performing same maintenance tasks on every single one of them, managing space between more files (especially when it comes to log files). And if you have to maintain some degree of synchronization between several databases, it is often quite difficult to ensure that everything is synched up after you try to restore all dbs from backup ('cause most likely your backups will not be done at exactly the same time on each database). Plus there is bound to be disagreement on which tables belong in which databases (i.e. I'll bet that janitors in your example would want to have a table of all the buildings on campus, even though the exact same table will exist in the main university database as well).
Another argument you could make against multiple dbs is that system tables for each database take up some space (although not a whole lot).
On top of everything, I completely agree with your argument about the fact that we aren't dealing with an object-oriented DBMS.

To summarize -- I don't create multiple dbs without a good reason (and good reasons to create them do exist, but the fact that someone thinks the database should be "more object-oriented" definitely ain't good enough! ).

Go to Top of Page

MichaelP
Jedi Yak

2489 Posts

Posted - 2002-06-20 : 16:52:33
I totally agree with Izaltsman. Another thing that he didn't mention was ADO Connection pooling. If you have multiple databases, you have multiple "connection strings" and ADO only pools together connections that are exactly the same. Plus, keeping up with multiple connection strings kinda sucks too.

The only good reason I can come up with is to separate data into multiple databases for security reasons. To keep user X from Company A out of Company B's data. Something like that.

Michael



Go to Top of Page

JustinBigelow
SQL Gigolo

1157 Posts

Posted - 2002-06-20 : 16:58:23
Page, this maybe oversimplifying but... if you store the audit set as its own database without a DatabaseID field then you could end up needing to audit your audit database . Your question seemed to revolve around the idea of one main database (or no more than a few databases) on the server. In my situation I'm working off of a hosted SQL Server in which there is somewhere in the neighborhood of 50+ databases on the server. Even if the schemas were all the same (a hosted application for instance with each client getting their own db) storing audits in a universal db would be a pain without identifying which db it came from (admittedly this could be managed by checking the qualified table name). In my particular situation storing the audits in universal table wouldn't be practical as I don't have access to databases outside of my hosted db.

My 1 cent (I'll add the other when I have more experience to draw from )

Justin

Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-06-20 : 18:42:29
All excellent reasons. I could not agree more. However...
quote:
But put the physical things aside, as that makes the answer too this easy.

Are there any non-administrative reasons? Are there any reasons that are true regardless of the dbms implementation? I'm thinking more on the logical level than on the physical layer.....

What would Codd do (w.w.c.d. - hehehe, funny)? . . . Given the task to create the logical design for a database that was to hold all data, would the good doctor create it all in one database?

Sure this conversation is purely academic in nature. Granted. However, lately, I've kinda been of the mind that before I do something I need to at least consider w.w.c.d.... (Example: sure, its probably best to use an identity column for the primary key of table t, but first I need to at least consider the natural key.)

<O>
Go to Top of Page

M.E.
Aged Yak Warrior

539 Posts

Posted - 2002-06-21 : 11:16:07
The only reason I could see why you'd create 2 seperate databases is for permissions. Perhaps theres a confidential table regarding the amout of barf in the hallway that only janitors can access, however the professors on the other side have things running that require them to have DBO or admin access to the database.




quote:
Are there any non-administrative reasons? Are there any reasons that are true regardless of the dbms implementation? I'm thinking more on the logical level than on the physical layer.....



I think the biggest reason is the admin headaches it causes. Having to run 2 backups, 2 sets of logs, 2 sets of permissions, 2 sets of maintanence... I'm sure theres more.



quote:
I’m having fun with these ‘discussion’ threads.

Keep em coming, I've yet to actually design a database myself (well, for money atleast... my own little fun projects don't count). I've always worked under a project leader. By the time I actually do, with all this new info I'm getting from here.. I figure I might actually do a decent job.

-----------------------
Take my advice, I dare ya
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-06-22 : 16:57:56
Just to be devil's advocate:

In using a single table to audit all columns in all tables (if only in a single database), you're introducing a number of things that really complicate the auditing procedure:

1. Data type conversions - all of your values need to convert to varchar. No big deal really, but it's less than optimal

2. Primary keys - this forces you to use single column primary keys. No big deal either, if you're designing tables that way, but it could cause you to include redundant data

3. CASCADE UPDATES - eeeeeeeeeeewwwwwwwwww, I don't even want to imagine how that could send the server into orbit while auditing cascading columns

4. Data audit integrity - this is kind of important: what are you auditing, just the columns, or the entire row? If you need to get a snapshot of the row at a certain point in time, storing each column's changes on separate rows is gonna complicate this.

The last point, IMHO, is a killer. Suppose you have a table with 20 columns and 15 or more are updated. You now have 15 rows in your audit table. If you want to reconstruct that, I think you're screwed. Here's why: I can't see any way for you to insert 15 rows from 1 source row and maintain identical timestamps. Sure, they probably won't spread out more than a few milliseconds, but they'll complicate the ability to undo the entire row back to a specific point in time. Now imagine, using this technique, auditing an UPDATE statement that affect 2,000 rows and 15 columns.

Plus, you can't differentiate whether it was one operation that affected 15 columns, or 15 simultaneous operations affecting one column each, or any combination in between. That's probably not a huge concern, but to me, auditing is there to track EXACTLY what happened to a row.

And how do you reconstruct an entire row back to a certain point in time, when there have been many other updates before and since, and may or may not require getting values from the parent table in order to do so? That's gonna require a wicked crosstab to match the parent table and audit table!

That's why I will forever use an audit table that has the same structure as the parent table, adding only the datetime and user ID columns. I'm not concerned with having to alter both tables, 'cause it forces a discipline in designing the tables and reviewing those changes and their side effects. Nor am I bothered by the waste of space if only one or a few columns is updated, because I'll always want an audit of the entire row. Comparing old and new values is pretty easy, because I only need to retrieve 2 rows from the audit table, and maybe one from the parent table.

Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-06-22 : 20:25:58
????
A database is an data integrity object.

i.e. when you restore a database it is consistent.
If you distrbute data over several databases you will need to make sure there are no consistency issues between them, include methods of updating them when one is restored or take down the system to perform backups so that they can all be restored to a consistent point.

Allied to this is problems with distributed transactions preventing transaction logs from being cleared, extra work the system has to do to check permissions... but none of this is insurmountable.

It should be fairly obvious from the system architecture what needs to be in the same database - if it's not try rethinking the design.

==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-06-24 : 09:41:54
argg.....My example has become the focus rather than the question....

quote:
It should be fairly obvious from the system architecture what needs to be in the same database - if it's not try rethinking the design.


I don't think it can be called obvious if there is no sound logical reason. Let me try to frame the thread this way.

In the logical design phase (non-DBMS specific) of modeling all knowledge in the universe, there are no good reasons for having more than one database.

Can we prove this statement wrong?



<O>
Go to Top of Page

MakeYourDaddyProud

184 Posts

Posted - 2002-06-24 : 10:33:34
Most of the arguments here are valid except the most important question is what does the enterprise want from the data rather than how can the database work with this design scheme. All kinds of traditional issues concerning DRI, security and data logistics arise from the need to provide an enterprise compliant data source whatever databases are choses. However I think several serious questions arise in an enterprise feasibility study of its potential data storage and manipulation requirements, these are:

What encapsulations of data is required
---------------------------------------------------------
READONLY-TRANSACTION-PRODUCTION Model

1) Does this organisation at best need fast read access to its data at all times and how much latency time can they 'get away with' before the data is refreshed.

2) Can the organisation get away with a readonly data source? This can open up the possibility of fast data access and locking issue become a significantly lesser problem. This might be required in a configuration where the data is available 24x7 to the Web. DRI may not necessariry need to applied to this DB since it is read only.

3) How often are readonly databases refreshed? This could be a serious issue since enterprise (particularly billing) databases are physically huge. eBay applies updates at regular intervals.

4) Can the enterprise utilise a Transaction Database? This may simply be data insertions and updates which are not applied in real time, but at a scheduled time to a production database which is for exclusive use to the enterprise

5) Firewalls and Tiered Servers - You could have your read only database at a lower level of security nearer to the firewall but at a higher level of access to the masses. Your transaction database would operate in the highest possible security with middle tier business level logic validating all user transactions both Web and Internal.

Daniel

Go to Top of Page

Dave Kawliche
Starting Member

20 Posts

Posted - 2002-06-24 : 10:38:56
robvolk you are hardly being devil's advocate with your comment on single table audits. I have been forced to work with large databases that used this technique and for all the reasons you state it was a big PITA !! It is unfortunate that the single table audit is such a common "example" in books and web sites (and message boards ;-) that many people seem to think it is a best practice which in most cases it definitely is not!!

I usually advocate use of "valid-time" tables for auditing and keep them in the same db to save on maintenance as described here :

http://accesshelp.net/survival/timeaftertime.asp

Does anyone have any knowledge of specific pros/cons related to _performance_ if "valid-time" audit tables were moved into their own database on the same server ? I would definitely be willing to recode all my triggers to use 3 part catalog.owner.object identifiers if it would lower the overhead of this mechanism.

regards,

Dave Kawliche
http://AccessHelp.net
http://1ClickDB.com
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-06-24 : 11:34:19
quote:
Does anyone have any knowledge of specific pros/cons related to _performance_ if "valid-time" audit tables were moved into their own database on the same server ?

Dave-

It's a two-edged sword in keeping the audit tables in separate databases. You can improve performance if that audit database is on a drive completely separate from the other database drives, because it can use parallel I/O to write the audit data at the same time the normal data is handled.

If the audit database shares a drive with another database though, it will lower performance because I/O will serialize on that hard drive for the audit table and the other databases. However, you gain a huge logical advantage by keeping the audit table in the same database because you can be absolutely sure which database it is auditing. You can also use filegroups to put the audit table in the same database but on a different drive, and only the audit table would use that filegroup. Essentially you can get the best of both worlds with that approach; logical relationship and optimum performance.

As far as the design aspects, what I was suggesting earlier was that, Hey, it's bad enough using a single audit table WITHIN one database, it's 10 times worse to have a separate audit database for ALL other databases. Ilya's totally right on this too, the admin alone will sink that idea. All you need is one failed backup and your audit database is FUBAR; you won't have a chance of synchronizing later if you need to.

Go to Top of Page

Dave Kawliche
Starting Member

20 Posts

Posted - 2002-06-24 : 12:05:45
thanks for such an articulate response rob, I agree that reduced contention for disk resources would not a good motivation for separating databases. Using filegroups is a much simpler way to achieve that goal.

Wandering just a bit off the original topic here, would you have any suggestions on how to make the "expiration" of an old "valid-time" audit record faster for SQL Server (aside from enabling the most obvious index ) ??

My experience is that it is that much much _much_ easier to get sensible data back out when you have an "end date" on all audit records but I hate having to do both an update (expire old audit record) and an insert (to record new data) in the trigger for every change.

djk
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-06-24 : 14:29:54
quote:

In the logical design phase (non-DBMS specific) of modeling all knowledge in the universe, there are no good reasons for having more than one database.

Can we prove this statement wrong?
<O>



Logical design? Does that specify tables let alone databases?
These are both implementational things. Entities may (probably will initially) map to tables in creating the physical model but may not.
Databases will come in when you are considering the architecture and will take in to account things like volatility, criticality, security, fragility, ....
You may come up with purely business reasons (e.g. data ownership) during the data analysis which may suggest database splits though.

==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-06-24 : 14:38:58
Gawd Damn!!! You're right!!!

What the hell am I thinking? Of course, the reason why I can't come up with a decent non-implementation reason/decision rule for drawing the boundary lines of a database is because a database is a physical implementation itself. Its reasons for existance are physical in nature.

I guess.

(Gosh, I hope this doesn't show up as a quote-of-the-week on dbdebunk.com ....)

<O>
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-06-24 : 14:48:45
quote:
Of course, the reason why I can't come up with a decent non-implementation reason/decision rule for drawing the boundary lines of a database is because a database is a physical implementation itself. Its reasons for existance are physical in nature...I guess
That's a damn good guess! Like guessing the sky is blue.

I'm glad Nigel got you back on track, he very eloquently said what I was thinking: don't forget that this is a database, not just an E-R diagram. You'll never find the ideal database design, there is no such thing. I think you should trust your gut instincts a little more, they've been right on target since the beginning of this thread. {:)]

Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-06-24 : 14:50:46
Holiday must be good for me - I didn't even think about being sarcastic.

But had I read the original question
quote:

This came up recently. Programmers contested that it was ‘good object oriented design’ to split widgets out of the foobar database. My gut said NO. My initial argument was that this is relational and not an object oriented database and applying oo methodologies was like trying to put ketchup on sushi...well that fell on deaf ears. Luckily I was able to point to maintaining one backup/recovery scheme as a valid reason, but if I hadn’t I’da been up the creek.
<O>



That's why you need to keep application developers away from database(s) design.
People who say it's ‘good object oriented design’ usually don't know what they're talking about otherwise they'd give the reason why it's good.

==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.
Go to Top of Page

Page47
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2002-06-24 : 15:03:38
quote:
That's why you need to keep application developers away from database(s) design.


That's one of the unique challenges of being a contractor, especially being a database contractor. You have no control over political power, so the only way to keep someone off your turf is by presenting undeniable reasoning. In this case, I could tell them they were wrong, but I couldn't verbalize why I was right.

<O>
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-06-24 : 15:27:40
quote:

That's one of the unique challenges of being a contractor, especially being a database contractor. You have no control over political power, so the only way to keep someone off your turf is by presenting undeniable reasoning.
<O>



Sounds like you're trying to be far too friendly.

Try
'You employed me because I know what I'm doing - you obviously don't so just sit there and watch and learn.'

Companies don't tend to like me unless they are in trouble.

==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.
Go to Top of Page
   

- Advertisement -