Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Database Normalization is overrated.

Author  Topic 

00kevin
Yak Posting Veteran

78 Posts

Posted - 2004-11-02 : 17:57:30
Now that I've been developing databases for 10 years. I'm starting to realize that a completely normalized database is far too complex to work with. Sure, it all works and looks great on paper, but it basically becomes a castle made of cards. If you change one table the entire database application can break.

I've noticed that some front end database applications change so frequently that normalization can't handle it.

Consider a very simple example of an invoice receipt system.
A simple solution would be to make three tables.

CUSTOMER
INVOICE
RECEIPT

If I was to follow standard database normalization rules the database would link like this.

CUSTOMER.customer_id
CUSTOMER.customer_name

INVOICE.invoice_id
INVOICE.customer_id
INVOICE.amount

RECEIPT.receipt_id
RECEIPT.invoice_id
RECEIPT.amount

under this design, if I wanted to find out who owns a receipt I would join with the receipt table with the invoice table and then look up the customer.

This however will lead you down a design nightmare with a large database. You will have Views built on top of other Views or join statements that are just insane and hard to manage.
A better design would be something like this.

CUSTOMER.customer_id
CUSTOMER.customer_name

INVOICE.invoice_id
INVOICE.customer_id
INVOICE.amount

RECEIPT.receipt_id
RECEIPT.customer_id
RECEIPT.invoice_id
RECEIPT.amount

The addition of the RECEIPT.customer_id will make the database faster and easy to maintain. You just have to make sure that your stored procedures do a little more work. The tables I'm working with right now contain many keys from tables high up a 1 to N tree of tables. In other words, the leaves on my tree contain keys to the roots of the table.

I also think that enforcing referential integrity is over rated. I would much rather maintain database integrity with with stored procedures and well written application code.

My point is that in the real world you will NEVER have a complete understanding of the scope of the project you are working on. Business applications are always changing. You can't even imagine how your database will change over the next 5 years. You can make an educated guess, but that is no different the trying to predict the weather. Changes after changes will be made to your database and at some point it will break if you are too strict with normalization and enforcing referential integrity. It will become a ball of wires because you will never have enough time to redesign major portions of the database. Most the changes you will make will end up being quick fixes just to complete the business requirement. You best friend will become the LEFT OUTER JOIN. The fact is, a business does not have the time or the money to waste on you redesigning the datbase everytime something new is requested.

I just wish I could find a good book about designing Flexible databases. Too bad I've had to learn the hard way. Just thought I would post this so someone else doesn't have too.





ehorn
Master Smack Fu Yak Hacker

1632 Posts

Posted - 2004-11-02 : 18:51:46
"Let us talk a bit about normalization and abstraction:

When we embark in this normalization process we can go between 2 extremes: fully normalized and fully denormalized (everything lumped together into one entity). Anything in between can be considered good or bad, depending on circumstances (which also change over time).

One cannot say “normalization is overrated” (and expect not to be challenged) since a valuation of what is appropriate (name it “common sense” if you like) for a subject being modeled cannot be made unless the subject and its context is actually described, in sufficient details, at that particular time. In other words, one cannot just say “little bit more or little bit less normalization is appropriate” … you also have to describe the subject being modeled and its context.

This is akin to saying “my query is not performing” … it just doesn’t make sense unless you specify the actual query (“the subject”) and its context … the same query executing in 30 seconds may be appropriate for a reporting application but may not be for another OLTP application. I remember a client calling me in and saying, “my system is not performing” … and I saying “Aha … and?” … followed by inquiries about the context: type of application, suspect SQL statements, history of recent changes, what is considered an appropriate response time, etc. What did I just do? I used context to establish a valuation system … after that, talk about common sense, this is worse, this is better, this and that.

Someone throwing at you just “my system is not performing” coveys as much value as someone throwing at me just “normalization is overrated”. Of course, you are totally entitled to express your opinions since people come at this site exactly for this reason, to get them (and keep coming back because they are overwhelmingly valuable) … but do not expect not to be challenged if your opinions are generic or suspect. On a case by case basis I have no doubt you can give plenty of examples where too much normalization literally killed performance … but “case by case” is what I am getting at … the “case” should come before the assessment if you are to be fair and objective....."

http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:12678058160686#12742558341255
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2004-11-02 : 19:29:27
It's taken you 10 years to realise you're doing the wrong thing?

>> I've noticed that some front end database applications change so frequently that normalization can't handle it.
Sounds like you build applications that are bound to the database - an application chnage shouldn't necessarily mean a database change. A database change usually doesn't mean an application change.

I suspect this may be the route of your problem - designing the database from the application rather than modelling the business entities.

As for your example - it is common to denormalise to for performance and for database quirks. Nothing wrong with it as long as you recognise the redundant data and remember that it doesn't own that data. Not sure this example would gain much but there are examples that would be applicable. Remember also that denormalisation can slow down other processes whilst speeding up the query you are helping due to using up memory. Not keen on views for this sort of thing.

As to ref integrity - depends on the environment. If I'm working alone I will often enforce integrity in the SP layer (never the app) but it means a lot of checking whenever I make an ad hoc update. Ref integrity is a lot better implemented on sql server know so I tend to apply it.

see
http://www.nigelrivett.net/BadThings.html

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-11-02 : 19:57:52
I hear some of what you are saying, but i definitely disagree with some of your thoughts -- namely that Views are bad or more than a few joins are confusing or a negative trait of a DB design.

You mention being able to adapt your database for the future -- that's exactly the ADVANTAGE of normalization, as opposed to a CON!

In your example, you decide it saves a join and is "easier" to store redundant info -- the cusomterID -- on a receipt, even though it is linked to an invoice, which contains the customerID. Well, that's great -- when your system is first written, maybe each invoice can only apply to 1 customer so it works ok (despite the redundancy and all that). then your requirements change and like most complex accounting systems, it turns out you need to allow for more than 1 customer to be charged on a single invoice. (My last company had this situation). So, in the DB design, you need to take customerID out of the invoice table and put it into another table -- CustomerInvoices -- with a PK of (customerID/InvoiceID). Now where does that leave your Receipt table? You no longer can have the CustomerID in there. But maybe hundreds of SQL statements and application logic and reports all rely on this to be true.

You have just designed a DB that cannot expand gracefully, because that is the problem with redundant data -- you can't just change it in 1 place !! you need to change it EVERYWHERE. What if you stored redundant data all over the place in your database to avoid joins, and then you had a few design changes similiar to this? Time to start from scratch (which isn't a bad thing, either, in many cases).

Relational database applications (i.e., SQL Server, Oracle, even MS Access) are designed with 1 purpose in mind -- to optimize joins! That is what they DO. To design your relational databases to avoid joins wherever you can just simply doesn't make sense.

Now, don't get me wrong, as always, there is no hard and fast rule for everything, but I have found that Views really can truly solve so many problems like the one you are experiencing. People don't like using them for some reason -- I've seen many DB's with no views at all but with hundreds of stored procs -- and I never understood that. I love views. They save so much redundant SQL. they enable you to hide your true normalized DB structure whenever you need. they enable great security. They make your stored procs much smaller and easier to manage/maintain. they allow you grow quite easily by letting you CHANGE your table structure and keeping the views the same.

Also, getting back to your notion of "saving the join" to the Invoices table to get the CustomerID: I can only say that I cannot think of one single useful report that you might generate that would not include at least some invoice info as well (i.e., invoice status or invoice date or invoice amount or discount or remaining balance due) so in almost all cases you'd need to join to the invoice table ANYWAY. What are you really saving by storing CustomerID redundantly?

Perhaps the key thing to think about is this: to avoid joins, maybe you decide that even MORE stuff should be stored with the Receipt -- such as Cusomter Name, Invoice Amount, etc. Now, where would THAT lead you? What methodololgy would you use to determine what information to move onto the Receipt redundantly and what to leave where it is? Why stop at CustomerID? This can really lead you to quite a mess if you decide to keep on "improving" the performance by storing copies of your data wherever you feel it might benefit.

As for referential integrity at the app vs. DB layer, i think you need to have both. you really should have the basic relationships set up. obviously you cannot do all the logic in your DB (though some like myself and Byrmol think you can put 99% of it in there!) but for imports and manual entries and other interfaces to your DB that might not be able to go through your application it is just a good idea. You can save more headaches than you will create if your database has integrity. Nothing like trying to find out why invoice #18273 doesn't display on a report until you realize it has an invalid customerID for some crazy reason. Who cares how it happened -- let your DB help you to avoid it at all costs.

- Jeff
Go to Top of Page

ValterBorges
Master Smack Fu Yak Hacker

1429 Posts

Posted - 2004-11-02 : 21:01:01
Look into Enterprise Architecture, Zachman Framework, Six Sigma, Modeling (BPM, CDM, PDM,OOM), Code Generators, Corporate Information Factory, OLAP, ETL, BI Tools.

Good requirements gathering, scoping, project management will help you align business goals.
Good architecture will help you manage change at the speed of business.
Good tools will help you generate 50% or more of the code from models of your architecture.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-11-03 : 01:16:48
"You have just designed a DB that cannot expand gracefully, because that is the problem with redundant data -- you can't just change it in 1 place !! you need to change it EVERYWHERE."

More importantly, IMHO, is that you have to REMEMBER all the places you have to change it. So you make a mistake ... you change some code which requires RECEIPT.customer_id to be fixed up ... but you forget to also do INVOICE.customer_id. So now you cannot trust your data, because you have no idea which value is correct ... nightmare!

Kristen
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2004-11-03 : 11:07:49
The idea behind normalization is:

1 FACT IN ONE PLACE

This leads to 1 INSERT, 1 UPDATE, 1 DELETE when you have to manipulate the system,
That is easier as opposed to denormalizing the data and having to maintain multiple locations for the same information.

Now you say that this is leads to unflexible and hard to maintain systems.
I do not agree.
I say denormalized systems are more unflexible and harder to maintain.


rockmoose
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-11-03 : 12:02:14
tell me about it.
we have an app that has all the business logic in the middle BL layer. all of the constraints, relations...
nothing in the db. just data. so i can go into a QA and mess everything up very easiliy if i make one mistake...

ok this really has nothing do to with normalization...

Go with the flow & have fun! Else fight the flow
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-11-03 : 12:40:00
Does anyone physically implement a logical data model?

The logical model is vastly important because it defines the business and all it's rules.

When you phyically implement that model (and there are some logical constructs that can never be implemented) you need to take in to account quite a number of things.

First and foremost (I know this is a sql server forum) is the platform.

You can then go on to volume, users, transactions, ect.

Been working with a lot of systems on "the street" lately...Those boys had to deal with serious through put on design...

And the answer is...it depends...BUT, you ALWAYS want to do a full blown logical model to understand your world...maybe go beyond and model some other worlds in the nieghborhood...

What you do with those rules is up to you, the System Architect.



Brett

8-)
Go to Top of Page

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2004-11-03 : 12:59:31
quote:
I also think that enforcing referential integrity is over rated. I would much rather maintain database integrity with with stored procedures and well written application code.


how much application code is actually well written? In my 10 years of experience , I have run across only a handful of projects that are actually well written.

I wonder how long it will take fabian pascal to find this thread and make you famous with a nice quote of the week.



-ec
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2004-11-03 : 13:31:16
All our code is extremely well written.

All your code are belong to us

Test.
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-11-04 : 09:47:17
I always found Celko's comments insightful....

Kinda like inciting a riot....




Brett

8-)
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2004-11-27 : 20:44:21
quote:
Originally posted by eyechart

[quote]...
I wonder how long it will take fabian pascal to find this thread and make you famous with a nice quote of the week.
-ec


I guess most people already noted ... but:
Now famous

rockmoose
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-11-28 : 05:57:11
"Fabian Pascal"? Sheesh, that name is about as believable as "Alligator Descartes".

Actually, Mr double-zero Kevin is currently quote of the week on the unfeasibly ugly front page of DBdebunk.
Go to Top of Page

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-11-28 : 10:33:19
"Fribble" is WAY too cool of a name to be real ....

- Jeff
Go to Top of Page

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2004-11-28 : 14:08:12
quote:
Originally posted by Arnold Fribble

"Fabian Pascal"? Sheesh, that name is about as believable as "Alligator Descartes".

Actually, Mr double-zero Kevin is currently quote of the week on the unfeasibly ugly front page of DBdebunk.




Isn't that what Rockmoose pointed out? btw, fabian pascal runs the dbdebunk site. what exactly are you saying?



-ec
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-11-28 : 15:30:27
quote:

Isn't that what Rockmoose pointed out?


The link was to the 2004 quotes page: I was just saying that it's on the front page at the moment too.
Go to Top of Page

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-11-28 : 15:46:24
quote:
Originally posted by jsmith8858

"Fribble" is WAY too cool of a name to be real ....


Yeah, but "Fribble" isn't another French philosopher, and "Arnold" doesn't mean anything (not in English, anyhow)
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-11-29 : 10:47:46
I find this too coincidental...

Is Kevin really Fabian?

Is the posted comment meant to show what a good idea this is or to knock it?

Not only is the job of a good data modeler to acurately to model the business, often times it can show flaws with the existing business model. Changes to that business model can promote vast gains in effeciency.

Those models then change infrequently, rendering the point here, moot.




Brett

8-)
Go to Top of Page

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2004-11-29 : 18:44:33
Are we Paranoid Brett ..

Actually, I get the feeling that in some quarters
Database Normalization is Underrated !!!

the key, the whole key and ....

rockmoose
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2004-11-30 : 15:33:52
so help me Codd



Brett

8-)
Go to Top of Page
    Next Page

- Advertisement -