Author |
Topic |
waveform
Yak Posting Veteran
93 Posts |
Posted - 2010-12-01 : 01:42:28
|
Hi, I just read something interesting in an MS document about foreign keys. I've often created nullable foreign keys, which are null when there is no reference record to point to. Eg. If I don't know the author of a book, the AuthorID of the Book table will be null.But here is says null foreign keys are a bad idea.http://msdn.microsoft.com/en-us/library/ff647793.aspx#scalenetchapt14%20_topic6They recommend creating special records in the foreign table to indicate "status" and point to those. So the Author table would have an "Author Unknown" record, and my author-less books would have an AuthorID which points to that special Author record instead of being null.The MS document is an old one, so I'm wondering if this is still a recommended practice, or if it doesn't matter anymore? That is, are there still significant gains by avoiding outer joins like that?Cheers! |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2010-12-01 : 06:50:19
|
It's not that old (2004), and as far as nulls are concerned, it's always recommended to avoid them as much as possible.The problem with null is that its meaning is vague and inconstant, i.e. it can mean different things in different contexts. In the case of "Author Unknown" you're substituting a clear definition for a clear situation, rather than using a nebulous null. This may or may not translate into performance improvements, but in any case it improves the quality of the data, which is more important anyway.Chris Date has written extensively on why null is bad, and his primary complaint/argument against SQL is that it allows them. You can find a lot of his writing via Google or in any edition of his Introduction to Database Systems book. |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2010-12-01 : 07:10:45
|
If you can handle nulls and optional joins and know their meaning (and you should be able to) then it's not an issue for you.Problem comes when other users try to access the data. Report writers and analysts are usually not so expert at sql and nulls can cause issues.That's why you normally wouldn't find them in a datamart which is meant to be for ease of access at the expense of performance.You usually would find them in databases that aren't built for direct user access (together with normalisation).Often in this situation you would hold normalised data but create denormalised tables for user access even if the system isn't segregated into different databases - or maybe create views to denormalise.If you can restrict user access to SPs which are provided then the issue goes away as the system can be restructured with having any affect on the outside applications.==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
|
|
waveform
Yak Posting Veteran
93 Posts |
Posted - 2010-12-01 : 09:22:04
|
Thanks for that, I'm clear about it now.robvolk, what about Date fields? I do try to avoid nulls in other kinds of data, but isn't it unavoidable with dates? |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2010-12-01 : 09:26:16
|
19000101 for a missing start date or unknown25000101 for missing end date==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
|
|
waveform
Yak Posting Veteran
93 Posts |
Posted - 2010-12-01 : 09:37:14
|
Thanks Nigel. Is that a widely used standard? Never heard of doing that before.What's wrong with using nulls for dates in that situation? |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2010-12-01 : 09:50:15
|
Widely used in systems I've worked on.I've seen it in other places too.Nothing wrong with a null but you end up with a lot of ocnditionals to deal with.where (enddate >= getdate()-2 or enddate is null)also ordering can get a bit messy.Of course there are issuesIf you have a startdate what do you do for something that has not yet started25000101 would probably be best as it would be excluded when looking for things that started before a date - but then you could use 19000101 and deal with that specifically.For smalldatetimes I use 20500101==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
|
|
waveform
Yak Posting Veteran
93 Posts |
Posted - 2010-12-01 : 10:15:36
|
quote: Originally posted by nigelrivett Widely used in systems I've worked on.I've seen it in other places too.
That doesn't sound particularly wide. ;) But thanks, I'll keep it in mind. It seems, though, if one has to deal with a specific case, may as well use a null there, as it's fairly self-explanatory: Date not set or not known. |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2010-12-01 : 11:10:45
|
I'm of two minds re: date fields and nulls. While I hate to leave them null for "unknown" or "not started/ended", having magic date values to substitute has its own problems. I've dealt with multiple billing systems that used different magic end dates, and it's very frustrating for global reporting (not to mention undocumented). Dates are rarely used in join conditions or foreign keys so the impact is somewhat less than for other types. |
|
|
TimSman
Posting Yak Master
127 Posts |
Posted - 2010-12-01 : 11:24:11
|
I ran into a situation where I HAD to have NULL values.We were gathering financial data, and to show data that was not provided in numerical fields. Yes, we could have defaulted to 0, but that meant adding another field to represent that the 0 was not actually 0. Of course, these fields are not involved in any search conditions, and are not linked to any tables, so it's probably not as big a deal. Also, it repesents what NULL represents, that there is no value given. |
|
|
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts |
Posted - 2010-12-01 : 11:32:51
|
Nope - you've defined to mean that there's no value given.That's fine but it's only one of the possible meanings for null. What if you'd had to distinguish between not applicable and not given.It depends on who is going to use the data as to whether it is a good idea. Will they realise what a null represents and can they handle a null value - in aggregates as well as joins?==========================================Cursors are useful if you don't know sql.SSIS can be used in a similar way.Beer is not cold and it isn't fizzy. |
|
|
TimSman
Posting Yak Master
127 Posts |
Posted - 2010-12-01 : 11:49:31
|
Aggregating was one of the reasons we went with null. It is currently easier for us to use built-in aggregation functions in SQL and not have to perform an extra check on a bit field to get the values.Considering the business people told me specifically that they need to represent values that were not provided (or not applicable, which in this case is pretty much the same thing), and the data isn't used in joins, it made the most sense.Generally, I avoid NULLs as much as possible, but this was a specific business case. I could have gotten around the use of NULL, and I may have to further down the road, but for now it makes the most sense. |
|
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-01 : 12:20:21
|
Using default dates instead of nulls is a terrible idea -- it causes far more ambiguous interpretations than nulls do. It causes the exact same sorting issues that nulls do. It clutters where clauses. Never, never, never substitute BAD DATA for UNKNOWN data.Our first responsibility is always to make sure we have good data. |
|
|
|