Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Does this need to be normalized?

Author  Topic 

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2006-02-07 : 12:57:14
Hi,

I need to save a user's searches for jobs. A search can be done on keywords and multiple keywords can be entered with a comma seperated list.

The question is, is it ok to just save the CSV in a single field and split it up prior to every time the search is run or should I put each key word in its own row in a table of keywords?

CHeers, XF.

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2006-02-07 : 13:01:12
the splitting/combining operation could be expensive, so if you can normalize this you are probably going to be in better shape.

Not only that, but an index over this column (the comma separated version) would be completely useless. However, an index covering the normalized column might be of more use.


-ec
Go to Top of Page

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2006-02-07 : 13:33:24
Hi there,

I hear what you're saying but I think that splitting up a string is a trivial task compared to connecting to a database and filling a container with keywords. And no combining would be needed unless the user updated their search parameters.

Secondly, what use would an index be on a keyword column? The keywords themselves won't be searched. They key words will be retrieved and looked for in job descriptions and titles. (Which I'm intending to create a full text index for)

XF.
Go to Top of Page

rlreid
Starting Member

1 Post

Posted - 2006-02-07 : 15:24:00
With the disclaimer that no one can give you a worthwhile answer without seeing the data and how you use them:

Xfactor, the short answer is no. I see no advantage whatsoever to
normalizing this. Normalizing is always the right thing to do, but
you always have to define what the "things" are.

What you are saving here is not a bunch of keywords, but a search string. Splitting it up wouldn't be normalizing, it would be shredding for no good reason. You get it as a CSV, right? Then
return it the same way.

You could get absurd and then say, well, every word (not just keywords) should be normalized out. And then every character in
every word. As a theoretical concept, that's perfectly acceptable
and proper. And it's obviously an absurd, idetic application of the concepts of normalization.

I am NOT argueing against normalization! I am one of those who (after 25 years of RDB experience, in which I did more and less
rigid normalization) is pretty orthodix on the subject. But normalization isn't an abolute concept.

One of the best discussions of normalization is Chapter 7 of C J Date's "Database In Depth" from O'Reilly. He actually takes us all the way through 6th normal form. Not because he is advocating going past 5th, but because doing so will show how to normalize it well, and
even how to avoid the use of NULLs. It can be a useful excercise, after which you recombine the tables who always have identical keys to get abck to 5th.

Your commas are just a part of the full text. I don't know if you
should even be storing it in an RDB, but it sounds already normalized
if all you are doing is storing strings and giving them back.

======================================================

I actually know a system when quanitities are not stored as integers,
but as codes which are used to look things up in a "Quantities" table.
No kidding - you don't have instead of
OrderNum | ItemNum | Quantity
12345 999 10

you have
OrderNum| ItemNum | QuantityNum
and you have to
select Amount from Quantities, Order
where Order.QuantityNum = Quantities.QuantityNum

(I'm not making this up). And the designer defends it, since now if
they ever decide that everything that was ever an order of, say, 2,
should be an order of 4 instead, they just update a single row in the Quantities table so the old QuantityNum for 2 is changed to 4.

Of course, that would violate the uniqueness of the QuantityNum key.

This is what I mean about deciding what to normalize. Some folks decide normalization is a bad thing after they improperly normalize something. Hey, why store text at all?
Go to Top of Page

X-Factor
Constraint Violating Yak Guru

392 Posts

Posted - 2006-02-08 : 10:48:01
Thanks for your reply.

With the site I'm working on, a user's search can also contain a list of job categories with which to filter on. I figure that in this case it is worth while putting each selected category in a table which joins to the table of job categories to obtain referential integrity.

>>I don't know if you
should even be storing it in an RDB

Why wouldn't I do this? The data needs to be persisted so that emails can be sent based on the search criteria. How else could this information be stored?

Cheers, WT.
Go to Top of Page
   

- Advertisement -