3 Million Rows Full Text Indexed

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

3 Million Rows Full Text Indexed

Author

Topic

snelson
Starting Member

10 Posts

Posted - 2004-11-18 : 11:03:05

I have a table with 3 million rows (company names with descriptions.)

The descriptions are full text indexed and the company names are also indexed.

We are finding that any search is relatively slow (about 4 - 8 seconds). Can anyone suggest a design or architecture that could improve our query results?

The query uses the company name and freetext search on the description.

PS: The tables have already been horizontally partitioned. This is the row count that we have to deal with.

Is it possible to make a couple servers work together on a SINGLE QUERY and then just add together the result set?

If someone could point us in the right direction it would be appreciated.

Thanks very much,

Steve Nelson

MichaelP
Jedi Yak

2489 Posts

Posted - 2004-11-18 : 12:54:40

"Is it possible to make a couple servers work together on a SINGLE QUERY and then just add together the result set?"

Yes, it's called Distributed Partitioned Views. From what I read, it's not super easy to impliment.

Describe your hardware setup. I suspect that throwing RAM and Disk at this problem is the way to go.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>

snelson
Starting Member

10 Posts

Posted - 2004-11-18 : 14:55:34

We are using

2 Duals 2.8 Zeons
with 2 Gigs of RAM on board
with 1.5 Gig of RAM dedicated to the SQL

We are getting results around 4 seconds..
but we need to get it down to about 0.3 - 0.5 seconds.

What you suggested sounds very interesting. I haven't heard of it though. I will do some research.

Any other suggestions in terms of hardware?

Thanks so much.

DustinMichaels
Constraint Violating Yak Guru

464 Posts

Posted - 2004-11-18 : 14:59:51

Have you tried using the Contains Predicate instead of the FreeText Predicate?
I'm not an expert on full text search but I would think the Contains predicate would be faster.

Dustin Michaels

MichaelP
Jedi Yak

2489 Posts

Posted - 2004-11-18 : 15:01:00

What about your disk configuration? How many disks, what type fo RAID setup, what SQL stuff is on the disk etc.

With SQL server, you usually focus on RAM, DISK, CPU or DISK, RAM, CPU in that order when looking at hardware upgrades.
If your database is smaller, go for RAM first. If your databas is really large (greateer than the amount of RAM the server can hold) then add disks.

I bet you need more disks.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>

snelson
Starting Member

10 Posts

Posted - 2004-11-18 : 15:07:45

When you say you need more disks do you mean putting the page file or FTS files on the seperate disk/controllers. What is your view on SATA drives VS. SCSI drives.

MichaelP
Jedi Yak

2489 Posts

Posted - 2004-11-18 : 15:53:42

Well, the generally accepted disk setup for a resonably large SQL database is this:

For a Read heavy database
(2) Disks in a RAID 1 Configuration for your SQL TX Logs (LDF)
(3-5) Disk in a RAID 5 Configuration for your SQL Data (MDF)

For a Write heavy or Read / Write balanced database
(2) Disks in a RAID 1 Configuration for your SQL TX Logs (LDF)
(4-6) Disk in a RAID 1/0 Configuration for your SQL Data (MDF)

If this isn't enough, then you can start doing things like adding another set of disks for your indexes.

As far as SATA vs SCSI, that's a tough call. Personally, I'd prefer SCSI becuase they are designed to be running all the time and have high performance. There are not too many SATA drives that are designed to be actively working most of their run time. If you can afford it go SCSI or preferably Fibre Channel. We use Fibre Channel external storage boxes for our SQL server's with disk setups like the ones above. You are talking about $30k+ or so for the storage subsystem, not counting the servers you need to connect to them.

SATA -- good for storing files that don't need constant access
Unless you use the Western Digital Raptors, and then I think it's a different story. I use them at home in a RAID 0 config, and boy are they fast. For SQL server, I'd avoid SATA.

Question:
How large is your database in GB's? With only 2GB's of RAM total, you might want to bump up to 4GB and use the /3GB switch.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2004-11-18 : 16:16:39

1) Split out the FT column in to its own table.
2) Ensure that the FT table has its own filegroup and files
3) As a bonus, the filegroup and files should be placed on their own disk.
4) In the FT column, break 1NF by placing the Key in with the description. ie "Microsoft||Really big software company that needs to do a lot of work on FTI".

DavidM

"Always pre-heat the oven"

snelson
Starting Member

10 Posts

Posted - 2004-11-18 : 16:19:45

Hey Micheal,

You have been super helpful. Our Database is about 30Gigs. The 3 million record table I specified is just the table of California. I am going to have to take a closer look at hardware.

Is there a program out there that will get the tables ready for Distributed Partitioned Views?

MichaelP
Jedi Yak

2489 Posts

Posted - 2004-11-18 : 16:29:50

I don't know of a program to handle the DPV's for you automatically.
For a 30GB database and only 2 GB's of RAM, I think that you definatly need to add RAM to start (4GB total at LEAST), and then take David's suggestions about the FT Filegroup and Files and my suggestions about the disk subsystem.

Considering your setup, 4 seconds is not bad at all.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>

snelson
Starting Member

10 Posts

Posted - 2004-11-18 : 23:38:52

Thanks for you advice. We have decided to probably buy a 8 xeon processor server with 16 Gigs of ram.

My question is with this type of setup would it be better to use DPV and cluster a few dual Xeons together or would it be better to buy the 8 xeon processor.

What we have are a ton of tables (here is an example of a table)

RecordId |Company Name |State |
Primary Key Full Text Indexed. Contstraint State='CA'

We have users specifying the Company Name and a state so our query only uses
the Company Name.

How could we Divide this table up for DPV, when the only values we get from the user
are the Company Name and State. Would I just a seperate the table in Half or what is the best way to do this if i want to just split it in two.

Thanks very much for all your help.

MichaelP
Jedi Yak

2489 Posts

Posted - 2004-11-19 : 02:49:11

In my experince and from everything I've been told, It's usually best to get the biger box with SQL server. DPV sounds cool and has some scalability advantages over not using them, but it's nto super easy to work with. Horizontal partitioning is as far as I've gotten with "high performance" sql server stuff. Even with some fairly slow hardware, I'm able to do intense things with my 25+ million row Partitioned View in just a few seconds on my single physical SQL Server.

Be sure to keep in mind what I mentioned before about where to spend money for SQL server. The 16GB's of RAM is a GREAT idea and will make such a huge difference in your queries. If you can afford it, DO IT! Instead of the 8-way Xeon box, you might want to get an 8-way capable box but only put two or four processors in it to start. You can always add them down the road, and that leaves you extra money to spend on more / faster disks. Pump money into your disk subsystem! After RAM, it's the best bang for the buck. Usually the sales folks for disk subsystems will have a technical guy that can help you configure the disk subsystem so that it's as fast as possible for what you are trying to do.

Something else you could try is this. Post some CREATE TABLE and INSERT INTO statements on the board and we can try to help you optimize your existing code etc to make it perform better with your existing hardware.

That should be enough things for you to think about for awhile. I'm off to play some Half-Life 2!

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources