Performance issues with large flat table

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

Database Design and Application Architecture

Performance issues with large flat table

Author

Topic

Mr. Flibble
Starting Member

7 Posts

Posted - 2008-09-03 : 07:20:37

I've a question or two about performance issues with a large flat table.

It looks something like this:
id | col2 | col3 | col4 | data

id is unique. col2/3/4 will be used as select constraints.

The only select query that will be run is SELECT data FROM table WHERE col2=a AND col3=b AND col4=c.

1) Is there any reason not to index col2/3/4?

2) When inserting I will have a lot of duplicate data, based on the id, which I don't want inserted (and can't be because id is constrained as unique). Is the fastest way to insert a load of data, which may contain unwanted duplicates, to just go and try and insert all the data and let it reject data which has the same id?
Or would it be better to first check if the row exists and then attempt to insert it? ie.
SELECT COUNT(*) FROM table WHERE id = x
[if result = 0]
INSERT ...
[else]
move on to next.

Typically I will have say, 100k rows to insert, 30k of which are unwanted duplicates.

3) The table will be in the order of 60million+ rows. col2 will only have maybe 5 different values. Provided it is indexed, will it be just as fast to leave all the data in that single table over creating 5 different tables and choosing the correct table programmatically before running a query? Basically, even if the table were 200mill rows, will SELECT (*) FROM table WHERE col2 = x LIMIT 1 take a miniscule amount of time, if there are only 5 values of col2?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2008-09-03 : 11:40:28

The index should be col2,col3,col4 with include data if you using SQL Server 2005.

LIMIT 1? Is this a MySql question?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

Mr. Flibble
Starting Member

7 Posts

Posted - 2008-09-03 : 11:47:04

It'll probably be implemented in MSSQL but maybe Postgres. I imagine that my/ms/pg are similar enough in this respect that it doesn't matter.

Is there any reason to index the data row if it is just being selected but never used in a where clause?

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources