Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Mr. Flibble
Starting Member
7 Posts |
Posted - 2008-09-03 : 07:20:37
|
I've a question or two about performance issues with a large flat table.It looks something like this:id | col2 | col3 | col4 | dataid is unique. col2/3/4 will be used as select constraints.The only select query that will be run is SELECT data FROM table WHERE col2=a AND col3=b AND col4=c.1) Is there any reason not to index col2/3/4?2) When inserting I will have a lot of duplicate data, based on the id, which I don't want inserted (and can't be because id is constrained as unique). Is the fastest way to insert a load of data, which may contain unwanted duplicates, to just go and try and insert all the data and let it reject data which has the same id?Or would it be better to first check if the row exists and then attempt to insert it? ie.SELECT COUNT(*) FROM table WHERE id = x[if result = 0]INSERT ...[else]move on to next.Typically I will have say, 100k rows to insert, 30k of which are unwanted duplicates.3) The table will be in the order of 60million+ rows. col2 will only have maybe 5 different values. Provided it is indexed, will it be just as fast to leave all the data in that single table over creating 5 different tables and choosing the correct table programmatically before running a query? Basically, even if the table were 200mill rows, will SELECT (*) FROM table WHERE col2 = x LIMIT 1 take a miniscule amount of time, if there are only 5 values of col2? |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
Mr. Flibble
Starting Member
7 Posts |
Posted - 2008-09-03 : 11:47:04
|
It'll probably be implemented in MSSQL but maybe Postgres. I imagine that my/ms/pg are similar enough in this respect that it doesn't matter.Is there any reason to index the data row if it is just being selected but never used in a where clause? |
|
|
|
|
|
|
|