Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
bigsapper
Starting Member
8 Posts |
Posted - 2008-04-22 : 09:08:21
|
I have an opportunity to rebuild a database model with the express purpose of improving query performance. So given the following I have a few questions.Table A (~500M records)Primary Key Field (int)Field 1 (varchar)Field 2 (varchar)Field 3 (varchar)Field 4 (varchar)Field 5 (varchar)Table B (1B+ records)Primary Key Field (int)Foreign Key Field (int)Field 1 (varchar)Field 2 (varchar)Field 3 (varchar)Field 4 (varchar)Field 5 (varchar)* Assumed: Tables are inner joined on all queries. The database is readonly.-- Most of my lookups are based on querying Field 1 of Table A. The data content of Field 1, Table A is 90% unique.1) Would it be more beneficial to put the clustered index on Field 1 instead of the PK field in Table A?2) Can an Identity column be non-clustered?3) Alternatively, would it be beneficial to build a separate lookup table with just the PK & Field 1 of Table A, with a clustered index on the lookup table Field 1 which I join on Table A? (did that make sense?)-- I have a secondary lookup that performs queries on Fields 1, 2, 3, 4 & 5 of Table B1) Would it be more beneficial to create an additional indexed lookup column of the concantenated values of Fields 1-5 of Table B versus a covering index of all 5 columns?2) Does a clustered index have to be unique?3) Would a clustered index be more beneficial over Fields 1-5 or the special lookup column versus the PK or FK fields?4) Would creating a special lookup table with just the requisite fields be more beneficial?An extra question. The existing data model uses the CHAR datatype for all columns less than 9 characters wide and the columns are set to allow nulls. This requires every select statement to COALESCE() and RTRIM() all these columns. I intend to make all (affected) columns VARCHAR, NOT NULL with a default value of a 0-length string. Will this enhance query performance?Thanks in advance for any insight. |
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2008-04-22 : 11:57:54
|
1. probably. you'll have to test this.2. yes3. not really. that would be basicaly the same as putting the clustered index on the identity PK and Field 11. don't think so. maybe have a checksum of the 5 and index that? depends on how large the fields are.2. yes. if it's not, the sql serevr adds an internal uniqueifier that makes it unique. this doesn't change the acctuall data in the db.3. no sure what you mean here. CI on all 5 would be great. you'll always have to query the whole 5 columns to use it though.4. don't think so.about those look up tables: maybe you should give us an example.1. yes. using a function on the column doesn't use an index on that column._______________________________________________Causing trouble since 1980blog: http://weblogs.sqlteam.com/mladenpSSMS Add-in that does a few things: www.ssmstoolspack.com |
|
|
bigsapper
Starting Member
8 Posts |
Posted - 2008-04-22 : 12:15:12
|
quote: Originally posted by spirit1 1. probably. you'll have to test this. OK2. yes OK3. not really. that would be basicaly the same as putting the clustered index on the identity PK and Field 1 OK1. don't think so. maybe have a checksum of the 5 and index that? depends on how large the fields are. Is there a threshold value I should be looking at?2. yes. if it's not, the sql serevr adds an internal uniqueifier that makes it unique. this doesn't change the acctuall data in the db. So you're saying, I could put a clustered index on non-unique data and sql server will deal with the uniqueness itself? Is that a perf hit?3. no sure what you mean here. CI on all 5 would be great. you'll always have to query the whole 5 columns to use it though. Answered in #1, thanks.4. don't think so. OKabout those look up tables: maybe you should give us an example. see below1. yes. using a function on the column doesn't use an index on that column._______________________________________________Causing trouble since 1980blog: http://weblogs.sqlteam.com/mladenpSSMS Add-in that does a few things: www.ssmstoolspack.com
Consider Table A, People and Table B, Addresses. The lookup table would be a de-normalized build of the join of People & Addresses with the following columns...(Table A Primary Key Field)FirstNameLastNameHouseNumberStreetNamePostalCodeNow consider a third table, AliasNames, child of People. The lookup table would now comprise all occurences of "original name" and associated adrresses as well as each "alias name" and associated addresses. |
|
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2008-04-22 : 12:36:43
|
1. the checksum was a suggestion to lower index size. if all you have are ints or small varchar don't bother. you might test this for fun though.2. yes you can. no it's not a perf hit, since the uniqifier is only added to the duplicate rows.3. well denormalized data is faster to look up... but there's no definite answer here. you'll have to test it. if you're doing singleton lookups on the CI the size of the tables doesn't really matter that much.however do keep in mind that theory (which is what you're getting here since we don't have access to your system) is great but you'll have to test your app thoroughly._______________________________________________Causing trouble since 1980blog: http://weblogs.sqlteam.com/mladenpSSMS Add-in that does a few things: www.ssmstoolspack.com |
|
|
bigsapper
Starting Member
8 Posts |
Posted - 2008-04-22 : 13:41:32
|
Thanks. I appreciate the insight. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2008-04-22 : 14:45:18
|
Maybe what Spirit says is that when a table does not have a clustered index (a heap),then nonclustered indexes use a row locator built from the1) file identifier2) page number3) number of the row on the pageto point to the date, and it is known as row id (rid). This row id is not retrievable.If the table has a clustered index, then nonclustered indexes use the key of the clustered index to locate the data. E 12°55'05.25"N 56°04'39.16" |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2008-04-22 : 15:15:34
|
> Maybe what Spirit says is that when a table does not have a clustered index (a heap),> then nonclustered indexes use a row locator built from thefor which point do you think i'm saying that?_______________________________________________Causing trouble since 1980blog: http://weblogs.sqlteam.com/mladenpSSMS Add-in that does a few things: www.ssmstoolspack.com |
|
|
|
|
|
|
|