| Author |
Topic |
|
Nato
Starting Member
30 Posts |
Posted - 2003-08-01 : 10:04:38
|
I am new to indexing, so I am having a problem making a effective index for this query:SELECT Datepart(year,Date),Web_Product, SUM(Web_Product) FROM Stats_Web_Product WHERE Web_Product <> '0'AND IP_Address <> '62.177.158.114' AND Date >= '05/01/2003' AND Date <= '07/31/2003' AND Web_Platform = '2'GROUP BY Datepart(year,Date), Web_ProductORDER BY Web_Product Here is the table:CREATE TABLE [dbo].[Stats_Web_Product] ( [ID] [int] IDENTITY (1, 1) NOT NULL , [Web_Language] [int] NOT NULL , [IP_Address] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [Date] [datetime] NOT NULL , [Time] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [Web_Platform] [int] NOT NULL , [Web_Product] [int] NULL , [Web_Category] [int] NULL , [Web_Demo] [int] NULL , [Web_Sellingpage] [int] NULL , [Extra_Info_Page] [int] NULL ) ON [PRIMARY]GO Any Ideas? |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-08-01 : 10:18:44
|
Do you have a primary key?Is this the way the data is most likely gotten at?What's the cardinality...oh forget that..What's with the IDENTITY Column? Can you not uniquely define the data?How many of rows do you have/expect to have?What color is your database? It seems on flush that CREATE INDEX IX1_Stats_Web_Product ON Stats_Web_Product (Web_Product, Web_Platfrom, [Date], IP_address) WITH FILLFACTOR = 90 ON [PRIMARY]GO Would be my best guess....Brett8-)SELECT POST=NewId() |
 |
|
|
Amethystium
Aged Yak Warrior
701 Posts |
Posted - 2003-08-01 : 10:21:18
|
| Clustered Indexes are you friends in this case. I suggest your create it on your identity column.----------------Shadow to Light |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-08-01 : 10:28:33
|
quote: Originally posted by Amethystium Clustered Indexes are you friends in this case. I suggest your create it on your identity column.----------------Shadow to Light
WHAT?Brett8-)SELECT POST=NewId() |
 |
|
|
Amethystium
Aged Yak Warrior
701 Posts |
Posted - 2003-08-01 : 10:31:44
|
quote: Originally posted by X002548
quote: Originally posted by Amethystium Clustered Indexes are you friends in this case. I suggest your create it on your identity column.----------------Shadow to Light
WHAT?Brett8-)SELECT POST=NewId()
What is my crime Brett?----------------Shadow to Light |
 |
|
|
Nato
Starting Member
30 Posts |
Posted - 2003-08-01 : 10:39:54
|
Amethystium, Not sure of the lingo here. What is my "identity column"Also, Brett. Your index slowed my query, to answer your questions:Do you have a primary key?Yes, it is IDIs this the way the data is most likely gotten at?Yes.How many of rows do you have/expect to have?There is over 4 million rows in the Table, and countingWhat color is your database? Whatever color you want it to be ThanksPhilip |
 |
|
|
Amethystium
Aged Yak Warrior
701 Posts |
Posted - 2003-08-01 : 10:49:58
|
You asked what would be the most suitable type of index for your query and I suggested a clustered index on your identity column.Clustered indexes are best added to columns which monotonically increases, i.e. your identity column ID column and queries that make use of GROUP BY and use MAX, MIN, SUM + any other aggregate functions.----------------Shadow to Light |
 |
|
|
Nato
Starting Member
30 Posts |
Posted - 2003-08-01 : 10:55:19
|
| Amethystium, thanks for that.I am creating an Index using SQL Analyser, but the option to make a "Clustered Index" is greyed out. The tick box, cannot be ticked, comprende?Any idea why this is happening? |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-08-01 : 11:01:32
|
| What indexes do you have already?Which column has the most unique data in it...Try placeing [Date] (not a good column name btw) first...A clustered index on an identity column I don't think is a good ideaIt'll create a bottleneck I believeBrett8-)SELECT POST=NewId() |
 |
|
|
Nato
Starting Member
30 Posts |
Posted - 2003-08-01 : 11:08:19
|
| I only have the standard Index that is created when you have a primary key.If ID is the indentity column, then there is always a clustered index on this. MSSQL does this automatically.What is wrong with Date as a column name? |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-08-01 : 11:14:56
|
| Date is a reserved word..A clustered index will sort the data pages on that column...If you went after the data by the Identity column it would be fast, but how often will you do that?If you're saying that the index you had was faster, I'd be suspiscious..if you reun and rerun and rerun the results will be stored in cache and would appear to be faster..I'm guessing with the new index the optimizer had to redo the work..What's does show plan tell you?An clustered on the Identity column would scan the table..Brett8-)SELECT POST=NewId() |
 |
|
|
Amethystium
Aged Yak Warrior
701 Posts |
Posted - 2003-08-01 : 11:17:44
|
quote: Originally posted by Nato I only have the standard Index that is created when you have a primary key.If ID is the indentity column, then there is always a clustered index on this. MSSQL does this automatically.
I've been waiting for you to say this! You know the answer for yourself now.quote: What is wrong with Date as a column name?
I think it is best to stay well clear from key words like DATE, COUNT etc... but you do have the flexibility of using them if you engulf them in square brackets. Perhaps a new name such as myDate or EffectiveDate would be more suitable but the choice is yours ----------------Shadow to Light |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-08-01 : 15:19:30
|
quote: Originally posted by AmethystiumWhat is my crime Brett?
What do you think his normal access path to the data will be?If, for example (and there are a lot of rows) s/he had a lot of inserts...on what page(s) (The s is a trick question) do you think all the inserts will hit?Brett8-)SELECT POST=NewId()EDIT: Go check out the Yal Corral |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-08-01 : 20:14:40
|
quote: Originally posted by NatoIf ID is the indentity column, then there is always a clustered index on this. MSSQL does this automatically.
Actually it does not. Even when using Enterprise Manager it will not automatically choose anything for a primary key.Using ID as your primary key is not particularly helpful, but we'll skip that for now, I've gotten enough grief about that already. But if you cannot use anything else as a primary key, at least make it nonclustered, and create your clustered index on the Date column, or Date and Web_Platform, or Date, Web_platform and Web_Product. Something like this:CREATE CLUSTERED INDEX IX_CLUST_Stats_Web_Product ON Stats_Web_Product([Date], Web_Platform, Web_Product)You may need to ALTER TABLE...DROP CONSTRAINT on the primary key in order to add the clustered index. Or modify your CREATE TABLE to:CREATE TABLE [dbo].[Stats_Web_Product] ( [ID] [int] IDENTITY (1, 1) NOT NULL PRIMARY KEY NONCLUSTERED, [Web_Language] [int] NOT NULL , [IP_Address] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [Date] [datetime] NOT NULL , [Time] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [Web_Platform] [int] NOT NULL , [Web_Product] [int] NULL , [Web_Category] [int] NULL , [Web_Demo] [int] NULL , [Web_Sellingpage] [int] NULL , [Extra_Info_Page] [int] NULL ) ON [PRIMARY]By putting the Date column first, the query can take advantage of the date range specified in the WHERE clause. Since that is most likely the most selective of the criteria (most unique values) it can seek those rows very quickly and ignore the rest of the table.Also, Web_Platform and Web_Product are int columns, you should not put single quotes around them in the query:SELECT Datepart(year,Date),Web_Product, SUM(Web_Product) FROM Stats_Web_Product WHERE Web_Product <> 0AND IP_Address <> '62.177.158.114' AND Date >= '05/01/2003' AND Date <= '07/31/2003' AND Web_Platform = 2GROUP BY Datepart(year,Date), Web_ProductORDER BY Web_ProductIt will only cause an unnecessary implicit conversion to be performed.I also agree with everyone's suggestion to use another name instead of Date for that column. Reserved words will always bite you in the ass when used as object names.I recommend testing the performance using the clustered index I described, and possibly try modifying the order of the columns and testing the results again. I have a feeling that putting Date first in the list will give you the best performance, but if you can test other variations you'll know for sure which one is best. |
 |
|
|
Nato
Starting Member
30 Posts |
Posted - 2003-08-04 : 11:11:12
|
| robvolk, thanks for your help. This looks to have helped me out big time. Also, you explained your suggestions very clearly, which is most appreciated.I have one problem, which is small but getting around it is a hassle. My table, that has 5 million + rows, has ID set as a clusted index. I cannot remove this, because it is using ID as a primary key etc...Is there anyway around this, or will have to create a new table, and then transfer the data?ThanksPhilip |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-08-04 : 19:15:48
|
| Run these in the following sequence:ALTER TABLE Stats_Web_Product DROP CONSTRAINT PK_whatever_the_name_isCREATE CLUSTERED INDEX IX_CLUST_Stats_Web_Product ON Stats_Web_Product([Date], Web_Platform, Web_Product)ALTER TABLE Stats_Web_Product ADD CONSTRAINT PK_Stats_Web_Product PRIMARY KEY NONCLUSTERED(ID)You'll have to check sp_help or sp_helpconstraint to get the actual name of the primary key constraint so you can drop it. |
 |
|
|
|