Help with Index - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Help with Index

Author

Topic

Nato
Starting Member

30 Posts

Posted - 2003-08-01 : 10:04:38

I am new to indexing, so I am having a problem making a effective index for this query:

SELECT Datepart(year,Date),Web_Product, SUM(Web_Product) FROM Stats_Web_Product 
WHERE Web_Product <> '0'
AND IP_Address <> '62.177.158.114' 
AND Date >= '05/01/2003' 
AND Date <= '07/31/2003' 
AND Web_Platform = '2'
GROUP BY Datepart(year,Date), Web_Product
ORDER BY Web_Product

Here is the table:

CREATE TABLE [dbo].[Stats_Web_Product] (
	[ID] [int] IDENTITY (1, 1) NOT NULL ,
	[Web_Language] [int] NOT NULL ,
	[IP_Address] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
	[Date] [datetime] NOT NULL ,
	[Time] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
	[Web_Platform] [int] NOT NULL ,
	[Web_Product] [int] NULL ,
	[Web_Category] [int] NULL ,
	[Web_Demo] [int] NULL ,
	[Web_Sellingpage] [int] NULL ,
	[Extra_Info_Page] [int] NULL 
) ON [PRIMARY]
GO

Any Ideas?

X002548
Not Just a Number

15586 Posts

Posted - 2003-08-01 : 10:18:44

Do you have a primary key?

Is this the way the data is most likely gotten at?

What's the cardinality...oh forget that..

What's with the IDENTITY Column? Can you not uniquely define the data?

How many of rows do you have/expect to have?

What color is your database?

It seems on flush that


 CREATE  INDEX IX1_Stats_Web_Product 
	    ON Stats_Web_Product (Web_Product, Web_Platfrom, [Date], IP_address) 
WITH  FILLFACTOR = 90 ON [PRIMARY]
GO

Would be my best guess....

Brett

8-)

SELECT POST=NewId()

Amethystium
Aged Yak Warrior

701 Posts

Posted - 2003-08-01 : 10:21:18

Clustered Indexes are you friends in this case. I suggest your create it on your identity column.

----------------
Shadow to Light

X002548
Not Just a Number

15586 Posts

Posted - 2003-08-01 : 10:28:33

quote:
Originally posted by Amethystium

Clustered Indexes are you friends in this case. I suggest your create it on your identity column.

----------------
Shadow to Light

WHAT?

Brett

8-)

SELECT POST=NewId()

Amethystium
Aged Yak Warrior

701 Posts

Posted - 2003-08-01 : 10:31:44

quote:
Originally posted by X002548

quote:
Originally posted by Amethystium

Clustered Indexes are you friends in this case. I suggest your create it on your identity column.

----------------
Shadow to Light

WHAT?

Brett

8-)

SELECT POST=NewId()

What is my crime Brett?

----------------
Shadow to Light

Nato
Starting Member

30 Posts

Posted - 2003-08-01 : 10:39:54

Amethystium, Not sure of the lingo here. What is my

"identity column"

Also, Brett. Your index slowed my query, to answer your questions:

Do you have a primary key?
Yes, it is ID

Is this the way the data is most likely gotten at?
Yes.

How many of rows do you have/expect to have?
There is over 4 million rows in the Table, and counting

What color is your database?
Whatever color you want it to be

Thanks

Philip

Amethystium
Aged Yak Warrior

701 Posts

Posted - 2003-08-01 : 10:49:58

You asked what would be the most suitable type of index for your query and I suggested a clustered index on your identity column.

Clustered indexes are best added to columns which monotonically increases, i.e. your ~~identity column~~ ID column and queries that make use of GROUP BY and use MAX, MIN, SUM + any other aggregate functions.

----------------
Shadow to Light

Nato
Starting Member

30 Posts

Posted - 2003-08-01 : 10:55:19

Amethystium, thanks for that.

I am creating an Index using SQL Analyser, but the option to make a "Clustered Index" is greyed out. The tick box, cannot be ticked, comprende?

Any idea why this is happening?

X002548
Not Just a Number

15586 Posts

Posted - 2003-08-01 : 11:01:32

What indexes do you have already?

Which column has the most unique data in it...

Try placeing [Date] (not a good column name btw) first...

A clustered index on an identity column I don't think is a good idea

It'll create a bottleneck I believe

Brett

8-)

SELECT POST=NewId()

Nato
Starting Member

30 Posts

Posted - 2003-08-01 : 11:08:19

I only have the standard Index that is created when you have a primary key.

If ID is the indentity column, then there is always a clustered index on this. MSSQL does this automatically.

What is wrong with Date as a column name?

X002548
Not Just a Number

15586 Posts

Posted - 2003-08-01 : 11:14:56

Date is a reserved word..

A clustered index will sort the data pages on that column...

If you went after the data by the Identity column it would be fast, but how often will you do that?

If you're saying that the index you had was faster, I'd be suspiscious..if you reun and rerun and rerun the results will be stored in cache and would appear to be faster..

I'm guessing with the new index the optimizer had to redo the work..

What's does show plan tell you?

An clustered on the Identity column would scan the table..

Brett

8-)

SELECT POST=NewId()

Amethystium
Aged Yak Warrior

701 Posts

Posted - 2003-08-01 : 11:17:44

quote:
Originally posted by Nato

I only have the standard Index that is created when you have a primary key.

If ID is the indentity column, then there is always a clustered index on this. MSSQL does this automatically.

I've been waiting for you to say this! You know the answer for yourself now.

quote:

What is wrong with Date as a column name?

I think it is best to stay well clear from key words like DATE, COUNT etc... but you do have the flexibility of using them if you engulf them in square brackets. Perhaps a new name such as myDate or EffectiveDate would be more suitable but the choice is yours

----------------
Shadow to Light

X002548
Not Just a Number

15586 Posts

Posted - 2003-08-01 : 15:19:30

quote:
Originally posted by Amethystium
What is my crime Brett?

What do you think his normal access path to the data will be?

If, for example (and there are a lot of rows) s/he had a lot of inserts...on what page(s) (The s is a trick question) do you think all the inserts will hit?

Brett

8-)

SELECT POST=NewId()

EDIT: Go check out the Yal Corral

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-08-01 : 20:14:40

quote:
Originally posted by Nato
If ID is the indentity column, then there is always a clustered index on this. MSSQL does this automatically.

Actually it does not. Even when using Enterprise Manager it will not automatically choose anything for a primary key.

Using ID as your primary key is not particularly helpful, but we'll skip that for now, I've gotten enough grief about that already.

But if you cannot use anything else as a primary key, at least make it nonclustered, and create your clustered index on the Date column, or Date and Web_Platform, or Date, Web_platform and Web_Product. Something like this:

CREATE CLUSTERED INDEX IX_CLUST_Stats_Web_Product ON Stats_Web_Product([Date], Web_Platform, Web_Product)

You may need to ALTER TABLE...DROP CONSTRAINT on the primary key in order to add the clustered index. Or modify your CREATE TABLE to:

CREATE TABLE [dbo].[Stats_Web_Product] (
[ID] [int] IDENTITY (1, 1) NOT NULL PRIMARY KEY NONCLUSTERED,
[Web_Language] [int] NOT NULL ,
[IP_Address] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Date] [datetime] NOT NULL ,
[Time] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[Web_Platform] [int] NOT NULL ,
[Web_Product] [int] NULL ,
[Web_Category] [int] NULL ,
[Web_Demo] [int] NULL ,
[Web_Sellingpage] [int] NULL ,
[Extra_Info_Page] [int] NULL
) ON [PRIMARY]

By putting the Date column first, the query can take advantage of the date range specified in the WHERE clause. Since that is most likely the most selective of the criteria (most unique values) it can seek those rows very quickly and ignore the rest of the table.

Also, Web_Platform and Web_Product are int columns, you should not put single quotes around them in the query:

SELECT Datepart(year,Date),Web_Product, SUM(Web_Product) FROM Stats_Web_Product
WHERE Web_Product <> 0
AND IP_Address <> '62.177.158.114'
AND Date >= '05/01/2003'
AND Date <= '07/31/2003'
AND Web_Platform = 2
GROUP BY Datepart(year,Date), Web_Product
ORDER BY Web_Product

It will only cause an unnecessary implicit conversion to be performed.

I also agree with everyone's suggestion to use another name instead of Date for that column. Reserved words will always bite you in the ass when used as object names.

I recommend testing the performance using the clustered index I described, and possibly try modifying the order of the columns and testing the results again. I have a feeling that putting Date first in the list will give you the best performance, but if you can test other variations you'll know for sure which one is best.

Nato
Starting Member

30 Posts

Posted - 2003-08-04 : 11:11:12

robvolk, thanks for your help. This looks to have helped me out big time. Also, you explained your suggestions very clearly, which is most appreciated.

I have one problem, which is small but getting around it is a hassle. My table, that has 5 million + rows, has ID set as a clusted index. I cannot remove this, because it is using ID as a primary key etc...

Is there anyway around this, or will have to create a new table, and then transfer the data?

Thanks

Philip

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-08-04 : 19:15:48

Run these in the following sequence:

ALTER TABLE Stats_Web_Product DROP CONSTRAINT PK_whatever_the_name_is

CREATE CLUSTERED INDEX IX_CLUST_Stats_Web_Product ON Stats_Web_Product([Date], Web_Platform, Web_Product)

ALTER TABLE Stats_Web_Product ADD CONSTRAINT PK_Stats_Web_Product PRIMARY KEY NONCLUSTERED(ID)

You'll have to check sp_help or sp_helpconstraint to get the actual name of the primary key constraint so you can drop it.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources