 Optimizing high volume inserts


aiken
Aged Yak Warrior

525 Posts

Posted - 2003-09-13 : 12:56:21
Hi, everyone. I've got a web application that logs every pageview to the DB. Problem is, on peak days it's up to about 900,000 pageviews, and the sheer volume of logging is starting to cause problems (deadlocks mostly).

The log table is simple:
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[log_running]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
drop table [dbo].[log_running]
GO

CREATE TABLE [dbo].[log_running] (
[pagename] [varchar] (35) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
[i_users] [int] NOT NULL ,
[userlevel] [tinyint] NOT NULL ,
[datetime] [smalldatetime] NOT NULL ,
[site_id] [tinyint] NOT NULL ,
[iServers] [int] NOT NULL ,
[i] [int] IDENTITY (1, 1) PRIMARY KEY CLUSTERED ,
[hour] [tinyint] NOT NULL ,
[ip] [int] NULL
) ON [WRITE_INTENSIVE_1]
GO

CREATE INDEX [log_running4] ON [dbo].[log_running]([iServers], [hour], [pagename], [datetime], [site_id]) ON [PRIMARY]
GO

And the log SP is:
CREATE PROCEDURE p_log_runningEx
(@page varchar(35), @iUsers int, @userlevel int, @iServers int, @SiteID int = 1, @vcIP varchar(15) = NULL)
AS
set nocount on

/* Add pageview to log_running */
insert into log_running WITH (FASTFIRSTROW, ROWLOCK) (pagename, i_users, userlevel, iServers, site_id, ip)
VALUES (@page, @iUsers, @userlevel, @iServers, @SiteID, dbo.f_frnk_util_IPAddrStr2Int(@vcIP, 0))
The [datetime] field has a default of getdate(); the [hour] field (which is actually called "hr", but Snitz keeps wanting to replace that with a horizontal rule) records the hour of the day and has a default of datepart(hour, getdate()). We need that for some reports that run frequently. And, as you can see, we're using a function to convert the string IP to an int.
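For reference, the defaults described above would look something like this as DDL. This is a sketch, not the actual schema: the constraint names are invented, and since a SQL Server default can't reference another column, the [hour] default has to call getdate() itself rather than read [datetime].

```sql
-- Hypothetical default constraints matching the description above.
-- Constraint names are invented for illustration.
ALTER TABLE [dbo].[log_running]
    ADD CONSTRAINT [DF_log_running_datetime] DEFAULT (getdate()) FOR [datetime]
GO
-- A DEFAULT cannot reference the [datetime] column, so this one
-- calls getdate() directly.
ALTER TABLE [dbo].[log_running]
    ADD CONSTRAINT [DF_log_running_hr] DEFAULT (datepart(hour, getdate())) FOR [hour]
GO
```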

My questions:

- Will we see better performance (shorter locks) by doing the IP conversion into a variable, and then using the variable in the insert? The function should be pretty fast, as it's just a bunch of math and string manipulation. Does SQL Server evaluate the function before starting the insert, or does the function hold up the insert?

- Likewise, should I put the hour into a variable in the SP and insert it, rather than relying on that column's default to do the datepart?

- And similarly, is there any performance benefit or penalty to letting the DB fill in the [datetime] field from the default? Should I just include getdate() in the insert, or, because it's a smalldatetime column, should I declare a smalldatetime variable, set it to getdate(), and then explicitly insert that?

- Right now, we've got the identity field, whose sole purpose is to be a primary key. Nothing references it, and all reporting is aggregate (how many pageviews on site X in hour Y, etc.). At one point in the past, I got obsessed with primary keys and decided that every table should have one, and I think this is a relic of that. Does it serve any purpose?

- The table is on a RAID 1 array (the WRITE_INTENSIVE_1 filegroup). The identity/primary key is clustered, which should make writes sequential; is that a good or bad thing? Should I move the primary key to something that will avoid creating hotspots, or is it going to be happier this way?

Thanks in advance. I know the answer to a lot of these is "test it empirically, and see what works better." However, this is a production system and I just don't have the resources (hardware or time) to set up a dev system and test it at this kind of volume.

Cheers
-b

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2003-09-13 : 18:03:00
How about dropping both the PK and the index [log_running4]?
I assume the people running the site don't demand visitor-statistics
reports every 30 minutes.

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-09-13 : 23:25:36
I can drop the PK, but there are automated systems checking the hourly stats once an hour, and without that index, those queries really bog things down.
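For context, the hourly check presumably runs something shaped like this. This is a guess at the query; the actual report columns aren't shown in the thread, but a query of roughly this shape is what [log_running4] would support.

```sql
-- Hypothetical hourly-stats query; the selected columns are assumptions.
select site_id, [hour], count(*) as pageviews
from log_running (nolock)
where [datetime] >= dateadd(hour, -1, getdate())
group by site_id, [hour]
```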

If I do drop the PK, should I make [datetime] clustered? Or just have a table without a clustered index?

Thanks
-b

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2003-09-14 : 04:41:34
Your table is a table-in-itself, 'a lonely island', so I don't think it needs a PK at all.
Besides, how is that PK involved in your stats-retrieval query?

>Or just have a table without a clustered index?

I think it's a good idea.

PS WITH(FASTFIRSTROW,ROWLOCK)... it's above my head so far.

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2003-09-15 : 00:13:30
A clustered index is good for inserts, but it's bad to put it on an identity column. You will see hotspots on disk, especially with a RAID 1 configuration. A very active file is better on RAID 10 (0+1, or whatever you want to call it). RAID 5 would be bad, by the way, because there is an enormous write penalty with RAID 5, unless you use an IBM or Hitachi SAN, which work around this problem; other SAN hardware may as well.

Also, don't put the function in the insert statement. Calculate the value to be inserted outside the actual insert statement; that way, you don't spend any extra time with a lock in place on the table. Every millisecond counts with a table this active.
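Applied to the procedure from the original post, that advice would look roughly like this. This is a sketch, not the poster's actual revision: it reuses the original names and simply moves the function call and the column defaults into variables set before the INSERT.

```sql
-- Sketch: compute everything into variables first, so the INSERT
-- itself (and the lock it takes) does the minimum possible work.
CREATE PROCEDURE p_log_runningEx2
    (@page varchar(35), @iUsers int, @userlevel int, @iServers int,
     @SiteID int = 1, @vcIP varchar(15) = NULL)
AS
set nocount on

declare @now smalldatetime, @hr tinyint, @ip int
set @now = getdate()
set @hr  = datepart(hour, @now)
set @ip  = dbo.f_frnk_util_IPAddrStr2Int(@vcIP, 0)

insert into log_running WITH (ROWLOCK)
    (pagename, i_users, userlevel, iServers, site_id, ip, [datetime], [hour])
values
    (@page, @iUsers, @userlevel, @iServers, @SiteID, @ip, @now, @hr)
GO
```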

-ec


aiken
Aged Yak Warrior

525 Posts

Posted - 2003-09-15 : 12:27:33
Thanks for the advice. The disk is RAID 1, but not RAID 10 (just not enough spindles to do that).

Because the table is the only thing on this disk (even its nonclustered index is on a different array), and because it sees 99.99% inserts, isn't it a good thing to have the clustered index sequential, so the heads can just step along one sector at a time? Or would it be preferable to lose the PK and put a nonunique clustered index on something like [pagename], which would scatter the writes (and page splits) throughout the table?

I realize that the best answer to all of this is "test it in a development environment and measure the actual results," but I just don't have that luxury right now.

Thanks
-b

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2003-09-15 : 17:24:15
quote:
Originally posted by aiken

Thanks for the advice. The disk is RAID 1, but not RAID 10 (just not enough spindles to do that).

Because the table is the only thing on this disk (even its nonclustered index is on a different array), and because it sees 99.99% inserts, isn't it a good thing to have the clustered index sequential, so the heads can just step along one sector at a time? Or would it be preferable to lose the PK and put a nonunique clustered index on something like [pagename], which would scatter the writes (and page splits) throughout the table?

I realize that the best answer to all of this is "test it in a development environment and measure the actual results," but I just don't have that luxury right now.

Thanks
-b




That is a good point. Since this disk is RAID 1, all the inserts will be going to it anyway, regardless of what key you have the clustered index on.

If it were RAID 10 or RAID 5, I would suggest clustering on a non-sequential key. In this case, though, you might see some benefit with the index you have chosen.



-ec

TSQLMan
Posting Yak Master

160 Posts

Posted - 2003-09-15 : 22:20:35
Good point, definitely keep the functions out of the insert.

quote:
Originally posted by eyechart

Also, don't put the function in the insert statement. Calculate the value to be inserted outside the actual insert statement; that way, you don't spend any extra time with a lock in place on the table. Every millisecond counts with a table this active.

-ec




aiken
Aged Yak Warrior

525 Posts

Posted - 2003-09-16 : 14:00:51
At this point, I've killed the PK, deleted the identity column, moved the clustered index to the [datetime],[site_id] fields, and moved the computations out of the insert.
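In DDL terms, that change would look roughly like this. It's a sketch: the PK constraint name below is a guess, since the original table declared the PK inline and would have received a system-generated name.

```sql
-- Hypothetical DDL for the change described above.
-- Look up the real PK constraint name first, e.g. via sp_helpconstraint.
ALTER TABLE [dbo].[log_running] DROP CONSTRAINT [PK__log_running]  -- name is a guess
ALTER TABLE [dbo].[log_running] DROP COLUMN [i]
CREATE CLUSTERED INDEX [IX_log_running_dt_site]
    ON [dbo].[log_running] ([datetime], [site_id]) ON [WRITE_INTENSIVE_1]
GO
```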

It made a night-and-day difference in performance. About 1% of inserts still take longer than 50ms, but for those the average is now about 70ms; previously about 1% ran long as well, but their average was about 250ms. Overall, the DB is doing less work, which makes the rest of the app happier too.

Thanks for the help!
-b

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2003-09-16 : 17:46:28
quote:
Originally posted by aiken

At this point, I've killed the PK, deleted the identity column, moved the clustered index to the [datetime],[site_id] fields, and moved the computations out of the insert.

It made a night-and-day difference in performance. About 1% of inserts still take longer than 50ms, but for those the average is now about 70ms; previously about 1% ran long as well, but their average was about 250ms. Overall, the DB is doing less work, which makes the rest of the app happier too.



Another thing to consider is altering the cache settings on your RAID controller to favor writes over reads. On Compaq hardware you can typically set it anywhere from 100% read/0% write to 0% read/100% write in 25% increments. Most vendors have something similar that you can tweak for even more performance. Just make sure you have a battery-backed write cache.


-ec