Indexing Question - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

Database Design and Application Architecture

Indexing Question

Author

Topic

temp100
Starting Member

3 Posts

Posted - 2013-12-04 : 16:35:11

I have a SQL 2008 table that heavily used for event scheduling. The fields are:
[EventSubscribers]
EventCode Varchar(10)
ParticipantID Varchar(10)
EventDate SmallDateTime
TimeSlot SmallInt
TransID BigInt (identity)
... (6 other fields)

For each EventCode, there could be tens of thousands of records qualified by the ParticipantID,EventDate, and TimeSlot

The critical query I'm tuning for is:

Select EventCode,... From EventDetail Where ParticipantID=? And EventDate=? And TimeSlot=?

I defined a clustered index on EventDate and a non-clustered unique index on ParticipantID, EventDate, TimeSlot
A friend recommended either dropping the clustered index completely or creating the clustered index on ParticipantID, EventDate, TimeSlot
Which approach would afford the best insert performance?
Traffic is extremely high when new events are published.

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2013-12-04 : 17:35:25

Some thoughts:

Unfortunately, ideal insert performance and ideal performance on that particular select may be two different things. For quick inserts either no clustered index (just a heap) is a possibility as new rows are just thrown onto the heap. Or a clustered index on the identity column so new rows always go to the end of the table.

Having a clustered index on the EventDate would be good only if you always (or often) searched by an EventDate range.

Looks like your critical query is just returning one row. In that case I would just leave that non-clustered index (participantid, eventDate, TimeSlot). Is that critical query performing well?

Be One with the Optimizer
TG

temp100
Starting Member

3 Posts

Posted - 2013-12-04 : 19:38:39

The query is performing well, but to clarify, the insert is more of a concern because when an event is published, the system is hammered by users selecting the same eventdate/timeslot. There's another table that maintains the maximum number of participants for each event as well as the current count of participants committed. So first the query is executed to ensure the participant isn't already committed to a selected timeslot, then a lock is issued while the master count is updated and the participant record is inserted.

The timeslots represent a half-hour and events can last days or even weeks. So for each submission, there's a loop that checks timeslot by timeslot, inserting a record each time.

I'm starting to think that removing the clustered index may be the better option. Would it be advantageous to change the order of the non-clustered index to EventDate,ParticipantID,TimeSlot?

Also, would I gain much if I redefined the keys as integers rather than varchar(10)? It would be a lot of work changing the back end programs, which is why they've remained.

Kristen
Test

22859 Posts

Posted - 2013-12-05 : 04:53:30

quote:
Originally posted by temp100

So first the query is executed to ensure the participant isn't already committed to a selected timeslot, then a lock is issued while the master count is updated and the participant record is inserted.

When someone books there is just one insert in one table right? (The Event table doesn't have a parallel or sub table or anything like that?)

Currently you are first checking that the participant doesn't already have a slot booked etc, obtain a lock, etc.?

If I have got that right I think that is a weak point, I would do:


INSERT INTO MyEventTable
(
EventCode, ParticipantID, TimeSlot, ...
)
SELECT 
	@EventCode AS EventCode,
	@ParticipantID AS ParticipantID,
	@TimeSlot AS TimeSlot,
	...
WHERE NOT EXISTS
(
	SELECT	*
	FROM	MyEventTable AS E
	WHERE	    E.ParticipantID = @ParticipantID
		AND E.EventCode = @EventCode
		...
)
SELECT @intRowCount = @@ROWCOUNT

IF @intRowCount = 0
BEGIN
	... [i]Participant is already booked[i] ...
END
ELSE
BEGIN
	... [i]Participant booking successful[i] ...
END

quote:
The timeslots represent a half-hour and events can last days or even weeks. So for each submission, there's a loop that checks timeslot by timeslot, inserting a record each time.

You need to avoid this. The loop is a killer, your inserts need to be set-based - insert everything in one statement.

If a Participant books for several timeslots then you need the EXIST test to match on any/all of those, and do a single multi-row insert for all the timeslots the Participant is booking. So JOIN on a table of "slots" that includes all those (a table of numbers will probably do instead - ask if you need help with that)


INSERT ...
SELECT 
	...
	S.TimeSlot AS TimeSlot,

	...
WHERE NOT EXISTS
(
	SELECT	*
	FROM	MyEventTable AS E
		JOIN MySlotsTable AS S
			 ON S.TimeSlot >= @TimeSlot_START
			AND S.TimeSlot <= @TimeSlot_END
	WHERE	    E.ParticipantID = @ParticipantID
		AND E.EventCode = @EventCode
		AND E.TimeSlot = S.TimeSlot
		...
)

EDIT: See caveat below

Thus no additional locking etc. required.

If you DO need inserts to two/more tables then put them all in a BEGIN TRANSACTION ... COMMIT / ROLLBACK block, and keep the code inside the block really REALLY tight.

quote:
I'm starting to think that removing the clustered index may be the better option.

NOT having a clustered index brings it only problems - e.g. when tying to de-fragment the table. I'd prefer to have one, and I think my SQL statement above would do inserts just fine. But if not I would have the Clustered Index on IDENTITY, as TG suggested. If you do have a Clustered Index on data columns then make sure it has enough fields to be unique. Beware that all other indexes will include ALL the columns in the Clustered Index, so lots of columns in the clustered index can be a killer (IDENTITY is good in that regard of course). What is important is that the clustered index key fields have the most commonly used range value first. So if you ask for "All/Some Events/Participants/etc ON THIS DATE" then put EventDate first, if the most common query is "All/Some Participants/xxx FOR THIS EVENT" then put EventCode first.

quote:
Would it be advantageous to change the order of the non-clustered index to EventDate,ParticipantID,TimeSlot?

You might need several indexes (with the keys arranged in different orders) ... you need to check the query plan of your common queries and see if any of them use an Index Scan, or worse a Table Scan, and then try adding an index to see if it optimises those.

A query with a WHERE clause that has EventDate, ParticipantID and TimeSlot will probably make use if any index that includes all three fields, but SQL is more likely to use an index where the first key field is the most selective. EventDate is probably the worst [LOTS of rows with the same EventDate], and assuming ParticipantID doesn't come to loads of events then that is probably the best.

quote:
would I gain much if I redefined the keys as integers rather than varchar(10)

I reckon

INT is 4 bytes, you would get 10 / 4 = 2.5 times as many keys in each index page

quote:
It would be a lot of work changing the back end programs, which is why they've remained.

Add a Calculated column to the table as CONVERT(int, MyVarcharColumn) and then index on that? No changes to program required - although you might want to add a CHECK constraint on the column to enforce that it is indeed numeric!

Having said that, I have had trouble with side effects of Computed Columns and we tend to avoid them now. My preference is to have a VIEW and then put an index on that ... but unless you have the Enterprise Version I don't think the Index on the View will be used for a query on the Table so you would have to include the View in your query to make use of the index - maybe that is not a big deal?

The Enterprise Version is smart enough to reuse the View's Index on queries on the Table - which I think is pretty cool!

Kristen
Test

22859 Posts

Posted - 2013-12-05 : 04:57:14

Caveat:

The @@ROWCOUNT will be the number of Slots that were inserted. If the Participant had already booked SOME then this will insert the REST. Dunno if that is acceptable? If not (i.e. you want to say "Your booking overlaps an existing one") then you need to check if the Row Count is the same as the Count of Slots required, and if less then rollback the INSERT (and say "Sorry mate" to the Participant.

temp100
Starting Member

3 Posts

Posted - 2013-12-07 : 08:51:16

The participants freely select 30 minute timeslots, so there could be gaps in a range. They also often select different events within the same period. Most of the participants attend events at least every month, some even daily. The participant history goes back many years. There is a another table that maintains the slots available and slots selected.

Other than the identity column, the combination of ParticipantID,EventDate, TimeSlot would be required to create a unique index. But from what I gather, not an efficient key for a clustered index. So, if I create a clustered index on TransID, a key that's never used in queries, I would gain performance because the records would be ordered essentially by the insert date, correct? I suppose it correlates somewhat to the event date. If I then create a non-clustered index, which field ordering would be better:

ParticipantID, EventDate, TimeSlot
or
EventDate, ParticipantID, TimeSlot

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources