Author |
Topic |
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2012-02-10 : 18:16:17
|
Hi experts! :)Recently one of the tables in our system got a column added, and since then the performance of a particular proc has been poor. The symptom is, every 100th call to this proc it seems to take forever to come back. Most calls return data fast, like sub-second. but every so often it takes several minutes for the proc to return.The table grows kind of fast, about 300k rows/min or so at peak hours. It grows via files that are being bulk inserted all the time. The data in the table is very time-sensitive. Only the most recent data is of any use to anyone, the older stuff is deleted in batches to keep the disk from filling up. Every 10 mins or so there is a job that runs that deletes older rows. At any given time, there is 30 mins worth of data in the table, so approx 10 million rows.Before the binary(9) column was added, everything was fine. that is, callers never timed out, response time was always fine.After adding a binary(9) null column, we started seeing these intermittent long duration calls. something like 1 out of 100 calls to this proc is timing out. the timeouts have been well correlated with when the column was added.So my question is, is something bad about choosing binary(9) for the column size? should we have chosen binary(16) instead? I have never heard of worrying about byte alignment for columns in tables. Is that something one should worry about? Here's the DDL. changed the column names to satisfy my paranoid boss. he doesn't want me to post the proc. If enough people say they need to see it though, I may be able to convince. :)create table foo( dtk int not null ,id int not null ,aaa tinyint not null ,bbb tinyint not null ,ccc smallint null ,ddd tinyint not null ,eee tinyint null ,fff tinyint null ,ggg tinyint null ,hhh tinyint not null ,iii tinyint null ,jjj tinyint null ,mmm tinyint not null ,nnn tinyint null ,ooo smallint null ,ppp tinyint not null ,qqq tinyint null ,rrr smallint null ,sss tinyint not null ,ttt int null ,uuu int null ,vvv int null ,www tinyint null ,xxx tinyint null ,yyy smallint null ,zzz binary(9) null ---------- added this column. shoudl we have made it binary(16) instead? ,constraint pk_foo primary key (dtk asc, id asc) ) elsasoft.org |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2012-02-10 : 18:49:11
|
No worries Tara! Thanks for helping out! :)It is not reproducible in SSMS. when you run the proc from there, it returns instantly every time. Nothing but seeks in the plan, looks great.Also, the problem DOES resolve itself on its own. that is, after a client has a timeout, the very next call from the same client is always fast. No human intervention needed.I did grab the plan once while the problem was happening using sp_whoisactive @get_plans=1 and it showed a plan that was good, same plan as when the proc is behaving normally, same plan that I see when I run the proc from SSMS.The DBA has run traces to catch it, but I don't think he was capturing the query plans. What he was seeing was the number of reads just skyrocketing when the symptom occurs. It made me think the plan was doing a scan over this largish table instead of a seek on the pk. The data in this table is highly transient - it's constantly be added to and deleted and it's only the rows with dtk close to max(dtk) that are of any value to clients. In that type of situation, I guess stats quickly become stale. elsasoft.org |
|
|
Kristen
Test
22859 Posts |
Posted - 2012-02-10 : 18:55:59
|
If it turns out to be UPDATE STATS have you got Auto Stats set to ASYNC rather than SYNC (SYNC is the default)Long shot and just thinking out loud: tripping over some network page/block size on the (new) size of retrieved data? Application's language (or network transport layer) doing something goofy with a binary(9) datatype ... I don't suppose you can easy set up a simulation such that you could try VARCHAR(9) instead? |
|
|
Kristen
Test
22859 Posts |
Posted - 2012-02-10 : 18:58:10
|
"The data in this table is highly transient"But that was the case before, so no conceptual change?Is BINARY-9 indexed? part of a WHERE clause? Seems improbably ... and if not I don't suppose there aren't even any STATS on that column? |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2012-02-10 : 19:22:05
|
Kristen: binary(9) is not indexed, not part of where clause, is not referenced by the slow proc at all. and you are right, there is no conceptual change to the table - it was highly transient before also.Tara: I agree about the bad plan. The reads are very low usually, and they spike when the issue happens.thanks for your insight both of you! I'll go look and make sure stats are being updated regularly. elsasoft.org |
|
|
Kristen
Test
22859 Posts |
Posted - 2012-02-10 : 19:26:39
|
Move the BINARY(9) to a one-to-one table and a VIEW for any query that needs it?If there is something goofy with a byte-boundary or some funny-ness with BINARY(9) somewhere between database and application that would help for queries that don't need it. But I'm clutching at straws as you will have realised ! |
|
|
Kristen
Test
22859 Posts |
Posted - 2012-02-10 : 19:29:22
|
Actually, why would BINARY(9) only effect one query in 100 due to some issue in transport layer, I'm barking up the wrong tree.Byte-boundary in database is a really long shot I reckon.So that leaves something else having gone T.U. and the fact that you added a BINARY(9) column is coincidence? or just possibly it exacerbated a situation that already existed but went unnoticed before. |
|
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2012-02-10 : 19:37:09
|
ok I found out that another change that happened is that regular update stats also was (inadvertently) turned off! It used to run every few minutes but not since the release that also added this column. So Kristen, I led you up that wrong tree! I figure stale stats is almost certainly the culprit. elsasoft.org |
|
|
Kristen
Test
22859 Posts |
Posted - 2012-02-10 : 19:59:00
|
quote: Originally posted by jezemine I figure stale stats is almost certainly the culprit.
Fingers crossed, and you can have the weekend off after all! |
|
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2012-02-10 : 20:06:59
|
heh, thanks. What's weird about this thread is, I have nearly 3k posts here which seems like a lot, but that is NOTHING compared to you two!I'd feel a little bigger if some spammer with 2 posts would come along and offer me some viagra or something... :) elsasoft.org |
|
|
|