| Author |
Topic |
|
ckuo@kahluadesigns.com
Yak Posting Veteran
58 Posts |
Posted - 2006-11-14 : 20:22:18
|
| I have seen this in an database but not sure if it actually works. Say there is a "jobs" table.---jobs----job_id int Primarystatus intinsert_dttm smalldatetimeThis table could be pretty large. In the application there is frequent querying of this table based on values in fields. To speed up this query, a separate table is maintained called "active_jobs"---active_jobs---job_id intThis table is much smaller since it only contains job_id 's that are active as far as the business logic is concerned. So the select query looks something like:SELECT jobs.* FROM jobs INNER JOIN active_jobs ON jobs.job_id = active_jobs.job_id WHERE jobs.status = 50Would joining the large "jobs" table to a smaller "active_jobs" table actually speed returning data from "jobs"? Are there other ways to do this?Thanks |
|
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-14 : 22:29:23
|
| If the table is indexed correctly and the queries are written correctly I cannot see that this would improve performance, and there is a good chance it could make it worse. But I am interested to see if anyone else will differ on that? |
 |
|
|
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts |
Posted - 2006-11-15 : 00:26:12
|
| If you are ultimately joining with the main jobs table, what is the purpose of keeping separate table. Why not maintain just a flag in Jobs table (something like IsActive)?Harsh AthalyeIndia."Nothing is Impossible" |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2006-11-15 : 01:06:38
|
| "If the table is indexed correctly ..."Well, you would want the IsActive column first, and that isn't going to be very selective.You could create a VIEW and index that (index will probably be used even on queries on underlying table in Enterprise Version):CREATE MyViewASSELECT job_idFROM jobsWHERE IsActive = 1"I cannot see that this would improve performance"I have had queries where the performance has needed improving, mostly because the query was sufficiently complex that Optimiser was not finding the best route. Generally my approach to those is to pre-query the most selective data into a Temp table, and then join that to the main query, and that helps performance (hugely in such scenarios), so I would reckon that joining an active_jobs table would speed things up. However, whether the "cost" of maintaining such a table, in real time, is worthwhile is a different question!Kristen |
 |
|
|
ckuo@kahluadesigns.com
Yak Posting Veteran
58 Posts |
Posted - 2006-11-15 : 09:58:49
|
| "Why not maintain just a flag in Jobs table (something like IsActive)?"That is what I was thinking, but this database uses a smaller set of data to limit the indexing of the larger table. Thought it was an interesting way of doing it.If what Kristen says is right, then I think this would be a good way to set up this new database of mine. Keeping the active_jobs table up to date is not a big deal since it only gets inserted and deleted when the job opens and closes, but for the many queries for the steps in between, it might be worthwile. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2006-11-15 : 15:50:30
|
| "Keeping the active_jobs table up to date is not a big deal ..."Worth using a Trigger to ensure that it can never get out of Sync.It needs a Test to prove one way or the other, but I'll bet a Pint on the outcome!Kristen |
 |
|
|
PSamsig
Constraint Violating Yak Guru
384 Posts |
Posted - 2006-11-16 : 13:22:14
|
quote: Originally posted by ckuo@kahluadesigns.com This table could be pretty large.
and that is 10K, 100K, 100M rows? And is it really only that few coloumns in it?I'll be happy to make some test runs, I have a nice 2005 server that is awfully bored, but it wont be until next week, I'm away for most the weekend.-- Alice came to a fork in the road. "Which road do I take?" she asked. "Where do you want to go?" responded the Cheshire cat. "I don't know," Alice answered. "Then," said the cat, "it doesn't matter." |
 |
|
|
ckuo@kahluadesigns.com
Yak Posting Veteran
58 Posts |
Posted - 2006-11-16 : 15:48:05
|
| Hey some tests would be great. The table at this company had a few million rows but it was being hit all the time for just the active jobs. There is about 30-40 columns in it? About 5 or so indexes on it.I'd like to do some tests too just to see but not sure how exact my results would be by creating some random records becuase the indexes are not realistic. |
 |
|
|
PSamsig
Constraint Violating Yak Guru
384 Posts |
Posted - 2006-11-16 : 15:56:58
|
| Hmmm that depends if the queries would ever jump to one of these indexes, instead of using the PK in the active_jobs INNER JOIN senario or the IsActive in the index senarion.-- Alice came to a fork in the road. "Which road do I take?" she asked. "Where do you want to go?" responded the Cheshire cat. "I don't know," Alice answered. "Then," said the cat, "it doesn't matter." |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2006-11-16 : 15:59:50
|
| "I'd like to do some tests too just to see but not sure how exact my results would be by creating some random records because the indexes are not realistic."Copy the Live table to a Sandpit database, and create the "shortcut" table, and try some queries.My expectation is that this particular scenario, with a little extra programming to maintain a parallel "Active Jobs" table is going to run rings around the Query Optimiser.Kristen |
 |
|
|
ckuo@kahluadesigns.com
Yak Posting Veteran
58 Posts |
Posted - 2006-11-16 : 19:56:16
|
| Sorry, forgot to mention I dont have this database, it was at my last place. This is just something I am considering in a future database design. Thanks all. |
 |
|
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-11-17 : 16:25:49
|
| I have done some testing on a table with about 4 million rows and found some pretty interesting things.Table structure is like thisCREATE TABLE [BigTable] ( [btID] [int] IDENTITY PRIMARY KEY, [col1] [int], [col2] [datetime], [col3] [varchar] (52), [col4] [char] (13), [status] [smallint], [statusbit] [bit], [statusbyte] [tinyint],)Create index statementscreate index BTstatus on BigTable(status)create index BTstatus on BigTable(statusbit)create index BTstatus on BigTable(statusbyte)I only ever put on one of these at a time while testingI tested the following (I used 524 random "active" rows):1. Status flag (bit, tinyint, smallint makes no difference) in the table and no index on the status flag vs join to an active rows table 2. Bit status flag in the table with an index on the status flag vs join to an active rows table 3. Tinyint status flag in the table with an index on the status flag vs join to an active rows table 4. Smallint status flag in the table with an index on the status flag vs join to an active rows table The average join performance wasDuration - 16 ms, 1918 reads.Obviously using no index the performance is hopelessDuration - 19 sec, 242666 reads.But the indexed status flag performance was very interestingBit: Duration - 460 ms, 8950 reads.Tinyint: Duration - 23 ms, 1826 reads.Smallint: Duration - 23 ms, 1833 reads.When I increased the number of active rows to 8367 the numbers were similar (in fact if anything the indexed status flag numbers improved)The average join performance wasDuration - 203 ms, 29128 reads.No index the performanceDuration - 20 sec, 242671 reads.But the indexed status flag performance was very interestingBit: Duration - 595 ms, 34937 reads.Tinyint: Duration - 203 ms, 27796 reads.Smallint: Duration - 203 ms, 27825 reads.So an index on the status flag is not a good idea if you use a bit field, - about 20 times slower on 4 times as many reads when selecting 524 out of 4 million rows - about 3 times slower on 1.2 times as many reads when selecting 8367 out of 4 million rowsBUT if you use a tinyint for the flag the performance of the indexed status flag is about the same on slightly fewer reads!Conclusion: tinyint or smallint flags are indexed MUCH more effectively than bit flags for some reason (I don't know what that is - anyone?). Using a tinyint or smallint flag is much more simple than creating a separate table and joining it, so that's the way I'd go. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2006-11-18 : 00:30:44
|
| Discussion on "Performance of low selectivity indexes" continued over at http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=75146 |
 |
|
|
|