Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Active table

Author  Topic 

ckuo@kahluadesigns.com
Yak Posting Veteran

58 Posts

Posted - 2006-11-14 : 20:22:18
I have seen this in an database but not sure if it actually works. Say there is a "jobs" table.

---jobs----
job_id int Primary
status int
insert_dttm smalldatetime


This table could be pretty large. In the application there is frequent querying of this table based on values in fields. To speed up this query, a separate table is maintained called "active_jobs"

---active_jobs---
job_id int

This table is much smaller since it only contains job_id 's that are active as far as the business logic is concerned. So the select query looks something like:

SELECT jobs.* FROM jobs INNER JOIN active_jobs ON jobs.job_id = active_jobs.job_id WHERE jobs.status = 50

Would joining the large "jobs" table to a smaller "active_jobs" table actually speed returning data from "jobs"? Are there other ways to do this?

Thanks

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-14 : 22:29:23
If the table is indexed correctly and the queries are written correctly I cannot see that this would improve performance, and there is a good chance it could make it worse. But I am interested to see if anyone else will differ on that?
Go to Top of Page

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2006-11-15 : 00:26:12
If you are ultimately joining with the main jobs table, what is the purpose of keeping separate table. Why not maintain just a flag in Jobs table (something like IsActive)?

Harsh Athalye
India.
"Nothing is Impossible"
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-11-15 : 01:06:38
"If the table is indexed correctly ..."

Well, you would want the IsActive column first, and that isn't going to be very selective.

You could create a VIEW and index that (index will probably be used even on queries on underlying table in Enterprise Version):

CREATE MyView
AS
SELECT job_id
FROM jobs
WHERE IsActive = 1

"I cannot see that this would improve performance"

I have had queries where the performance has needed improving, mostly because the query was sufficiently complex that Optimiser was not finding the best route. Generally my approach to those is to pre-query the most selective data into a Temp table, and then join that to the main query, and that helps performance (hugely in such scenarios), so I would reckon that joining an active_jobs table would speed things up. However, whether the "cost" of maintaining such a table, in real time, is worthwhile is a different question!

Kristen
Go to Top of Page

ckuo@kahluadesigns.com
Yak Posting Veteran

58 Posts

Posted - 2006-11-15 : 09:58:49
"Why not maintain just a flag in Jobs table (something like IsActive)?"

That is what I was thinking, but this database uses a smaller set of data to limit the indexing of the larger table. Thought it was an interesting way of doing it.

If what Kristen says is right, then I think this would be a good way to set up this new database of mine. Keeping the active_jobs table up to date is not a big deal since it only gets inserted and deleted when the job opens and closes, but for the many queries for the steps in between, it might be worthwile.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-11-15 : 15:50:30
"Keeping the active_jobs table up to date is not a big deal ..."

Worth using a Trigger to ensure that it can never get out of Sync.

It needs a Test to prove one way or the other, but I'll bet a Pint on the outcome!

Kristen
Go to Top of Page

PSamsig
Constraint Violating Yak Guru

384 Posts

Posted - 2006-11-16 : 13:22:14
quote:
Originally posted by ckuo@kahluadesigns.com

This table could be pretty large.
and that is 10K, 100K, 100M rows? And is it really only that few coloumns in it?
I'll be happy to make some test runs, I have a nice 2005 server that is awfully bored, but it wont be until next week, I'm away for most the weekend.


-- Alice came to a fork in the road. "Which road do I take?" she asked. "Where do you want to go?" responded the Cheshire cat. "I don't know," Alice answered. "Then," said the cat, "it doesn't matter."
Go to Top of Page

ckuo@kahluadesigns.com
Yak Posting Veteran

58 Posts

Posted - 2006-11-16 : 15:48:05
Hey some tests would be great. The table at this company had a few million rows but it was being hit all the time for just the active jobs. There is about 30-40 columns in it? About 5 or so indexes on it.

I'd like to do some tests too just to see but not sure how exact my results would be by creating some random records becuase the indexes are not realistic.
Go to Top of Page

PSamsig
Constraint Violating Yak Guru

384 Posts

Posted - 2006-11-16 : 15:56:58
Hmmm that depends if the queries would ever jump to one of these indexes, instead of using the PK in the active_jobs INNER JOIN senario or the IsActive in the index senarion.

-- Alice came to a fork in the road. "Which road do I take?" she asked. "Where do you want to go?" responded the Cheshire cat. "I don't know," Alice answered. "Then," said the cat, "it doesn't matter."
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-11-16 : 15:59:50
"I'd like to do some tests too just to see but not sure how exact my results would be by creating some random records because the indexes are not realistic."

Copy the Live table to a Sandpit database, and create the "shortcut" table, and try some queries.

My expectation is that this particular scenario, with a little extra programming to maintain a parallel "Active Jobs" table is going to run rings around the Query Optimiser.

Kristen
Go to Top of Page

ckuo@kahluadesigns.com
Yak Posting Veteran

58 Posts

Posted - 2006-11-16 : 19:56:16
Sorry, forgot to mention I dont have this database, it was at my last place. This is just something I am considering in a future database design. Thanks all.
Go to Top of Page

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-11-17 : 16:25:49
I have done some testing on a table with about 4 million rows and found some pretty interesting things.

Table structure is like this
CREATE TABLE [BigTable] (
[btID] [int] IDENTITY PRIMARY KEY,
[col1] [int],
[col2] [datetime],
[col3] [varchar] (52),
[col4] [char] (13),
[status] [smallint],
[statusbit] [bit],
[statusbyte] [tinyint],
)
Create index statements
create index BTstatus on BigTable(status)
create index BTstatus on BigTable(statusbit)
create index BTstatus on BigTable(statusbyte)
I only ever put on one of these at a time while testing

I tested the following (I used 524 random "active" rows):
1. Status flag (bit, tinyint, smallint makes no difference) in the table and no index on the status flag vs join to an active rows table
2. Bit status flag in the table with an index on the status flag vs join to an active rows table
3. Tinyint status flag in the table with an index on the status flag vs join to an active rows table
4. Smallint status flag in the table with an index on the status flag vs join to an active rows table

The average join performance was
Duration - 16 ms, 1918 reads.

Obviously using no index the performance is hopeless
Duration - 19 sec, 242666 reads.

But the indexed status flag performance was very interesting
Bit: Duration - 460 ms, 8950 reads.
Tinyint: Duration - 23 ms, 1826 reads.
Smallint: Duration - 23 ms, 1833 reads.

When I increased the number of active rows to 8367 the numbers were similar (in fact if anything the indexed status flag numbers improved)

The average join performance was
Duration - 203 ms, 29128 reads.

No index the performance
Duration - 20 sec, 242671 reads.

But the indexed status flag performance was very interesting
Bit: Duration - 595 ms, 34937 reads.
Tinyint: Duration - 203 ms, 27796 reads.
Smallint: Duration - 203 ms, 27825 reads.

So an index on the status flag is not a good idea if you use a bit field,
- about 20 times slower on 4 times as many reads when selecting 524 out of 4 million rows
- about 3 times slower on 1.2 times as many reads when selecting 8367 out of 4 million rows
BUT if you use a tinyint for the flag the performance of the indexed status flag is about the same on slightly fewer reads!

Conclusion: tinyint or smallint flags are indexed MUCH more effectively than bit flags for some reason (I don't know what that is - anyone?). Using a tinyint or smallint flag is much more simple than creating a separate table and joining it, so that's the way I'd go.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2006-11-18 : 00:30:44
Discussion on "Performance of low selectivity indexes" continued over at http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=75146
Go to Top of Page
   

- Advertisement -