Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
jdblack
Starting Member
11 Posts |
Posted - 2009-11-08 : 09:46:14
Our engineering group recently converted a large database to a star schema. We get massive amounts of data a day, and the distributed data importers can't keep up with the flow due to the excessive joins the new schema requires. As a stop-gap they are putting up more systems to participate in the import process. What are their options?

1. Get rid of some dimension tables.
2. Hardware upgrades? What's the bottleneck? Would a solid state drive help if just the dimension tables were stored on it? I'd hope SQL is smart enough to keep the dimension tables in memory, but I don't know enough to be sure.

I'm one of the consumers of the data, and while querying is much easier now, they are not able to bring in all of the data in time.

Thanks!
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2009-11-08 : 10:08:32
Not really enough information to answer the question. Would need to know what the schema looked like before the change, for one thing. Did they not test the imports before making the change?

The joins aren't necessarily the problem. Proper indexing might be, though.

Is this a data warehouse? Faster drives may help, but I can't say without knowing where the bottleneck is.

How much data is "massive amounts"? How is it imported? How often? How long did it used to take, and how long does it take now? What else is happening on the server? What do I/O, RAM, and CPU utilization look like?

Back to indexing for a moment: it is common in a data warehouse to create many indexes. While this dramatically improves read performance, it degrades write performance. That's where I'd look first.

By the way, this may not be an "engineering mistake," but not testing certainly was. Maybe it's just a couple of kinks to work out.
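To illustrate the write-performance point above: a common pattern on SQL Server is to disable nonclustered indexes before a bulk load and rebuild them afterward, so the engine isn't maintaining every index row by row during the insert. This is only a sketch; the table, index, and file names below are invented for illustration, not taken from the thread.

```sql
-- Disable a nonclustered index before the load (never disable the
-- clustered index -- that makes the table inaccessible).
ALTER INDEX IX_FactSales_DateKey ON dbo.FactSales DISABLE;

-- Perform the bulk load with minimal-logging-friendly options.
BULK INSERT dbo.FactSales
FROM '\\importserver\staging\factsales.dat'
WITH (TABLOCK, BATCHSIZE = 100000);

-- Rebuild once the load completes; this re-enables the index.
ALTER INDEX ALL ON dbo.FactSales REBUILD;
```

Whether this nets out faster depends on load size relative to table size: rebuilding indexes on a very large fact table after every load can cost more than maintaining them during the insert.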
dportas
Yak Posting Veteran
53 Posts |
Posted - 2009-11-08 : 10:25:23
Sounds like your engineering group has been drinking too much of the Kimball Kool-Aid! The basis of an efficient data warehouse design should be a Boyce-Codd / fifth normal form schema with appropriate indexing and partitioning strategies. Adequate performance testing also helps, of course, and that may be something your developers could have done better. Buying more hardware to make up for the design doesn't sound like a sensible trade-off to me.
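On the partitioning strategy mentioned above: for a continuous 1 TB+/day feed, partitioning the fact table by load date lets each day's data be loaded and indexed in isolation. A minimal sketch, with hypothetical names and boundary dates:

```sql
-- One partition per day; RANGE RIGHT puts each boundary date in the
-- partition to its right. Names and dates are illustrative only.
CREATE PARTITION FUNCTION pfDaily (date)
AS RANGE RIGHT FOR VALUES ('2009-11-07', '2009-11-08', '2009-11-09');

-- Map all partitions to one filegroup for simplicity; production systems
-- often spread partitions across filegroups/spindles.
CREATE PARTITION SCHEME psDaily
AS PARTITION pfDaily ALL TO ([PRIMARY]);
```

A new boundary is then added per day with `ALTER PARTITION FUNCTION ... SPLIT RANGE`, and completed loads can be switched in as a metadata-only operation rather than a massive insert.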
jdblack
Starting Member
11 Posts |
Posted - 2009-11-08 : 10:34:11
quote: Originally posted by russell — Would need to know what the schema looked like before the change for one thing. Did they not test the imports before making the change?
Flat fact table, no dimensions, no joins, less indexing. Not sure what was not indexed previously. They only index int values now.

quote: The joins aren't necessarily the problem. Proper indexing might be though.
Ok, I'll keep that in mind.

quote: Is this a data warehouse? Faster drives may help but can't say without knowing where the bottleneck is.
Yes.

quote: How much data is "massive amounts?"
1 TB+ imported a day.

quote: How is it imported?
SQL to SQL over T3+ WAN links.

quote: How often?
24x7x365.

quote: How long did it used to take and how long does it take now?
Don't know exactly. It used to be able to keep up, and now it can handle maybe 35% of the total. So it was at least 3 times faster previously.

quote: What else is happening on the server?
Nothing.

quote: What does I/O, RAM and CPU utilization look like?
They haven't released that info, and I don't have perms to the importers.

quote: Back to indexing for a moment, it is common in a data warehouse to create many indexes. While this dramatically improves read performance, it degrades write performance. That's where I'd look first.
I understand that system perf counters are the most accurate way to tell, but I'm curious, partly for my own knowledge: would a solid state drive (negligible seek time, very low read/write latency, sequential access doesn't matter) help in the scenario below? If "sometimes," in what cases?

How about this scenario:
1. Data is pulled in raw from the remote source over the WAN link.
2. When a chunk of data is complete, it is copied to the solid state drive.
3. It is converted to star schema.
4. Indexed.
5. Dimension tables are stored on the solid state drive.
6. The fact table is copied to a conventional SAN array.

Again, for my own knowledge, which resource do these steps hit the hardest?
1. Indexing.
2. Joining (converting to star schema).
3. Looking up dimension values.

quote: By the way, this may not be an "Engineering Mistake" but not testing certainly was. Maybe just a couple of kinks to work out
It's being downplayed to the max, believe you me.
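For concreteness, step 3 of the scenario above ("converted to star schema") usually means resolving each raw row's natural keys to dimension surrogate keys, roughly like the sketch below. All table and column names are invented for illustration. Note that if the dimension tables are small and queried constantly, SQL Server will generally keep their pages in the buffer pool anyway, so an SSD helps the dimension lookups less than one might hope; the fact-table insert and its index maintenance are typically what hit the I/O subsystem hardest.

```sql
-- Resolve natural keys (SaleDate, ProductCode) to dimension surrogate
-- keys (DateKey, ProductKey) while loading the fact table.
INSERT INTO dbo.FactSales (DateKey, ProductKey, Quantity, Amount)
SELECT d.DateKey,
       p.ProductKey,
       s.Quantity,
       s.Amount
FROM dbo.Staging_Raw  AS s
JOIN dbo.DimDate      AS d ON d.CalendarDate = s.SaleDate
JOIN dbo.DimProduct   AS p ON p.ProductCode  = s.ProductCode;
```

If these lookup joins are the bottleneck, covering indexes on the dimension natural-key columns (e.g. `ProductCode`) matter far more than the storage medium.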
jdblack
Starting Member
11 Posts |
Posted - 2009-11-08 : 10:38:24
Clarification to step 4, "Indexed": if indexing is performed on a table before its rows are inserted into another table, is there a way to merge that index into the destination table?
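An index built on one table can't literally be merged into another, but partition switching gets the same effect on SQL Server 2005+: load a staging table as a heap, build its indexes once, then switch it into the partitioned fact table as a metadata-only operation. A sketch, assuming the hypothetical `pfDaily` partition function and table names used above; the staging table must match the fact table's columns and indexes, and carry a check constraint bounding it to the target partition's range.

```sql
-- Index the fully loaded staging table once, after the bulk insert.
CREATE CLUSTERED INDEX CIX_FactSales_Staging
    ON dbo.FactSales_Staging (LoadDate, SaleID);

-- Metadata-only switch: the indexed data becomes the fact table's
-- partition for that day without any rows being copied.
ALTER TABLE dbo.FactSales_Staging
SWITCH TO dbo.FactSales PARTITION $PARTITION.pfDaily('2009-11-08');
```

This is the standard answer to "index first, then merge" in a SQL Server warehouse, though it only applies when the destination is partitioned and the schemas align exactly.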