DB Design and Data Retrieval from a heavy table

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

DB Design and Data Retrieval from a heavy table

Author

Topic

imadiamond2016
Starting Member

29 Posts

Posted - 2012-06-01 : 03:45:29

I have a requirement to have 612 columns in my database table. The # of columns as per data type are:

BigInt – 150 (PositionCol1, PositionCol2…………PositionCol150)

Int - 5

SmallInt – 5

Date – 150 (SourceDateCol1, SourceDate2,………….SourceDate150)

DateTime – 2

Varchar(2000) – 150 (FormulaCol1, FormulaCol2………………FormulaCol150)

Bit – 150 (IsActive1, IsActive2,……………….IsActive150)

When user does the import for first time the data gets stored in PositionCol1, SourceDateCol1, FormulaCol1, IsActiveCol1, etc. (other datetime, Int, Smallint columns).

When user does the import for second time the data gets stored in PositionCol2, SourceDateCol2, FormulaCol2, IsActiveCol2, etc. (other datetime, Int, Smallint columns)….. so and so on.

There is a ProjectID column in the table for which data is being imported.

Before starting the import process, user maps the excel column names with the database column names (PositionCol1, SourceDateCol1, FormulaCol1, IsActiveCol1) and this mapping get stored in a separate table; so that when retrieved data can be shown under these mapping column names instead of DB column names. E.g.

PositionCol1 may be mapped to SAPDATA

SourceDateCol1 may be mapped to SAPDATE

FormulaCol1 may be mapped to SAPFORMULA

IsActiveCol1 may be mapped to SAPISACTIVE

40,000 rows will be added in this table every day, my questions is that will the SQL be able to handle the load of that much of data in the long run?

Most of the times, a row will have data in about 200-300 columns; in the worst case it’ll have data in all of the 612 columns. Keeping in view this point, shall I make some changes in the design to avoid any future performance issues? If so, please suggest what could be done?

If I stick to my current design, what points I should take care of, apart from Indexing, to have optimal performance while retrieving the data from this huge table?

If I need to retrieve data of a particular entity e.g. SAPDATA, I’ll have to go to my mapping table, get the database column name against SAPDATA i.e. PositionCol1 in this case; and retrieve it. But, in that way, I’ll have to write dynamic queries. Is there any other better way?

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2012-06-01 : 12:32:18

quote:
Originally posted by imadiamond2016

I have a requirement to have 612 columns in my database table.

Holy cow! Why is this even a requirement? Why not normalize your table instead?

Have you considered sparse columns?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/

Subscribe to my blog

imadiamond2016
Starting Member

29 Posts

Posted - 2012-06-01 : 14:22:01

Apologies for creating confusion. The aforesaid is a design that I’m considering, my current design and business logic is as following:

There can be N number of clients. CLT_Clients table stores client information.

There can be N number of projects associated to a client. PRJ_Projects table stores project information and a link to the client.

There can be N number of listings associated to a project. PRJ_Listings table stores listing information and a link to the project.

There can be N number of source entities associated to a listing. ST_Entities table stores source entity information and a link to the listing. (Note: if I go with 612 column design, then it has to be restricted to 150 columns)

This source entity is the actual import that contains the InvestorID, position values, source date, active and formula status.

The name of the import e.g. L1Entity1 is stored in ST_Entities table alongwith ID field i.e. EntityID
InvestorID, Position, Source Date, Active and Formula values get stored in ST_Positions table

Database Diagram

Data need to be view as following:

With this design I’m able to handle N number of imports because the Position, Source Date, IsActive, Formula columns get Pivoted.

The problem that I’m facing with this design is that the system performs very slow when it has to select data for more than 10-12 source entities, and the requirement is to show about 150 source entities. Because data is not stored row wise and I need to show it column wise, hence dynamic queries are written to pivot these columns which takes long.

Please comment/suggest on my current database design if it’s correct or need to be changed with the new design that I explained in the first post?
If I need to stick to my current, what can be done to improve the performance?

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-06-01 : 16:08:08

Your current design sounds reasonable. Any performance issues may simply require proper indexing. As long as you're only trying to pivot 1 client ID at a time, creating an index that includes the columns to be pivoted should be enough. Can you post the SQL code you're currently using to generate the dynamic pivots?

Do you only need to pivot this data for presentation purposes? If so, do you have to provide pivoted data in Excel? If that's the case, why not use Excel's built-in pivot tables? You can extract the raw data from SQL Server into a worksheet, and either automate the pivoting with an Excel macro, or if the data area is a fairly regular size, simply define the pivot source data to cover the largest client that could be retrieved.

You should probably also look at Power Pivot, it's designed for working with large data sets in Excel.

imadiamond2016
Starting Member

29 Posts

Posted - 2012-06-01 : 16:41:55

As long as you're only trying to pivot 1 client ID at a time, creating an index that includes the columns to be pivoted should be enough.

Yes, only one project's data will be pivoted at a time. Regarding indexes in Position table, I also have taken ProjectID field with clustered index as the data is selected from Position table either on the basis of ProjectID OR EntityID.
Whenever EntityID is used to select data from Position table, it's always used in JOIN. And Whenever ProjectID is used to select data from this table, it's always used in WHERE.
Point to notice here is that I have a Clustered index on ProjectID but I have not taken any index on Pivoted column OR EntityID. Is there any room for improvement here?

Can you post the SQL code you're currently using to generate the dynamic pivots?

Example 1:



'Select * From
	(
		Select DD.InvestorID,Cast(1 As Bit) As IsDSInvestor,DD.Position,
		Case DD.ProjectID
		When ' + CAST(@ProjectID AS VARCHAR) +' Then DE.SourceName 
		Else ''' + @PPDeliveryDate + '''+DE.SourceName
		End As SourceName
		From DE_PositionData DD
		Inner Join DE_DataEntities DE ON DE.EntityID=DD.EntityID
		Where DD.ProjectID IN (' + CAST(@ProjectID AS VARCHAR) +',' + CAST(@PreviousProjectID AS VARCHAR) +') AND InvestorID IS NOT NULL
		
	) IDD
	Pivot
	(
		max(Position) for SourceName in ('+ @DataColumns+')
	) as p1'

Example2:


'Select * From
		(
			Select DD.InvestorID As DSSOFID,Cast(1 As Bit) As IsActiveInvestor,
			Case ST.SourceTypeCode
				When ''RSH'' Then Cast(IsNull(DD.IsActive,0) As Int)
				Else Cast(IsNull(DD.IsActive,1) As Int) 
			End As IsActive,
			''~''+DE.SourceName As ActiveSourceName
			From DE_DataEntities DE
			Left Join DE_PositionData DD ON DE.EntityID=DD.EntityID
			Left Join 
			(
				Select * From #DataSources
				Where ProjectID=' + CAST(@ProjectID AS VARCHAR) +'
			) ST ON ST.ESourceID=DE.ESourceID
			Where DE.ProjectID=' + CAST(@ProjectID AS VARCHAR) +' AND ST.SourceTypeCode NOT IN (''PBC'',''EBL'',''REG'')
			AND InvestorID IS NOT NULL
		) IDD
		Pivot
		(
			Max(IsActive) for ActiveSourceName in ('+ @DataColumns+')
		) As p1'

Do you only need to pivot this data for presentation purposes? If so, do you have to provide pivoted data in Excel? If that's the case, why not use Excel's built-in pivot tables?

No, I need to show it in Infragistics wingrid and user will be able to make changes in it. But you have made a great point here which I never thought of. i.e. using built-in functionality for pivot columns. And Infragistics offers that functionality. Thanks for giving such a helping comment.

Please have a review of the query, and provide your comments too.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-06-01 : 16:45:51

I'd recommend following that route before revisiting a SQL method for pivoting. This kind of data manipulation is best left to a client application, so unless you absolutely can't make it work with Infragistics, don't try to pivot it in SQL.

imadiamond2016
Starting Member

29 Posts

Posted - 2012-06-01 : 19:32:32

I just checked, unfortunately Infragistics does not have Pivot feature. Is there any other alternative which helps in pivoting in client application?

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-06-01 : 19:56:14

I know there are some reporting components available that can do pivoting, if not Infragistics, try Dundas or other .NET component vendors.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources