Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 SQL Server Administration (2008)
 Large Number of SQL tables - design options

Author  Topic 

pnp
Starting Member

1 Post

Posted - 2012-03-15 : 02:40:27
Hi, I'm creating a web application where users can upload there excel doc. Any excel doc can have 1 row to 400,000 records in it. Each excel doc has its own schema - Different column names and different types. Max column per excel doc can be 500. Now once user have uploaded this doc, they can query on it based on any column they wish.
Eg. If a sample doc has A1,A2,A3,Name,Address, B1, C1 columns name in it, user can say give me all rows where A1>30 and Name starting with "A" etc. Basically user can query against there own data in whatever form they want.
I have to design the system considering that there will be 1000-20000 such excel doc uploaded.

There are 2 design that I am thinking
1. Use 1 table per upload. Each table having the same schema as the excel doc. User can query against it. with this design, we can possibly end up with upto 2ok tables.
Drawbacks of this design looks like - low usage of sql execution plan, maintenance ( may be). I am hoping this might give us better performance.

2. Have one tall table with column - Colid, ColString, ColInt, ColBool, etc. One of the ColString/int/bool will get populated based on the column type and rest of the column will be null. Colname will contain the users column name. another table will contain the column name - col id mapping along with the type of the column.
In this design(#2), if the user excel doc has 50 column and 100,000 records. Then this design will create 5M records for just 1 doc. Also there
will be a lot of column with null values in it ( we can use sparse column). This design can lead to increased query complexity and possible reduced performance.

What will you guys suggest. #1 vs #2 or should there any other #3 approach, that I should look into.

Thanks for your help.

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2012-03-15 : 12:07:23
Of the two, I would go with option 1.

Option 2 looks like an entity-attribute-value design. If you pick that option, your users will hate you and you will end up hating yourself.




CODO ERGO SUM
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2012-03-16 : 04:56:03
Maybe this?
CREATE TABLE	dbo.FileNames
(
FileID INT IDENTITY(1, 1) PRIMARY KEY,
[FileName] NVARCHAR(260) NOT NULL,
Uploaded DATETIME NOT NULL
)

CREATE TABLE dbo.Cells
(
FileID INT NOT NULL,
CellRow INT NOT NULL,
Col0001 VARCHAR(MAX) SPARSE NULL,
Col0002 VARCHAR(MAX) SPARSE NULL,
...
Col1023 VARCHAR(MAX) SPARSE NULL,
Col1024 VARCHAR(MAX) SPARSE NULL,
PRIMARY KEY
(
FileID,
CellRow
)
)

Then it would be trivial for you to get out all ColumnHeaders (CellRow = 1) for each file.
By position, you can then translate column name to column position (ColXXXX) and query the table.
All transparent to the end user.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2012-03-16 : 11:36:32
ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

One Directory table
One table with the layout based on the filename (or something)
One table with 1 column

Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx


Want to help yourself?

http://msdn.microsoft.com/en-us/library/ms130214.aspx

http://weblogs.sqlteam.com/brettk/

http://brettkaiser.blogspot.com/


Go to Top of Page
   

- Advertisement -