Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
analyst79
Starting Member
1 Post |
Posted - 2013-02-07 : 15:21:50
|
A friend recommended I post to this community, saying it was both fast an excellent. So here I am.I am struggling with a business process of transforming data for analysis that I would like to handle in SQL. I have a CSV file that contains numerical and categorical information. The categorical columns would ideally be identified automatically by containing non-numeric data, with an optional list of columns that would be interpreted as categorical even if a numeric interpretation was possible (such as a list of accounting codes). Before loading the CSV, I will not know how many unique values exist in a column. I may have to list out those columns manually, however. Once the columns are identified as categorical, a new table is created for each column, containing that column's distinct values and an ID column. The categorical column in the original data set is then replaced by a series of columns containing Boolean values for each original value in the column. I will also end up scaling numerical columns so that the largest absolute value is 1. Any help as to code samples or a strategy for solving this would be greatly appreciated. Thanks!Example:data.csv:Category,BalanceA,1000B,2400C,800Raw Data table:Category,BalanceA,1000B,-2400C,800Category table:Category_id,Category_value1,A2,B3,CNormalization table:Column,FactorCategory,2400Processed Data table:Category__A,Category__B,Category__C,Sales__norm1,0,0,0.41666666666666670,1,0,-10,0,1,0.3333333333333333 |
|
LoztInSpace
Aged Yak Warrior
940 Posts |
Posted - 2013-02-07 : 20:22:45
|
You typically don't want to be doing that. You already have all the data you need to work it all out. The rest is a presentation issue, not a data issue.If you're using SSRS, there are matrix reports that pretty much cover what you are tring to do. A relational database is not the tool for this job. It's just not how they work. |
|
|
|
|
|
|
|