
 Aggregating horizontally

Author  Topic 

heze
Posting Yak Master

192 Posts

Posted - 2006-03-20 : 23:46:48
Hi, I'm developing a completely unnormalized table which is in fact very useful for my bosses. The table is intended to track a group of people, identified by an ID number, through time, with the different characteristics listed. i.e.:

ID  characteristic  year1  year2  year3  year4  yearn
22  HoursWorked     3000   3500   3888   ...    ...

In reality I only produce the first 3 columns manually; the rest is taken care of by means of temporary tables and inner joins.
Of course, although very comprehensive and useful for our purposes, it gets cumbersome and unnatural when aggregating horizontally. For example, in the previous table: how many hours has 22 worked from year 1 to year 4? Here is where I need your comments. How do I make this as easy as possible, so as not to write Year1 + Year2 + ... + Yearn
every time I want to aggregate? I am analyzing the INFORMATION_SCHEMA views (or sp_columns) and building a user-defined function with WHILE loops and maybe cursors, to let the end user simply perform a MySUM(Range(n...m)), where n and m are the first and last columns to aggregate. Is there any suggestion?

Thank you

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-03-21 : 00:22:11
quote:
Originally posted by heze
...Im developing a completely unnormalized table...Is there any suggestion?...

I would suggest keeping your data in a normalized table and writing your queries against that.
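For instance (table and column names here are hypothetical, modeled on your example), keeping the same data one row per person, characteristic, and year turns the "year 1 to year 4" question into a plain aggregate:

```sql
-- Hypothetical normalized layout: one row per person/characteristic/year
CREATE TABLE PersonYearValue (
    PersonID        int         NOT NULL,
    Characteristic  varchar(50) NOT NULL,
    TrackYear       int         NOT NULL,
    Amount          int         NOT NULL,
    PRIMARY KEY (PersonID, Characteristic, TrackYear)
)

-- "How many hours has 22 worked from year 1 to year 4?"
SELECT SUM(Amount)
FROM PersonYearValue
WHERE PersonID = 22
  AND Characteristic = 'HoursWorked'
  AND TrackYear BETWEEN 1 AND 4
```

No dynamic SQL, no cursor, and the range of years is just a WHERE clause.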




CODO ERGO SUM

heze
Posting Yak Master

192 Posts

Posted - 2006-03-21 : 00:41:04
Hi, the reason for this structure is that we have been able to perform some queries for finding attrition and retention rates in a much simpler way with this scenario than with a normalized one. Additionally, the number of characteristics to monitor per year per person is now on the order of 40 and will continue increasing, and the process for producing the rest of the columns is automatic, so duplicated queries are not a problem. Furthermore, since my job is to provide precise numbers, I need a source in which I can quickly check the results of my queries against the normalized tables, as well as the trends, to see if my reports make sense; my bosses, who are not programmers, want to be able to do the same. The department produces numbers, so we cannot afford to build applications for everything.

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2006-03-21 : 01:13:04
Read about Normalisation
http://www.datamodel.org/NormalizationRules.html

Madhivanan

Failing to plan is Planning to fail

Srinika
Master Smack Fu Yak Hacker

1378 Posts

Posted - 2006-03-21 : 06:35:52
If you insist on your un-normalized design, you might build dynamic SQL;
eg.

declare @s varchar(8000), @s1 varchar(8000)
Set @s = ' Select ID, '
--------------
Using a loop (if the number of fields varies) or manually (if it is fixed), take the field names into a string, concatenated with the necessary text, and build the list as follows;
.....
.....
-- @s1 => 'isnull(year1,0) + isnull(year2,0) + isnull(year3,0) + .... isnull(yearn,0) '
-----------------

Set @s = @s + @s1 + ' From ....'
Exec (@s)

heze
Posting Yak Master

192 Posts

Posted - 2006-03-21 : 08:36:58
Thanks Srinika! It worked just fine. I used:
declare @tableName sysname, @currentColumn sysname
declare @startRange int, @endRange int
declare @Str1 varchar(8000), @Str2 varchar(8000)

-- table to aggregate; set @startRange and @endRange to the
-- ordinal positions of the first and last columns to sum
select @tableName = 'MT_TrackingElement_15'

declare cr_ColumnNames cursor
for
select COLUMN_NAME
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = @tableName AND ORDINAL_POSITION between @startRange and @endRange
ORDER BY ORDINAL_POSITION

open cr_ColumnNames

-- the first column starts the expression; the loop appends the rest
FETCH NEXT FROM cr_ColumnNames into @currentColumn
select @Str1 = 'cast([' + @currentColumn + '] as int)'
FETCH NEXT FROM cr_ColumnNames into @currentColumn

WHILE @@FETCH_STATUS = 0
begin
    select @Str1 = @Str1 + ' + cast([' + @currentColumn + '] as int)'
    FETCH NEXT FROM cr_ColumnNames into @currentColumn
end

CLOSE cr_ColumnNames
DEALLOCATE cr_ColumnNames

select @Str2 = 'select ' + @Str1 + ' from ' + @tableName
print @Str2
exec(@Str2)

heze
Posting Yak Master

192 Posts

Posted - 2006-03-21 : 08:42:25
In your experience, how do you know when you are abusing exec()? In my un-normalized design it has been very useful, but everything has a price, and I would like some input. Neither performance nor security is an issue for me; my job is to produce numbers.
thank you

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2006-03-21 : 08:43:54
If you use dynamic SQL, then the user has to have Select permission on the specified table, rather than just Execute permission on the stored procedure.
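For example (the procedure and user names here are hypothetical; the table name is the one from the post above):

```sql
-- Static SQL inside a procedure: Execute permission is enough
GRANT EXECUTE ON dbo.usp_GetTotals TO ReportUser

-- With EXEC(@sql) inside the procedure, the caller also needs
-- direct Select permission on the underlying table
GRANT SELECT ON dbo.MT_TrackingElement_15 TO ReportUser
```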

Madhivanan

Failing to plan is Planning to fail

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2006-03-21 : 09:37:51
heze -- you do understand that if you had a normalized design, you'd be able to "just get the numbers" even faster, with shorter and clearer code, and without a cursor or dynamic SQL, right?

The fact that your bosses eventually want to see your data spread out in columns has *nothing* to do with how you should actually be storing the data in your database.

Srinika
Master Smack Fu Yak Hacker

1378 Posts

Posted - 2006-03-21 : 10:10:59
heze,

[ I'm still with the others that you'd be better off if you normalized your table ]

You can avoid the cursor and produce the column list as follows:

DECLARE @sCols VARCHAR(8000)
SELECT @sCols = ''
SELECT @sCols = @sCols + Column_name + ','
from INFORMATION_SCHEMA.Columns
where table_name = 'UrTbl'
set @sCols = left(@sCols, Datalength(@sCols)-1)
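Putting the two pieces together, a complete version of the pattern might look like this ('UrTbl' is the placeholder table name from the sketch above, and the year% filter assumes the period columns share that naming):

```sql
-- Minimal sketch: build 'isnull([year1],0) + isnull([year2],0) + ...'
-- from the catalog, then run the horizontal SUM in one EXEC
DECLARE @sCols varchar(8000), @sSql varchar(8000)
SELECT @sCols = ''

SELECT @sCols = @sCols + 'isnull([' + Column_name + '],0) + '
FROM INFORMATION_SCHEMA.Columns
WHERE table_name = 'UrTbl'
  AND Column_name LIKE 'year%'   -- assumed naming of the period columns
ORDER BY Ordinal_position

-- trim the trailing ' + '
SET @sCols = LEFT(@sCols, DATALENGTH(@sCols) - 3)

SET @sSql = 'SELECT ID, ' + @sCols + ' AS Total FROM UrTbl'
EXEC (@sSql)
```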

heze
Posting Yak Master

192 Posts

Posted - 2006-03-21 : 10:36:44
Thanks for the comments. I have a normalized database; my un-normalized table adds nothing new data-wise. The only problem is that, for tracking purposes, I have to create many views and queries, and maintaining and keeping track of them is a nightmare, especially because I query databases with different schemas on different servers. The databases are not mutually exclusive, i.e. there can be one column in three different server.databases with apparently the same data; it becomes tricky because in one database it can be a snapshot while in another it is a live table, and since snapshots can occur at different points in time, it is hard to keep track of the entire schema, remember all the details, and not always have to search the field dictionaries.
Thus, I found it useful to aggregate all the data in one single table. I don't know if some of you have an alternative, but when you want to track many characteristics (40 so far) of a particular individual through time in a single table, one has to make a decision: since the number of characteristics to track (records per individual in my table) increases at a much higher rate than the time frames (i.e. columns in my table), I found it more natural to implement the mentioned solution.
Of course one can argue: why don't I create a set of normalized tables instead of one? The answer is that they already exist, and we are back to the original problem. Of course, I write to the community to search for alternatives.
Srinika, thanks, I will revisit my code.

Thank you

heze
Posting Yak Master

192 Posts

Posted - 2006-03-26 : 22:13:40
Hi again,

Concerned about my design, I did some research, and here is some of what I found. I guess sometimes "best db practices" are not in sync with real-life requirements. The following is a research paper from IBM; of relevance to this comment is the vertical schema, which is extremely similar to mine. Of course the queries become complex, but the attributes, as mentioned before, increase much faster than the columns, which represent terms (i.e. we have 2 per year).

http://www.research.ibm.com/people/m/minwang/publications/edbt02.pdf

Comments?

thanks

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-03-26 : 23:36:54
That paper doesn't seem at all relevant to what you have described. It is basically a discussion of a specialized query-processing layer that is optimized for an Entity/Attribute type schema, and not at all a real-world implementation of a practical product.

You are basically talking about simple de-normalized tables in which data that would normally go into separate rows of a summary table has been spread into individual columns by time period.

I can tell you from experience that your approach is a bad idea, and that if you want to summarize by time period, you should just have a summary row for each time period.

Your approach suffers from several problems. You cannot easily rollup multiple time periods. The table usually takes more storage than necessary, because of sparseness. You have to continually modify the schema to add new time periods.

It is much easier to transform a table with a row per time period into a cross tab view than it is to go the other way.
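As a sketch of that direction (table and column names are hypothetical), a row-per-time-period table can be cross-tabbed on the way out while the stored data stays normalized:

```sql
-- Spread a hypothetical row-per-year summary into columns for display
SELECT PersonID,
       SUM(CASE WHEN TrackYear = 1 THEN Amount ELSE 0 END) AS Year1,
       SUM(CASE WHEN TrackYear = 2 THEN Amount ELSE 0 END) AS Year2,
       SUM(CASE WHEN TrackYear = 3 THEN Amount ELSE 0 END) AS Year3,
       SUM(CASE WHEN TrackYear = 4 THEN Amount ELSE 0 END) AS Year4,
       SUM(Amount) AS AllYears   -- multi-period rollups stay trivial
FROM PersonYearValue
WHERE Characteristic = 'HoursWorked'
GROUP BY PersonID
```

Adding a new time period here means adding rows, not altering the schema.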

However, you don't seem willing to listen to the advice here. The problem you posted, having to sum multiple columns, is a direct result of your design and is one of its drawbacks. You seem intent on convincing us (or yourself) that it is a good design, and seem to be asking for some magic to make this problem go away.




CODO ERGO SUM