kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 10:16:07

Hello, we are working on a database which will hold measurement data. Each measurement comes in a .txt file of 168 rows and 49 columns. Each of the 8232 values can be between -999.9999 and +999.9999. Together with the values we need to save the date of the measurement and the serial number of the unit that is measured.

The values will be saved and used to draw graphs with .NET, where parts of the current measurement are compared to 4 past measurements of the same unit. At the beginning there will be around 100-200 units, measured on a regular basis (about 4-10 times/year). That makes 200 * 10 * 8232 = 16,464,000 values. The whole thing might grow to around 1000 units later, but the additional problems that come with those numbers can be solved later... ;-)

I am now thinking about the following:

- A table that holds the 8232 values of each measurement in 8232 cells will get very big. Will a web application still be able to compare the current values with the historical values when there are millions of values in the table?
- If we saved the data row by row as comma-separated strings, we would save a lot of space. But strings are bad for performance, aren't they? So which is worse: 49 decimals or 1 string?
- How about indexing? Since we have to compare different parts of the values, indexing doesn't seem to make sense except on the unit serial number. Is there another way to increase the performance of the queries?

Any tips, ideas or best practices on how to make life a little easier when handling huge amounts of measurement data like this?

Thanks for your help!
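A quick back-of-the-envelope check of the figures above (plain Python; the numbers are taken directly from the post):

```python
# Sizing figures from the post: 168 x 49 values per measurement file.
rows_per_file = 168
cols_per_file = 49
values_per_measurement = rows_per_file * cols_per_file        # 8232

# Initial scale: ~200 units, ~10 measurements per unit per year.
values_per_year = 200 * 10 * values_per_measurement
print(values_per_year)                                        # 16464000

# Possible later growth to ~1000 units.
values_at_growth = 1000 * 10 * values_per_measurement
print(values_at_growth)                                       # 82320000
```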
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 10:25:16

Normalized tables. Something like this?

[code]CREATE TABLE #Stage
(
	Col01 SMALLMONEY NOT NULL,
	Col02 SMALLMONEY NOT NULL,
	...
	Col48 SMALLMONEY NOT NULL,
	Col49 SMALLMONEY NOT NULL
)

<insert staging table here>

INSERT TargetTable
(
	MeasurementID,
	ColValue
)
SELECT	@MeasurementID,
	u.theValue
FROM	#Stage AS s
UNPIVOT	(
		theValue FOR theCol IN ([Col01], [Col02], ..., [Col48], [Col49])
	) AS u[/code]

E 12°55'05.63"N 56°04'39.26"
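For readers outside T-SQL, the effect of the UNPIVOT step can be sketched client-side in Python: it flattens one staging row into (column number, value) pairs. The function name and the sample line are illustrative only; a real file row would have 49 tab-delimited fields, as described later in the thread.

```python
def unpivot_row(line):
    """Flatten one tab-delimited row of measurement values into
    (colnum, value) pairs, similar to what the T-SQL UNPIVOT emits."""
    fields = line.rstrip("\n").split("\t")
    return [(colnum, float(v)) for colnum, v in enumerate(fields, start=1)]

# Illustrative three-field line; a real file row would have 49 fields.
pairs = unpivot_row("1.5\t-999.9999\t42.0\n")
print(pairs)   # [(1, 1.5), (2, -999.9999), (3, 42.0)]
```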
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 10:27:17

Go with the table with 49 columns. With indexing, performance shouldn't be a problem; the numbers you're talking about are not that big.

Indexing makes sense only if your column data is highly selective, i.e. has a lot of different values. In that case a nonclustered index will speed things up. You have to test this, of course.

As for the clustered index... you can put it either on a date-inserted column or on an identity column. Note that if you put it on the date, the date should be unique.

_______________________________________________
Causing trouble since 1980
Blog: http://weblogs.sqlteam.com/mladenp
Speed up SSMS development: www.ssmstoolspack.com <- version 1.1 out!
kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 10:47:30

quote: Originally posted by Peso

Normalized tables. Something like this? [code snipped]

Hmmm... I didn't think of that solution, thank you for the idea. It might lead to better performance for the SELECT statements. Will there perhaps be a performance disadvantage with this solution when importing the data from the .txt file and writing it into the database?
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 10:52:27

No. If MeasurementID is increasing, create a clustered index over MeasurementID.

Having the TargetTable normalized will make it easier for you to grab the 8232 data points of any measurement and show them in a graph. Keeping the data in 49 columns will make the output more cumbersome. If you also have to number the datapoints (measurement values), this approach is simple as well.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 10:55:27

[code]CREATE TABLE #Stage
(
	RowID INT IDENTITY(0, 1),
	Col01 SMALLMONEY NOT NULL,
	Col02 SMALLMONEY NOT NULL,
	...
	Col48 SMALLMONEY NOT NULL,
	Col49 SMALLMONEY NOT NULL
)

<insert staging table here, keeping identity column intact>

INSERT TargetTable
(
	MeasurementID,
	ColNum,
	ColValue
)
SELECT	@MeasurementID,
	49 * u.RowID + CAST(RIGHT(u.theCol, 2) AS TINYINT) - 1,	-- 0-based measurement point enumeration
	u.theValue
FROM	#Stage AS s
UNPIVOT	(
		theValue FOR theCol IN ([Col01], [Col02], ..., [Col48], [Col49])
	) AS u[/code]

TargetTable has this index:

[code]CREATE UNIQUE CLUSTERED INDEX IX_TargetTable ON TargetTable (MeasurementID, ColNum)[/code]

E 12°55'05.63"N 56°04'39.26"
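The cell-number arithmetic in the SELECT can be checked in isolation. Here is a Python sketch of one consistent 0-based enumeration (0-based rows from the identity column, 1-based columns from the Col01..Col49 names), together with its inverse; the function names are illustrative only:

```python
COLS = 49   # columns per file row, as in the staging table

def cell_number(row_id, col_num):
    """0-based cell index from a 0-based row_id and a 1-based col_num."""
    return COLS * row_id + col_num - 1

def cell_position(index):
    """Inverse mapping: 0-based cell index back to (row_id, col_num)."""
    row_id, col0 = divmod(index, COLS)
    return row_id, col0 + 1

print(cell_number(0, 1))      # 0, the first cell
print(cell_number(167, 49))   # 8231, the last cell of a 168 x 49 grid
print(cell_position(8231))    # (167, 49)
```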
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 11:31:24

Emmm... a design with col1 to col49 is normalized, provided no colN value describes another colM value.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 11:36:34

And the table design, something like this:

[code]CREATE TABLE MeasurementUnits
(
	MeasurementUnitID INT IDENTITY(1, 1) NOT NULL,
	SerialNumber VARCHAR(16) NOT NULL
)

CREATE UNIQUE CLUSTERED INDEX PK_MeasurementUnitID ON MeasurementUnits (MeasurementUnitID)

CREATE TABLE Measurements
(
	MeasurementID INT IDENTITY(1, 1) NOT NULL,
	MeasurementUnitID INT NOT NULL,
	DataTime SMALLDATETIME NOT NULL
)

CREATE UNIQUE INDEX IX_MeasurementID ON Measurements (MeasurementID) INCLUDE (MeasurementUnitID, DataTime)

CREATE TABLE MeasurementDatas
(
	MeasurementID INT NOT NULL,
	ColNum SMALLINT NOT NULL,
	ColValue SMALLMONEY NOT NULL
)

CREATE UNIQUE CLUSTERED INDEX PK_MeasurementID_ColNum ON MeasurementDatas (MeasurementID, ColNum)

ALTER TABLE Measurements
ADD CONSTRAINT FK_Measurements_MeasurementUnits
FOREIGN KEY (MeasurementUnitID)
REFERENCES MeasurementUnits (MeasurementUnitID)

ALTER TABLE Measurements
CHECK CONSTRAINT FK_Measurements_MeasurementUnits

ALTER TABLE MeasurementDatas
ADD CONSTRAINT FK_MeasurementDatas_Measurements
FOREIGN KEY (MeasurementID)
REFERENCES Measurements (MeasurementID)

ALTER TABLE MeasurementDatas
CHECK CONSTRAINT FK_MeasurementDatas_Measurements[/code]
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 11:38:24

quote: Originally posted by spirit1

emmm... design with col1 to col49 is normalized provided no colN value describes other colM value.

Perhaps. We won't know until the OP says what the 49 columns are for: whether the 8232 datapoints are laid out arbitrarily in the file with just a "line break" after every 49 values, or whether the 49 columns actually mean something.
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 11:44:36

Well, my thought is that if you get 49 columns of sensor data in one line, then they have to be values with different meanings.
kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 12:29:38

There are 8232 values, organized in an array of 168 rows and 49 columns. A set of measurement values that we use for the charts, and that the queries of this project will work on, can draw on one or many rows and can also use only parts of rows. So we will, for example, check programmatically whether the values in row 76, columns 45, 46 and 47 are within a given range, or calculate the average of rows 101 and 102, columns 3 to 42.
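The two example checks can be sketched against an in-memory 168 x 49 grid (plain Python, with 1-based row and column numbers as in the post; the helper names and the synthetic grid are illustrative only):

```python
def in_range(grid, row, cols, low, high):
    """True if every value at (row, col) for col in cols lies within [low, high].
    row and cols are 1-based, matching how the post counts rows and columns."""
    return all(low <= grid[row - 1][c - 1] <= high for c in cols)

def region_average(grid, rows, cols):
    """Average of the cells covered by the 1-based row and column ranges."""
    cells = [grid[r - 1][c - 1] for r in rows for c in cols]
    return sum(cells) / len(cells)

# Synthetic 168 x 49 grid just to exercise the helpers.
grid = [[float(r * 49 + c) for c in range(49)] for r in range(168)]

ok = in_range(grid, 76, [45, 46, 47], 0.0, 8300.0)      # row 76, columns 45-47
avg = region_average(grid, [101, 102], range(3, 43))    # rows 101-102, columns 3-42
print(ok, avg)   # True 4946.0
```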
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 12:40:43

If so, replace the ColNum column in MeasurementDatas with two columns:

[code]RowNum TINYINT NOT NULL,
ColNum TINYINT NOT NULL[/code]
kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 12:47:47

Thank you very much. Your ideas really make sense and I am sure they will help us a lot.
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 12:51:49

Wait, wait, wait. If your measuring device returns a file with, for example, 100 rows, and each row has 49 values split by a ";": do those 49 values all come from the same sensor, or do they come from different sensors? For example, do they all come from measuring temperature at the same spot over time, or does each of those 49 values represent a temperature measurement from a different heat source? Those two scenarios require very different table structures.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 13:19:55

How come, if you store every value together with its physical place in the file, row and column?
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 13:40:40

If that's a reply to me, I have no clue what you mean, Peter.
kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 13:46:22

Each value represents a measurement. All the measurements are put together by an application we have no influence on. This application produces a tab-delimited .txt file. The file is an array of values: the rows represent different places of some kind and the columns represent different attributes of some kind. What we do with the values is run predefined formulas against them and, based on the result, give feedback to the uploader on whether everything is OK with the unit or whether he has to do something.
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 13:52:32

Well, if all of the values do NOT represent the same measured quantity, then you have to use the col1-to-colN design.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 13:58:08

How will you then solve the scenario where you want the average of rows 56-61 and columns 23-29? Or the average of row 5 and columns 12-33? Or the average of rows 101-113 and column 17?

This will solve any of these situations:

[code]CREATE TABLE MeasurementDatas
(
	MeasurementID INT NOT NULL,
	RowNum TINYINT NOT NULL,
	ColNum TINYINT NOT NULL,
	CellValue SMALLMONEY NOT NULL
)

SELECT	AVG(CellValue)
FROM	MeasurementDatas
WHERE	MeasurementID = @MeasurementID
	AND RowNum BETWEEN @LowRow AND @HighRow
	AND ColNum BETWEEN @LowCol AND @HighCol[/code]
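To make the BETWEEN semantics concrete, the same range average can be mirrored client-side over (RowNum, ColNum, CellValue) triples, which is the shape of MeasurementDatas. A Python sketch on a small synthetic dataset; like SQL's BETWEEN, the filter below is inclusive on both ends:

```python
# One measurement's MeasurementDatas rows as (row_num, col_num, cell_value).
data = [(r, c, float(r * 100 + c)) for r in range(1, 6) for c in range(1, 6)]

def avg_region(triples, low_row, high_row, low_col, high_col):
    """AVG(CellValue) over RowNum BETWEEN low_row AND high_row
    AND ColNum BETWEEN low_col AND high_col (both bounds inclusive)."""
    vals = [v for r, c, v in triples
            if low_row <= r <= high_row and low_col <= c <= high_col]
    return sum(vals) / len(vals)

print(avg_region(data, 2, 3, 4, 5))   # 254.5 (rows 2-3, columns 4-5)
```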
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 14:03:25

Yeah, that'd work too. Very nice.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 14:18:46

Sorry I didn't make that clearer earlier.