kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 10:16:07

Hello, we are working on a database which will hold measurement data. Each measurement comes in a .txt file of 168 rows and 49 columns. Each of the 8232 values can be between -999.9999 and +999.9999. Together with the values we need to save the date of the measurement and the serial number of the unit that is measured.

The values will be saved and used to draw graphs with .NET, where parts of the current measurement are compared to 4 past measurements of the same unit. At the beginning there will be around 100-200 units, measured on a regular basis (about 4-10 times/year). That makes 200 * 10 * 8232 = 16,464,000 values. The whole thing might grow to around 1000 units later, but the additional problems that come with those numbers can be solved later... ;-)

I am now thinking about the following:

- A table that holds the 8232 values of each measurement in 8232 cells will get very big. Will a web application still be able to compare the current values with the historical values when there are millions of values in the table?
- If we saved the data row by row as comma-separated strings, we would save a lot of space. But strings are bad for performance, aren't they? So which is worse: 49 decimals or 1 string?
- How about indexing? Since we have to compare different parts of the values, indexing doesn't seem to make sense except on the unit serial number. Is there another way to increase the performance of the queries?

Any tips, ideas or best practices on how to make life a little easier when handling huge amounts of measurement data like this?

Thanks for your help!
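A quick back-of-the-envelope check of the figures above (plain Python; the numbers are taken directly from the post):

```python
# Sizing figures from the post: 168 x 49 values per measurement file.
rows_per_file = 168
cols_per_file = 49
values_per_measurement = rows_per_file * cols_per_file        # 8232

# Initial scale: ~200 units, ~10 measurements per unit per year.
values_per_year = 200 * 10 * values_per_measurement
print(values_per_year)                                        # 16464000

# Possible later growth to ~1000 units.
values_at_growth = 1000 * 10 * values_per_measurement
print(values_at_growth)                                       # 82320000
```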
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 10:25:16

Normalized tables. Something like this?

[code]CREATE TABLE #Stage
(
	Col01 SMALLMONEY NOT NULL,
	Col02 SMALLMONEY NOT NULL,
	...
	Col48 SMALLMONEY NOT NULL,
	Col49 SMALLMONEY NOT NULL
)

<insert staging table here>

INSERT TargetTable
(
	MeasurementID,
	ColValue
)
SELECT	@MeasurementID,
	u.theValue
FROM	#Stage AS s
UNPIVOT	(
		theValue FOR theCol IN ([Col01], [Col02], ..., [Col48], [Col49])
	) AS u[/code]

E 12°55'05.63"N 56°04'39.26"
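For readers outside T-SQL, the effect of the UNPIVOT step can be sketched client-side in Python: it flattens one staging row into (column number, value) pairs. The function name and the sample line are illustrative only; a real file row would have 49 tab-delimited fields, as described later in the thread.

```python
def unpivot_row(line):
    """Flatten one tab-delimited row of measurement values into
    (colnum, value) pairs, similar to what the T-SQL UNPIVOT emits."""
    fields = line.rstrip("\n").split("\t")
    return [(colnum, float(v)) for colnum, v in enumerate(fields, start=1)]

# Illustrative three-field line; a real file row would have 49 fields.
pairs = unpivot_row("1.5\t-999.9999\t42.0\n")
print(pairs)   # [(1, 1.5), (2, -999.9999), (3, 42.0)]
```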
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 10:27:17

Go with the table with 49 columns. With indexing, performance shouldn't be a problem; the numbers you're talking about are not that big.

Indexing makes sense only if your column data is highly selective, i.e. has a lot of different values. In that case a nonclustered index will speed things up. You have to test this, of course.

As for the clustered index... you can put it either on a date-inserted column or on an identity column. Note that if you put it on the date, the date should be unique.

_______________________________________________
Causing trouble since 1980
Blog: http://weblogs.sqlteam.com/mladenp
Speed up SSMS development: www.ssmstoolspack.com <- version 1.1 out!
kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 10:47:30

quote: Originally posted by Peso

Normalized tables. Something like this? [code snipped]

Hmmm... I didn't think of that solution, thank you for the idea. It might lead to better performance for the SELECT statements. Will there perhaps be a performance disadvantage with this solution when importing the data from the .txt file and writing it into the database?
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 10:52:27

No. If MeasurementID is increasing, create a clustered index over MeasurementID.

Having the TargetTable normalized will make it easier for you to grab the 8232 data points of any measurement and show them in a graph. Keeping the data in 49 columns will make the output more cumbersome. If you also have to number the datapoints (measurement values), this approach is simple as well.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 10:55:27

[code]CREATE TABLE #Stage
(
	RowID INT IDENTITY(0, 1),
	Col01 SMALLMONEY NOT NULL,
	Col02 SMALLMONEY NOT NULL,
	...
	Col48 SMALLMONEY NOT NULL,
	Col49 SMALLMONEY NOT NULL
)

<insert staging table here, keeping identity column intact>

INSERT TargetTable
(
	MeasurementID,
	ColNum,
	ColValue
)
SELECT	@MeasurementID,
	49 * u.RowID + CAST(RIGHT(u.theCol, 2) AS TINYINT) - 1,	-- 0-based measurement point enumeration
	u.theValue
FROM	#Stage AS s
UNPIVOT	(
		theValue FOR theCol IN ([Col01], [Col02], ..., [Col48], [Col49])
	) AS u[/code]

TargetTable has this index:

[code]CREATE UNIQUE CLUSTERED INDEX IX_TargetTable ON TargetTable (MeasurementID, ColNum)[/code]

E 12°55'05.63"N 56°04'39.26"
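The cell-number arithmetic in the SELECT can be checked in isolation. Here is a Python sketch of one consistent 0-based enumeration (0-based rows from the identity column, 1-based columns from the Col01..Col49 names), together with its inverse; the function names are illustrative only:

```python
COLS = 49   # columns per file row, as in the staging table

def cell_number(row_id, col_num):
    """0-based cell index from a 0-based row_id and a 1-based col_num."""
    return COLS * row_id + col_num - 1

def cell_position(index):
    """Inverse mapping: 0-based cell index back to (row_id, col_num)."""
    row_id, col0 = divmod(index, COLS)
    return row_id, col0 + 1

print(cell_number(0, 1))      # 0, the first cell
print(cell_number(167, 49))   # 8231, the last cell of a 168 x 49 grid
print(cell_position(8231))    # (167, 49)
```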
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 11:31:24

Emmm... a design with col1 to col49 is normalized, provided no colN value describes another colM value.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 11:36:34

And the table design, something like this:

[code]CREATE TABLE MeasurementUnits
(
	MeasurementUnitID INT IDENTITY(1, 1) NOT NULL,
	SerialNumber VARCHAR(16) NOT NULL
)

CREATE UNIQUE CLUSTERED INDEX PK_MeasurementUnitID ON MeasurementUnits (MeasurementUnitID)

CREATE TABLE Measurements
(
	MeasurementID INT IDENTITY(1, 1) NOT NULL,
	MeasurementUnitID INT NOT NULL,
	DataTime SMALLDATETIME NOT NULL
)

CREATE UNIQUE INDEX IX_MeasurementID ON Measurements (MeasurementID) INCLUDE (MeasurementUnitID, DataTime)

CREATE TABLE MeasurementDatas
(
	MeasurementID INT NOT NULL,
	ColNum SMALLINT NOT NULL,
	ColValue SMALLMONEY NOT NULL
)

CREATE UNIQUE CLUSTERED INDEX PK_MeasurementID_ColNum ON MeasurementDatas (MeasurementID, ColNum)

ALTER TABLE Measurements
ADD CONSTRAINT FK_Measurements_MeasurementUnits
FOREIGN KEY (MeasurementUnitID)
REFERENCES MeasurementUnits (MeasurementUnitID)

ALTER TABLE Measurements
CHECK CONSTRAINT FK_Measurements_MeasurementUnits

ALTER TABLE MeasurementDatas
ADD CONSTRAINT FK_MeasurementDatas_Measurements
FOREIGN KEY (MeasurementID)
REFERENCES Measurements (MeasurementID)

ALTER TABLE MeasurementDatas
CHECK CONSTRAINT FK_MeasurementDatas_Measurements[/code]
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 11:38:24

quote: Originally posted by spirit1

emmm... design with col1 to col49 is normalized provided no colN value describes other colM value.

Perhaps. We won't know until the OP says what the 49 columns are for: whether the 8232 datapoints are laid out arbitrarily in the file with just a "line break" after every 49 values, or whether the 49 columns actually mean something.
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 11:44:36

Well, my thought is that if you get 49 columns of sensor data in one line, then they have to be values with different meanings.
kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 12:29:38

There are 8232 values, organized in an array of 168 rows and 49 columns. A set of measurement values that we use for the charts, and that the queries of this project will work on, can draw on one or many rows and can also use only parts of rows. So we will, for example, check programmatically whether the values in row 76, columns 45, 46 and 47 are within a given range, or calculate the average of rows 101 and 102, columns 3 to 42.
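The two example checks can be sketched against an in-memory 168 x 49 grid (plain Python, with 1-based row and column numbers as in the post; the helper names and the synthetic grid are illustrative only):

```python
def in_range(grid, row, cols, low, high):
    """True if every value at (row, col) for col in cols lies within [low, high].
    row and cols are 1-based, matching how the post counts rows and columns."""
    return all(low <= grid[row - 1][c - 1] <= high for c in cols)

def region_average(grid, rows, cols):
    """Average of the cells covered by the 1-based row and column ranges."""
    cells = [grid[r - 1][c - 1] for r in rows for c in cols]
    return sum(cells) / len(cells)

# Synthetic 168 x 49 grid just to exercise the helpers.
grid = [[float(r * 49 + c) for c in range(49)] for r in range(168)]

ok = in_range(grid, 76, [45, 46, 47], 0.0, 8300.0)      # row 76, columns 45-47
avg = region_average(grid, [101, 102], range(3, 43))    # rows 101-102, columns 3-42
print(ok, avg)   # True 4946.0
```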
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 12:40:43

If so, replace the ColNum column in MeasurementDatas with two columns:

[code]RowNum TINYINT NOT NULL,
ColNum TINYINT NOT NULL[/code]
kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 12:47:47

Thank you very much. Your ideas really make sense and I am sure they will help us a lot.
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 12:51:49

Wait, wait, wait. If your measuring device returns a file with, for example, 100 rows, and each row has 49 values split by a ";": do those 49 values all come from the same sensor, or do they come from different sensors? For example, do they all come from measuring temperature at the same spot over time, or does each of those 49 values represent a temperature measurement from a different heat source? Those two scenarios require very different table structures.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 13:19:55

How come, if you store every value together with its physical place in the file, row and column?
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 13:40:40

If that's a reply to me, I have no clue what you mean, Peter.
kfj_de
Starting Member
12 Posts
Posted - 2008-10-08 : 13:46:22

Each value represents a measurement. All the measurements are put together by an application we have no influence on. This application produces a tab-delimited .txt file. The file is an array of values: the rows represent different places of some kind and the columns represent different attributes of some kind. What we do with the values is run predefined formulas against them and, based on the result, give feedback to the uploader on whether everything is OK with the unit or whether he has to do something.
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 13:52:32

Well, if all of the values do NOT represent the same measured quantity, then you have to use the col1-to-colN design.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 13:58:08

How will you then solve the scenario where you want the average of rows 56-61 and columns 23-29? Or the average of row 5 and columns 12-33? Or the average of rows 101-113 and column 17?

This will solve any of these situations:

[code]CREATE TABLE MeasurementDatas
(
	MeasurementID INT NOT NULL,
	RowNum TINYINT NOT NULL,
	ColNum TINYINT NOT NULL,
	CellValue SMALLMONEY NOT NULL
)

SELECT	AVG(CellValue)
FROM	MeasurementDatas
WHERE	MeasurementID = @MeasurementID
	AND RowNum BETWEEN @LowRow AND @HighRow
	AND ColNum BETWEEN @LowCol AND @HighCol[/code]
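To make the BETWEEN semantics concrete, the same range average can be mirrored client-side over (RowNum, ColNum, CellValue) triples, which is the shape of MeasurementDatas. A Python sketch on a small synthetic dataset; like SQL's BETWEEN, the filter below is inclusive on both ends:

```python
# One measurement's MeasurementDatas rows as (row_num, col_num, cell_value).
data = [(r, c, float(r * 100 + c)) for r in range(1, 6) for c in range(1, 6)]

def avg_region(triples, low_row, high_row, low_col, high_col):
    """AVG(CellValue) over RowNum BETWEEN low_row AND high_row
    AND ColNum BETWEEN low_col AND high_col (both bounds inclusive)."""
    vals = [v for r, c, v in triples
            if low_row <= r <= high_row and low_col <= c <= high_col]
    return sum(vals) / len(vals)

print(avg_region(data, 2, 3, 4, 5))   # 254.5 (rows 2-3, columns 4-5)
```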
spirit1
Cybernetic Yak Master
11752 Posts
Posted - 2008-10-08 : 14:03:25

Yeah, that'd work too. Very nice.
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2008-10-08 : 14:18:46

Sorry I didn't make that clearer earlier.