Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Database Design and Application Architecture
 Best way to store scientific measurement data in d

Author  Topic 

calle.liedberg
Starting Member

4 Posts

Posted - 2011-02-04 : 08:20:07
Hi guys.

Need some input from you experts on a project I´m working on.

Some background:

I am going to develop a database for a company that does different experiments on road surfaces using a variety of test equipment.

The equipments currently have very different ways of control such as old Win95 or NT applications. The idea is to develop new software for the equipment one at a time and integrate all equipment in a common application and database structure. The equipment will after updrade typically use C# WPF or WinForms applications to control the experiments and store test information such as: test parameters, experiment specifics and, of course most importantly, the test results in a database.

The problem to solve:
The idea is to have a common database structure that could be used for all equipments. I do not want to create new tables of the same structure for each new equipment.

I think I have everything sorted out except how to store the measurement data in an efficient way.

The measurement data from an experiment typically has about between 10000 to 1000000 datapoints where each datapoint looks has the structure:

datapoint;
timestamp;
measure parameter1;
measure parameter2;
measure parameter3;
etc...

The exact structure of a measurement data differs between equipments and also between different experiments and setups on the same equipment.

My idea is to just have a single Results-table for all equipments and experiments, something like this:
[dbo].[Results]
(
[Id] [int],
[TestId] [int],
[Data] [datatype???]
)

and just link this to a test-table.

Now for the million dollar question:
How should I design this [Data] to store the information???

I do not have the need to be able to search the information in the [Data] or in other ways manipulate it. I just have to store it and then get it back to the software of the equipment when they request it.

I do not think that storing it binary from the software using binary serialization in .NET is a viable approach since the software continously will be updated on and then de-serialize it could be an issue when having different versions of the software.

Storing it as xml is maybe a possibility but as some people have said to me xml files tend to get pretty large. Or am I wrong here?

Also, I have the possiblity to choose whether to use a 2005 or 2008 MS SQL server so if there is any new features in 2008 that can be leveraged that is not a problem.

Thanks
/Calle

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2011-02-04 : 08:27:05
can you give examples of the kind of data the measurements produce. it's hard to know what datatype is best without knowing what the data is.


elsasoft.org
Go to Top of Page

calle.liedberg
Starting Member

4 Posts

Posted - 2011-02-04 : 08:45:00
The measurement data consists of datapoint where each datapoint has a number of parameters such as a timestamp and several parameter values. If you look at it in a table-like form it has the columns:

[datapointID]
[timestamp]
[parameter value 1]
[parameter value 2]
[parameter value 3]
[parameter value X... etc.]

for example:
id timestamp laservalue temperature pressure
1 20110204 14:00:00:001 0,4567 20,5 1,001
2 20110204 14:00:00:020 0,2345 20,7 1,003
3 20110204 14:00:00:032 0,3456 20,7 1,003
4 20110204 14:00:00:045 0,4567 20,8 1,023
5 20110204 14:00:00:060 0,5678 20,7 1,055

The above is just an example and different experiments can have different parameters.

The number of points can be up to about 1 000 000 or even bigger.

Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2011-02-04 : 09:14:36
Well...as far as I can see your issue here is what data type to use. In 2008 there is something called sparse columns which (for varchar columns) cuts the storage size by 60% for NULL columns, so you could create a table with say 100 sparse varchar columns (or 1000 for that matter) that will serve as your parameters (param1, param2, ..., param100). Then maybe you could have another table that holds metadata for each test and what each parameter actually represent.

By expanding your example it could look like this maybe? ->

id testid timestamp param1 param2 param3
1 51452 20110204 14:00:00:001 0,4567 20,5 1,001
2 51452 20110204 14:00:00:020 0,2345 20,7 1,003
3 51452 20110204 14:00:00:032 0,3456 20,7 1,003
4 51452 20110204 14:00:00:045 0,4567 20,8 1,023
5 51452 20110204 14:00:00:060 0,5678 20,7 1,055

id testid paramname paramdescription
1 51452 param1 laservalue
2 51452 param2 temperature
3 51452 param3 pressure


Does this make any sense?

- Lumbago
My blog-> www.thefirstsql.com
Go to Top of Page

calle.liedberg
Starting Member

4 Posts

Posted - 2011-02-04 : 11:50:30
@Lumbago.

In the link below I have made a database diagram to try to illustrate how it looks.

[url]http://dl.dropbox.com/u/121397/database.jpg[/url]

Look especially on the Results-table and the Tests-table.
It is in the dbo.Results.Data column I would like to store the measurement data "file".

I do not want to store the data in a regular database-table-fashion since the number of rows in this table will rise to hundreds of millions of rows within a week or two (each experiment can have around a million of datapoints. Also I do not have the need for beeing able to sort, search, filter or in any other way manipulate the data once it is in the database. I just want to store it and pull it out again when I need it.

I'm looking for some kind of way to store the data as a "file" in the dbo.Results.Data column. Kind of like saving a document or something in a database. I just don't know in what format this "file" should be in. Binary serialized isn't an option I think but perhaps xml.

Go to Top of Page

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2011-02-04 : 14:29:57
if you want to store it as a file, you might want to look at the FILESTREAM datatype in SQL 2008.

http://technet.microsoft.com/en-us/library/bb933993.aspx

what you put in there can be any format you like, whatever is convenient. .dat, .pdf, .gif, .txt, whatever.


elsasoft.org
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-02-04 : 15:01:01
Why do you need a database at all, if you not are going to search the data?
At some point you are going to retreive the data, as you wrote yourself.

1) Are you planning to use Standard Edition or Enterprise Edition?
2) Is it "all or nothing" scenario when retrieving/restoring the data?



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-02-04 : 15:03:12
Just make a note that you cannot do a "SELECT *" over sparse columns.
You will have to specify the columns you want to return.
Or... You can make a columnset column which returns a xml document for all included elements/columns.


N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-02-04 : 15:21:57
Sparse column seems to be the way to go.
CREATE TABLE	dbo.EquipmentType
(
EquipmentTypeID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
EquipmentCode VARCHAR(8) NOT NULL,
EquipmentName VARCHAR(48) NOT NULL
)

INSERT dbo.EquipmentType
(
EquipmentCode,
EquipmentName
)
VALUES ('Laser', 'Laserguided road inspection')

CREATE TABLE dbo.Measure
(
MeasureID INT IDENTITY(-2147483647, 1) PRIMARY KEY CLUSTERED,
EquipmentTypeID INT REFERENCES dbo.EquipmentType(EquipmentTypeID),
ID INT NOT NULL,
TimeStamp DATETIME2(3) NOT NULL,
AllMeasure XML COLUMN_SET FOR ALL_SPARSE_COLUMNS NULL,
LaserValue DECIMAL(5, 2) SPARSE NULL,
Temperature DECIMAL(3, 1) SPARSE NULL,
Pressure DECIMAL(4, 3) SPARSE NULL
)

-- Option 1
INSERT dbo.Measure
(
EquipmentTypeID,
ID,
TimeStamp,
LaserValue,
Temperature,
Pressure
)
VALUES (1, 1, '20110204 14:00:00:001', 0.4567, 20.5, 1.001),
(1, 2, '20110204 14:00:00:020', 0.2345, 20.7, 1.003)

SELECT *
FROM dbo.Measure

SELECT MeasureID,
EquipmentTypeID,
ID,
TimeStamp,
LaserValue,
Temperature,
Pressure
FROM dbo.Measure

-- Option 2
INSERT dbo.Measure
(
EquipmentTypeID,
ID,
TimeStamp,
AllMeasure
)
VALUES (1, 3, '20110204 14:00:00:032', '<LaserValue>0.3456</LaserValue><Temperature>20.7</Temperature><Pressure>1.003</Pressure>'),
(1, 4, '20110204 14:00:00:045', '<LaserValue>0.4567</LaserValue><Temperature>20.8</Temperature><Pressure>1.023</Pressure>'),
(1, 5, '20110204 14:00:00:060', '<LaserValue>0.5678</LaserValue><Temperature>20.7</Temperature><Pressure>1.055</Pressure>')

SELECT *
FROM dbo.Measure

SELECT MeasureID,
EquipmentTypeID,
ID,
TimeStamp,
LaserValue,
Temperature,
Pressure
FROM dbo.Measure

DROP TABLE dbo.Measure,
dbo.EquipmentType



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-02-04 : 15:23:31
And if you need a new sparse column, just issue this command
ALTER TABLE dbo.Measure ADD Angle DECIMAL(6, 3) SPARSE NULL
and this column is automatically added to the AllMeasure column!



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

calle.liedberg
Starting Member

4 Posts

Posted - 2011-02-06 : 13:49:53
Thanks for the input.

@Peso.

I think the sql server edition is Enterprice but not entirely sure. Does it matter much?

The reason for a database is so to have a single point where the data is stored. As it is now the researchers sends files back and forth between each other and no one really has any overview of what experiments has been done before etc. The "base"-data should be available for all researchers in the basic form. The idea is to make a web interface, e.g ASP.Net or Silverlight where they can search for experiments and when they find one they want to investigate further they can extract the data in the format the want (e.g. excel or CSV etc.).

I will look in to the XML columnset approach and the sparse columns. Is this an efficient way of storing the data or will the "Measure"-table grow in size very fast?

Thanks for your input guys.

/Calle
Go to Top of Page
   

- Advertisement -