Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
calle.liedberg
Starting Member
4 Posts |
Posted - 2011-02-04 : 08:20:07
|
Hi guys.Need some input from you experts on a project I´m working on.Some background:I am going to develop a database for a company that does different experiments on road surfaces using a variety of test equipment.The equipments currently have very different ways of control such as old Win95 or NT applications. The idea is to develop new software for the equipment one at a time and integrate all equipment in a common application and database structure. The equipment will after updrade typically use C# WPF or WinForms applications to control the experiments and store test information such as: test parameters, experiment specifics and, of course most importantly, the test results in a database.The problem to solve:The idea is to have a common database structure that could be used for all equipments. I do not want to create new tables of the same structure for each new equipment.I think I have everything sorted out except how to store the measurement data in an efficient way.The measurement data from an experiment typically has about between 10000 to 1000000 datapoints where each datapoint looks has the structure:datapoint;timestamp;measure parameter1;measure parameter2;measure parameter3;etc...The exact structure of a measurement data differs between equipments and also between different experiments and setups on the same equipment.My idea is to just have a single Results-table for all equipments and experiments, something like this:[dbo].[Results]([Id] [int],[TestId] [int],[Data] [datatype???])and just link this to a test-table.Now for the million dollar question:How should I design this [Data] to store the information???I do not have the need to be able to search the information in the [Data] or in other ways manipulate it. I just have to store it and then get it back to the software of the equipment when they request it.I do not think that storing it binary from the software using binary serialization in .NET is a viable approach since the software continously will be updated on and then de-serialize it could be an issue when having different versions of the software.Storing it as xml is maybe a possibility but as some people have said to me xml files tend to get pretty large. Or am I wrong here?Also, I have the possiblity to choose whether to use a 2005 or 2008 MS SQL server so if there is any new features in 2008 that can be leveraged that is not a problem.Thanks/Calle |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2011-02-04 : 08:27:05
|
can you give examples of the kind of data the measurements produce. it's hard to know what datatype is best without knowing what the data is. elsasoft.org |
|
|
calle.liedberg
Starting Member
4 Posts |
Posted - 2011-02-04 : 08:45:00
|
The measurement data consists of datapoint where each datapoint has a number of parameters such as a timestamp and several parameter values. If you look at it in a table-like form it has the columns:[datapointID][timestamp][parameter value 1][parameter value 2][parameter value 3][parameter value X... etc.]for example:id timestamp laservalue temperature pressure1 20110204 14:00:00:001 0,4567 20,5 1,0012 20110204 14:00:00:020 0,2345 20,7 1,0033 20110204 14:00:00:032 0,3456 20,7 1,0034 20110204 14:00:00:045 0,4567 20,8 1,0235 20110204 14:00:00:060 0,5678 20,7 1,055The above is just an example and different experiments can have different parameters.The number of points can be up to about 1 000 000 or even bigger. |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2011-02-04 : 09:14:36
|
Well...as far as I can see your issue here is what data type to use. In 2008 there is something called sparse columns which (for varchar columns) cuts the storage size by 60% for NULL columns, so you could create a table with say 100 sparse varchar columns (or 1000 for that matter) that will serve as your parameters (param1, param2, ..., param100). Then maybe you could have another table that holds metadata for each test and what each parameter actually represent. By expanding your example it could look like this maybe? ->id testid timestamp param1 param2 param31 51452 20110204 14:00:00:001 0,4567 20,5 1,0012 51452 20110204 14:00:00:020 0,2345 20,7 1,0033 51452 20110204 14:00:00:032 0,3456 20,7 1,0034 51452 20110204 14:00:00:045 0,4567 20,8 1,0235 51452 20110204 14:00:00:060 0,5678 20,7 1,055id testid paramname paramdescription1 51452 param1 laservalue2 51452 param2 temperature3 51452 param3 pressure Does this make any sense?- LumbagoMy blog-> www.thefirstsql.com |
|
|
calle.liedberg
Starting Member
4 Posts |
Posted - 2011-02-04 : 11:50:30
|
@Lumbago.In the link below I have made a database diagram to try to illustrate how it looks.[url]http://dl.dropbox.com/u/121397/database.jpg[/url]Look especially on the Results-table and the Tests-table.It is in the dbo.Results.Data column I would like to store the measurement data "file".I do not want to store the data in a regular database-table-fashion since the number of rows in this table will rise to hundreds of millions of rows within a week or two (each experiment can have around a million of datapoints. Also I do not have the need for beeing able to sort, search, filter or in any other way manipulate the data once it is in the database. I just want to store it and pull it out again when I need it.I'm looking for some kind of way to store the data as a "file" in the dbo.Results.Data column. Kind of like saving a document or something in a database. I just don't know in what format this "file" should be in. Binary serialized isn't an option I think but perhaps xml. |
|
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2011-02-04 : 15:01:01
|
Why do you need a database at all, if you not are going to search the data?At some point you are going to retreive the data, as you wrote yourself.1) Are you planning to use Standard Edition or Enterprise Edition?2) Is it "all or nothing" scenario when retrieving/restoring the data? N 56°04'39.26"E 12°55'05.63" |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2011-02-04 : 15:03:12
|
Just make a note that you cannot do a "SELECT *" over sparse columns. You will have to specify the columns you want to return.Or... You can make a columnset column which returns a xml document for all included elements/columns. N 56°04'39.26"E 12°55'05.63" |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2011-02-04 : 15:21:57
|
Sparse column seems to be the way to go.CREATE TABLE dbo.EquipmentType ( EquipmentTypeID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED, EquipmentCode VARCHAR(8) NOT NULL, EquipmentName VARCHAR(48) NOT NULL )INSERT dbo.EquipmentType ( EquipmentCode, EquipmentName )VALUES ('Laser', 'Laserguided road inspection')CREATE TABLE dbo.Measure ( MeasureID INT IDENTITY(-2147483647, 1) PRIMARY KEY CLUSTERED, EquipmentTypeID INT REFERENCES dbo.EquipmentType(EquipmentTypeID), ID INT NOT NULL, TimeStamp DATETIME2(3) NOT NULL, AllMeasure XML COLUMN_SET FOR ALL_SPARSE_COLUMNS NULL, LaserValue DECIMAL(5, 2) SPARSE NULL, Temperature DECIMAL(3, 1) SPARSE NULL, Pressure DECIMAL(4, 3) SPARSE NULL )-- Option 1INSERT dbo.Measure ( EquipmentTypeID, ID, TimeStamp, LaserValue, Temperature, Pressure )VALUES (1, 1, '20110204 14:00:00:001', 0.4567, 20.5, 1.001), (1, 2, '20110204 14:00:00:020', 0.2345, 20.7, 1.003)SELECT *FROM dbo.MeasureSELECT MeasureID, EquipmentTypeID, ID, TimeStamp, LaserValue, Temperature, PressureFROM dbo.Measure-- Option 2INSERT dbo.Measure ( EquipmentTypeID, ID, TimeStamp, AllMeasure )VALUES (1, 3, '20110204 14:00:00:032', '<LaserValue>0.3456</LaserValue><Temperature>20.7</Temperature><Pressure>1.003</Pressure>'), (1, 4, '20110204 14:00:00:045', '<LaserValue>0.4567</LaserValue><Temperature>20.8</Temperature><Pressure>1.023</Pressure>'), (1, 5, '20110204 14:00:00:060', '<LaserValue>0.5678</LaserValue><Temperature>20.7</Temperature><Pressure>1.055</Pressure>')SELECT *FROM dbo.MeasureSELECT MeasureID, EquipmentTypeID, ID, TimeStamp, LaserValue, Temperature, PressureFROM dbo.MeasureDROP TABLE dbo.Measure, dbo.EquipmentType N 56°04'39.26"E 12°55'05.63" |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2011-02-04 : 15:23:31
|
And if you need a new sparse column, just issue this commandALTER TABLE dbo.Measure ADD Angle DECIMAL(6, 3) SPARSE NULL and this column is automatically added to the AllMeasure column! N 56°04'39.26"E 12°55'05.63" |
|
|
calle.liedberg
Starting Member
4 Posts |
Posted - 2011-02-06 : 13:49:53
|
Thanks for the input.@Peso.I think the sql server edition is Enterprice but not entirely sure. Does it matter much?The reason for a database is so to have a single point where the data is stored. As it is now the researchers sends files back and forth between each other and no one really has any overview of what experiments has been done before etc. The "base"-data should be available for all researchers in the basic form. The idea is to make a web interface, e.g ASP.Net or Silverlight where they can search for experiments and when they find one they want to investigate further they can extract the data in the format the want (e.g. excel or CSV etc.).I will look in to the XML columnset approach and the sparse columns. Is this an efficient way of storing the data or will the "Measure"-table grow in size very fast?Thanks for your input guys./Calle |
|
|
|
|
|
|
|