Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
sqlnoobee
Starting Member
7 Posts |
Posted - 2006-09-25 : 12:14:24
|
| Hi all,This is my first post so forgive me if I make mistakes. The issue I'd like some feedback on is designing database table structures for storing test data that consists of multiple graphs (as in MS Excel Graph) with multiple plots of 2000 pts (x,y) or more per plot . The data is collected periodically. For example for one test, a graph may be generated every 15 seconds for 10 days.As you can imagine this collection system will store a large amount of data. I'd like to ensure that DB is designed for efficient storage and retrieval of the data. The attached image of a sql diagram shows one design strategy. However, I am worried that even with indexes,clustering etc. the point table will become unmanageable after a short period of time.I realize this is a very complex topic (indexes, clustering, etc), but if anyone can point me in the right direction to other articles or favorite references. For instance, is this a good table structure or should I switch to a different datatype and store the each plot as a CSV in a varchar column, or as an XML datatype, or binary. Thanks for any help in advance.IMAGE LINKS (not sure how long they will last)Example Data:http://img179.imageshack.us/img179/9414/graphsuk6.jpgSQL Diagramhttp://img179.imageshack.us/img179/4585/examplediagramkn2.jpgSQL SCRIPT (for quick attempt at structure)SET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOIF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Test]') AND type in (N'U'))BEGINCREATE TABLE [dbo].[Test]( [TestID] [int] NOT NULL, [TestInfo] [varchar](50) NULL, CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED ( [TestID] ASC)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]) ON [PRIMARY]ENDGOSET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOIF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Sample]') AND type in (N'U'))BEGINCREATE TABLE [dbo].[Sample]( [SampleID] [int] NOT NULL, [TestID] [int] NOT NULL, [SampleTime] [datetime] NOT NULL, [SampleInfo] [varchar](50) NOT NULL, CONSTRAINT [PK_Sample] PRIMARY KEY CLUSTERED ( [SampleID] ASC)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]) ON [PRIMARY]ENDGOSET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOIF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Plot]') AND type in (N'U'))BEGINCREATE TABLE [dbo].[Plot]( [PlotID] [int] NOT NULL, [SampleID] [int] NULL, [PlotIndex] [int] NULL, [PlotName] [varchar](50) NULL, CONSTRAINT [PK_Plot] PRIMARY KEY CLUSTERED ( [PlotID] ASC)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]) ON [PRIMARY]ENDGOSET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOIF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Point]') AND type in (N'U'))BEGINCREATE TABLE [dbo].[Point]( [PointID] [bigint] NOT NULL, [PlotID] [int] NOT NULL, [PointNumber] [int] NOT NULL, [X] [numeric](18, 5) NOT NULL, [Y] [numeric](18, 5) NOT NULL, CONSTRAINT [PK_Point] PRIMARY KEY CLUSTERED ( [PointID] ASC)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]) ON [PRIMARY]ENDGOIF NOT EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_Sample_Test]') AND parent_object_id = OBJECT_ID(N'[dbo].[Sample]'))ALTER TABLE [dbo].[Sample] WITH CHECK ADD CONSTRAINT [FK_Sample_Test] FOREIGN KEY([TestID])REFERENCES [dbo].[Test] ([TestID])GOIF NOT EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_Plot_Sample]') AND parent_object_id = OBJECT_ID(N'[dbo].[Plot]'))ALTER TABLE [dbo].[Plot] WITH CHECK ADD CONSTRAINT [FK_Plot_Sample] FOREIGN KEY([SampleID])REFERENCES [dbo].[Sample] ([SampleID])GOIF NOT EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_Point_Plot]') AND parent_object_id = OBJECT_ID(N'[dbo].[Point]'))ALTER TABLE [dbo].[Point] WITH CHECK ADD CONSTRAINT [FK_Point_Plot] FOREIGN KEY([PlotID])REFERENCES [dbo].[Plot] ([PlotID]) |
|
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-09-25 : 13:16:11
|
| It looks good to me.I'm usually in favor of having an internal primary key as you have done in all the tables, but if the PointNumber in the Point table is a sequence number that will be unique within each plot, I'd say that you may want to consider having a compound primary key of PlotId, PointNumber and making that the clustered index on that table. That will remove the need for a separate indexes for the primary key and the PlotId, PlotNumber columns and your access to Point should be very fast. |
 |
|
|
sqlnoobee
Starting Member
7 Posts |
Posted - 2006-09-25 : 13:23:42
|
quote: It looks good to me.I'm usually in favor of having an internal primary key as you have done in all the tables, but if the PointNumber in the Point table is a sequence number that will be unique within each plot, I'd say that you may want to consider having a compound primary key of PlotId, PointNumber and making that the clustered index on that table. That will remove the need for a separate indexes for the primary key and the PlotId, PlotNumber columns and your access to Point should be very fast.
Good points. I was also considering a compond primary key also, just for the fact that I was scared I would run out of room in the bigint pk for points tbl. Based on what you said, I assume there is no performance penalty for a compound pk and in fact may be a performance increase? I also assume the size of the table will be reduced significantly by removing the big int PK. I know assumptions in general are dangerous, but are my correct for this situation. |
 |
|
|
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-09-25 : 14:30:29
|
quote: I was scared I would run out of room in the bigint pk for points tbl.
I wouldn't worry about that - if you captured 1 million points per second it would take you more than 292,000 years to fill a bigint! A bigint has a max value of 9223372036854775807. On that note, at one point per second, it would take you 68 years to fill up an int (max value = 2,147,483,647) quote: I assume there is no performance penalty for a compound pk
That depends on the size of the compound key - if you were using a long character column in the primary key, then it would affect performance because the index values would be much larger, but in your case the combined columns are only 8 bytes wide (two four byte ints), so should be no difference of that over one bigint (8 bytes). quote: I also assume the size of the table will be reduced significantly by removing the big int PK.
You'll have 8 bytes less per row without the bigint, so multiply that by the number of rows you anticipate - it works out to about 8MB per million rows which isn't very much in database terms, but not insignificant either.Also, I mainly suggested a compound primary key in this case because you will almost always be processing this data in ranges of points based on their PlotId and using a compound primary key with a clustered index will let SQL Server directly retrieve the rows from contiguous database pages without referring to another index. |
 |
|
|
|
|
|
|
|