DB Design Suggestions for Storing Large Graphs

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

DB Design Suggestions for Storing Large Graphs

Author

Topic

sqlnoobee
Starting Member

7 Posts

Posted - 2006-09-25 : 12:14:24

Hi all,

This is my first post so forgive me if I make mistakes. The issue I'd like some feedback on is designing database table structures for storing test data that consists of multiple graphs (as in MS Excel Graph) with multiple plots of 2000 pts (x,y) or more per plot . The data is collected periodically. For example for one test, a graph may be generated every 15 seconds for 10 days.

As you can imagine this collection system will store a large amount of data. I'd like to ensure that DB is designed for efficient storage and retrieval of the data.

The attached image of a sql diagram shows one design strategy. However, I am worried that even with indexes,clustering etc. the point table will become unmanageable after a short period of time.

I realize this is a very complex topic (indexes, clustering, etc), but if anyone can point me in the right direction to other articles or favorite references. For instance, is this a good table structure or should I switch to a different datatype and store the each plot as a CSV in a varchar column, or as an XML datatype, or binary.

Thanks for any help in advance.

IMAGE LINKS (not sure how long they will last)
Example Data:
http://img179.imageshack.us/img179/9414/graphsuk6.jpg

SQL Diagram
http://img179.imageshack.us/img179/4585/examplediagramkn2.jpg

SQL SCRIPT (for quick attempt at structure)

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Test]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Test](
[TestID] [int] NOT NULL,
[TestInfo] [varchar](50) NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
[TestID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Sample]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Sample](
[SampleID] [int] NOT NULL,
[TestID] [int] NOT NULL,
[SampleTime] [datetime] NOT NULL,
[SampleInfo] [varchar](50) NOT NULL,
CONSTRAINT [PK_Sample] PRIMARY KEY CLUSTERED
(
[SampleID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Plot]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Plot](
[PlotID] [int] NOT NULL,
[SampleID] [int] NULL,
[PlotIndex] [int] NULL,
[PlotName] [varchar](50) NULL,
CONSTRAINT [PK_Plot] PRIMARY KEY CLUSTERED
(
[PlotID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
END
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[Point]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[Point](
[PointID] [bigint] NOT NULL,
[PlotID] [int] NOT NULL,
[PointNumber] [int] NOT NULL,
[X] [numeric](18, 5) NOT NULL,
[Y] [numeric](18, 5) NOT NULL,
CONSTRAINT [PK_Point] PRIMARY KEY CLUSTERED
(
[PointID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
END
GO
IF NOT EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_Sample_Test]') AND parent_object_id = OBJECT_ID(N'[dbo].[Sample]'))
ALTER TABLE [dbo].[Sample] WITH CHECK ADD CONSTRAINT [FK_Sample_Test] FOREIGN KEY([TestID])
REFERENCES [dbo].[Test] ([TestID])
GO
IF NOT EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_Plot_Sample]') AND parent_object_id = OBJECT_ID(N'[dbo].[Plot]'))
ALTER TABLE [dbo].[Plot] WITH CHECK ADD CONSTRAINT [FK_Plot_Sample] FOREIGN KEY([SampleID])
REFERENCES [dbo].[Sample] ([SampleID])
GO
IF NOT EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_Point_Plot]') AND parent_object_id = OBJECT_ID(N'[dbo].[Point]'))
ALTER TABLE [dbo].[Point] WITH CHECK ADD CONSTRAINT [FK_Point_Plot] FOREIGN KEY([PlotID])
REFERENCES [dbo].[Plot] ([PlotID])

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-09-25 : 13:16:11

It looks good to me.

I'm usually in favor of having an internal primary key as you have done in all the tables, but if the PointNumber in the Point table is a sequence number that will be unique within each plot, I'd say that you may want to consider having a compound primary key of PlotId, PointNumber and making that the clustered index on that table. That will remove the need for a separate indexes for the primary key and the PlotId, PlotNumber columns and your access to Point should be very fast.

sqlnoobee
Starting Member

7 Posts

Posted - 2006-09-25 : 13:23:42

quote:
It looks good to me.

I'm usually in favor of having an internal primary key as you have done in all the tables, but if the PointNumber in the Point table is a sequence number that will be unique within each plot, I'd say that you may want to consider having a compound primary key of PlotId, PointNumber and making that the clustered index on that table. That will remove the need for a separate indexes for the primary key and the PlotId, PlotNumber columns and your access to Point should be very fast.

Good points. I was also considering a compond primary key also, just for the fact that I was scared I would run out of room in the bigint pk for points tbl. Based on what you said, I assume there is no performance penalty for a compound pk and in fact may be a performance increase? I also assume the size of the table will be reduced significantly by removing the big int PK. I know assumptions in general are dangerous, but are my correct for this situation.

snSQL
Master Smack Fu Yak Hacker

1837 Posts

Posted - 2006-09-25 : 14:30:29

quote:
I was scared I would run out of room in the bigint pk for points tbl.

I wouldn't worry about that - if you captured 1 million points per second it would take you more than 292,000 years to fill a bigint! A bigint has a max value of 9223372036854775807.
On that note, at one point per second, it would take you 68 years to fill up an int (max value = 2,147,483,647)

quote:
I assume there is no performance penalty for a compound pk

That depends on the size of the compound key - if you were using a long character column in the primary key, then it would affect performance because the index values would be much larger, but in your case the combined columns are only 8 bytes wide (two four byte ints), so should be no difference of that over one bigint (8 bytes).

quote:
I also assume the size of the table will be reduced significantly by removing the big int PK.

You'll have 8 bytes less per row without the bigint, so multiply that by the number of rows you anticipate - it works out to about 8MB per million rows which isn't very much in database terms, but not insignificant either.

Also, I mainly suggested a compound primary key in this case because you will almost always be processing this data in ranges of points based on their PlotId and using a compound primary key with a clustered index will let SQL Server directly retrieve the rows from contiguous database pages without referring to another index.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources