Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Humate
Posting Yak Master
101 Posts |
Posted - 2010-11-16 : 11:55:34
|
Hi All,I have a problem relating to ideal structure for querying my data. I currently have tables that can link on two different keys. Depending on what system has been used for a particular client, they could have an ID, ID2, or both. This data is consolidated in the same table.I will be running queries on millions of records, so would like to establish best practice for dealing with this type of situation. I want to prevent long running queries, but not sure if something like a concatenated key is the answer i.e. a new column for ID (int)+ ID2(varchar) = UniqueKey. I am aware this could be a huge overhead as an index though.Or perhaps I just use some creative queries to deal with each join and UNION the results? Example below shows how records can relate to actions with 3 different combinations of ID, ID2:CREATE TABLE #testrecords( [Name] NCHAR(50) NULL, [ID] INT NULL, [ID2] NVARCHAR (15) NULL) CREATE TABLE #testactions( [ID] INT NULL, [ID2] NVARCHAR (15) NULL, [Action] NCHAR(50) NOT NULL, [ActionDate] DATETIME NOT NULL) INSERT INTO #testrecordsSELECT 'Bob', 1, NULL UNION ALLSELECT 'Rob', NULL, 'ID00000001' UNION ALLSELECT 'Dave', 2, 'ID00000009'INSERT INTO #testactionsSELECT 1, NULL, 'Account Opened', GETDATE() UNION ALLSELECT 1, NULL, 'Order Placed', GETDATE() UNION ALLSELECT NULL, 'ID00000001', 'Account Opened', GETDATE() UNION ALLSELECT 2, 'ID00000009', 'Account Opened', GETDATE()SELECT * FROM #testrecordsSELECT * FROM #testactionsDROP TABLE #testrecordsDROP TABLE #testactions |
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2010-11-16 : 12:08:10
|
Does something like this work? SELECT *FROM #testrecords AS RINNER JOIN #testactions AS A ON (R.ID = A.ID OR (R.ID IS NULL AND A.ID IS NULL)) AND (R.ID2 = A.ID2 OR (R.ID2 IS NULL AND A.ID2 IS NULL)) Another option is to add another column to indicate which system they come from and then join accordingly. |
 |
|
Humate
Posting Yak Master
101 Posts |
Posted - 2010-11-16 : 12:22:47
|
yes they do work, however I think the OR queries seem to be very resource hungry and time consuming. I have tried a CASE statement on the system too before now, which seemed to perform better, and an inner join on both keys with a UNION the fastest.I probably need to time the various ways this could be written to find the exact query times, but was wondering if I should really be doing something with the keys/columns/table structure instead of querying in this way. |
 |
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2010-11-16 : 12:30:05
|
Not fully knowing your data, I'd guess that you should create a composite key based on the Source System and the Source System Key. Then it's just a simple join clause based on two columns. |
 |
|
jcelko
Esteemed SQL Purist
547 Posts |
Posted - 2010-11-16 : 14:54:44
|
I hope that you actually know that tables need keys, need DRI and how to name data elements. That magical generic "id" is totally wrong, and that NULLs are not spread over the schema like mustard, in spite of what you posted. We do not link in RDBMS; we REFERENCE and JOIN. Big difference! No more explicit pointer chains. But this mess is so ugly, I have to fix it. The weird generic id's are a real design problem; only test_name can be the key you left off: CREATE TABLE TestRows(test_name CHAR(50) NOT NULL PRIMARY KEY,-- only choice and too long id INTEGER, id2 VARCHAR (15), CHECK (CAST(id AS CHAR(20) + id2 IS NOT NULL); Can I assume the next table also suffers the same design problem with id and id2? That also forces us to one possible key, the something_action. But that does not work.CREATE TABLE TestActions (id INTEGER NOT NULL, id2 VARCHAR (15) NOT NULL, something_action NCHAR(50) NOT NULL, something_action_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL);SQL Server now has real ANSI/ISO syntax for INSERT and CURRENT_TIMESTAMP. No need for dialect any more:INSERT INTO TestRowsVALUES ('Bob', 1, NULL), ('Rob', NULL, 'ID00000001'), ('Dave', 2, 'ID00000009');INSERT INTO TestActionsVALUES (1, NULL, 'Account Opened', CURRENT_TIMESTAMP), (1, NULL, 'Order Placed', CURRENT_TIMESTAMP), (NULL, 'ID00000001', 'Account Opened', CURRENT_TIMESTAMP), (2, 'ID00000009', 'Account Opened', CURRENT_TIMESTAMP);This same problem exists in the financial world. There are 8 to 10 systems for naming securities, which I cannot remember anymore. CUSIP (Committee on Uniform Security Identification Procedures) was used in the US and Canada, but not outside of North America. Currently the ISIN (International Securities Identification Number/ ISO-6166) has absorbed CUSIP. What we had was a conversion table with ONE required key. The table coverts ISIN to local codes. We do not carry all the codes in every table, like you are doing.CREATE TABLE Securities_Identification_Codes(isin CHAR(12) NOT NULL PRIMARY KEY CHECK (..), cusip CHAR(9) CHECK (..), valor INTEGER CHECK (..), sedol CHAR(7) CHECK (..), wkn CHAR(6) CHECK (..), etc); --CELKO--Books in Celko Series for Morgan-Kaufmann PublishingAnalytics and OLAP in SQLData and Databases: Concepts in Practice Data, Measurements and Standards in SQLSQL for SmartiesSQL Programming Style SQL Puzzles and Answers Thinking in SetsTrees and Hierarchies in SQL |
 |
|
Humate
Posting Yak Master
101 Posts |
Posted - 2010-11-17 : 06:32:25
|
Okay, perhaps my test example was a little too basic. It was the principle I was trying to illustrate, and see what could be done in terms of the keys/data structure to prevent them being carried in tables this way, or if the usual option was to overcome the real issue inside the queries.Each table does of course have a primary key and a naming convention, I would not expect any table to have a persons name as the primary key. CREATE TABLE #testrecords( [intrecordID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY, [ClientName] NCHAR(50) NULL, [intSystem1ID] INT NULL, [strSystem2ID] NVARCHAR (15) NULL) CREATE TABLE #testactions( [intactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY, [intSystem1ID] INT NULL, [strSystem2ID] NVARCHAR (15) NULL, [strAction] NCHAR(50) NOT NULL, [dtmActionDate] DATETIME NOT NULL) INSERT INTO #testrecordsSELECT 'Bob', 1, NULL UNION ALLSELECT 'Rob', NULL, 'ID00000001' UNION ALLSELECT 'Dave', 2, 'ID00000009'INSERT INTO #testactionsSELECT 1, NULL, 'Account Opened', GETDATE() UNION ALLSELECT 1, NULL, 'Order Placed', GETDATE() UNION ALLSELECT NULL, 'ID00000001', 'Account Opened', GETDATE() UNION ALLSELECT 2, 'ID00000009', 'Account Opened', GETDATE()SELECT * FROM #testrecordsSELECT * FROM #testactionsDROP TABLE #testrecordsDROP TABLE #testactions Could you explain further how the conversion table would be used and populated? I have already had thoughts along these lines, and I created a reference table for the ID's in a similar way, but had not included in my explanation - I wanted to get thoughts on best practice. |
 |
|
|
|
|
|
|