Create conditional query or make a unique key? - SQL Server Forums

Home | Weblogs | Forums

Site Sponsored By: SQLDSC - SQL Server Desired State Configuration

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2005 Forums

Transact-SQL (2005)

Create conditional query or make a unique key?

Author

Topic

Humate
Posting Yak Master

101 Posts

Posted - 2010-11-16 : 11:55:34

Hi All,

I have a problem relating to ideal structure for querying my data. I currently have tables that can link on two different keys. Depending on what system has been used for a particular client, they could have an ID, ID2, or both. This data is consolidated in the same table.

I will be running queries on millions of records, so would like to establish best practice for dealing with this type of situation. I want to prevent long running queries, but not sure if something like a concatenated key is the answer i.e. a new column for ID (int)+ ID2(varchar) = UniqueKey. I am aware this could be a huge overhead as an index though.

Or perhaps I just use some creative queries to deal with each join and UNION the results? Example below shows how records can relate to actions with 3 different combinations of ID, ID2:

CREATE TABLE #testrecords

(
	[Name] NCHAR(50) NULL,
	[ID] INT NULL,
	[ID2] NVARCHAR (15) NULL
) 

CREATE TABLE #testactions

(
	[ID] INT NULL,
	[ID2] NVARCHAR (15) NULL,
	[Action] NCHAR(50) NOT NULL,
	[ActionDate] DATETIME NOT NULL
) 

INSERT INTO #testrecords

SELECT 'Bob', 1, NULL UNION ALL
SELECT 'Rob', NULL, 'ID00000001' UNION ALL
SELECT 'Dave', 2, 'ID00000009'

INSERT INTO #testactions

SELECT 1, NULL, 'Account Opened', GETDATE() UNION ALL
SELECT 1, NULL, 'Order Placed', GETDATE() UNION ALL
SELECT NULL, 'ID00000001', 'Account Opened', GETDATE() UNION ALL
SELECT 2, 'ID00000009', 'Account Opened', GETDATE()

SELECT * FROM #testrecords
SELECT * FROM #testactions

DROP TABLE #testrecords
DROP TABLE #testactions

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-11-16 : 12:08:10

Does something like this work?

SELECT *
FROM 
    #testrecords AS R
INNER JOIN
    #testactions AS A
    ON (R.ID = A.ID OR (R.ID IS NULL AND A.ID IS NULL))
    AND (R.ID2 = A.ID2 OR (R.ID2 IS NULL AND A.ID2 IS NULL))

Another option is to add another column to indicate which system they come from and then join accordingly.

Humate
Posting Yak Master

101 Posts

Posted - 2010-11-16 : 12:22:47

yes they do work, however I think the OR queries seem to be very resource hungry and time consuming. I have tried a CASE statement on the system too before now, which seemed to perform better, and an inner join on both keys with a UNION the fastest.

I probably need to time the various ways this could be written to find the exact query times, but was wondering if I should really be doing something with the keys/columns/table structure instead of querying in this way.

Lamprey
Master Smack Fu Yak Hacker

4614 Posts

Posted - 2010-11-16 : 12:30:05

Not fully knowing your data, I'd guess that you should create a composite key based on the Source System and the Source System Key. Then it's just a simple join clause based on two columns.

jcelko
Esteemed SQL Purist

547 Posts

Posted - 2010-11-16 : 14:54:44

I hope that you actually know that tables need keys, need DRI and how to name data elements. That magical generic "id" is totally wrong, and that NULLs are not spread over the schema like mustard, in spite of what you posted. We do not link in RDBMS; we REFERENCE and JOIN. Big difference! No more explicit pointer chains.

But this mess is so ugly, I have to fix it. The weird generic id's are a real design problem; only test_name can be the key you left off:

CREATE TABLE TestRows
(test_name CHAR(50) NOT NULL PRIMARY KEY,-- only choice and too long
id INTEGER,
id2 VARCHAR (15),
CHECK (CAST(id AS CHAR(20) + id2 IS NOT NULL);

Can I assume the next table also suffers the same design problem with id and id2? That also forces us to one possible key, the something_action. But that does not work.

CREATE TABLE TestActions
(id INTEGER NOT NULL,
id2 VARCHAR (15) NOT NULL,
something_action NCHAR(50) NOT NULL,
something_action_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL);

SQL Server now has real ANSI/ISO syntax for INSERT and CURRENT_TIMESTAMP. No need for dialect any more:

INSERT INTO TestRows
VALUES ('Bob', 1, NULL),
('Rob', NULL, 'ID00000001'),
('Dave', 2, 'ID00000009');

INSERT INTO TestActions
VALUES (1, NULL, 'Account Opened', CURRENT_TIMESTAMP),
(1, NULL, 'Order Placed', CURRENT_TIMESTAMP),
(NULL, 'ID00000001', 'Account Opened', CURRENT_TIMESTAMP),
(2, 'ID00000009', 'Account Opened', CURRENT_TIMESTAMP);

This same problem exists in the financial world. There are 8 to 10 systems for naming securities, which I cannot remember anymore. CUSIP (Committee on Uniform Security Identification Procedures) was used in the US and Canada, but not outside of North America. Currently the ISIN (International Securities Identification Number/ ISO-6166) has absorbed CUSIP.

What we had was a conversion table with ONE required key. The table coverts ISIN to local codes. We do not carry all the codes in every table, like you are doing.

CREATE TABLE Securities_Identification_Codes
(isin CHAR(12) NOT NULL PRIMARY KEY
CHECK (..),
cusip CHAR(9)
CHECK (..),
valor INTEGER
CHECK (..),
sedol CHAR(7)
CHECK (..),
wkn CHAR(6)
CHECK (..),
etc);

--CELKO--
Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL

Humate
Posting Yak Master

101 Posts

Posted - 2010-11-17 : 06:32:25

Okay, perhaps my test example was a little too basic. It was the principle I was trying to illustrate, and see what could be done in terms of the keys/data structure to prevent them being carried in tables this way, or if the usual option was to overcome the real issue inside the queries.

Each table does of course have a primary key and a naming convention, I would not expect any table to have a persons name as the primary key.


CREATE TABLE #testrecords

(
	[intrecordID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
	[ClientName] NCHAR(50) NULL,
	[intSystem1ID] INT NULL,
	[strSystem2ID] NVARCHAR (15) NULL
) 

CREATE TABLE #testactions

(
	[intactionID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
	[intSystem1ID] INT NULL,
	[strSystem2ID] NVARCHAR (15) NULL,
	[strAction] NCHAR(50) NOT NULL,
	[dtmActionDate] DATETIME NOT NULL
) 

INSERT INTO #testrecords

SELECT 'Bob', 1, NULL UNION ALL
SELECT 'Rob', NULL, 'ID00000001' UNION ALL
SELECT 'Dave', 2, 'ID00000009'

INSERT INTO #testactions

SELECT 1, NULL, 'Account Opened', GETDATE() UNION ALL
SELECT 1, NULL, 'Order Placed', GETDATE() UNION ALL
SELECT NULL, 'ID00000001', 'Account Opened', GETDATE() UNION ALL
SELECT 2, 'ID00000009', 'Account Opened', GETDATE()

SELECT * FROM #testrecords
SELECT * FROM #testactions

DROP TABLE #testrecords
DROP TABLE #testactions

Could you explain further how the conversion table would be used and populated? I have already had thoughts along these lines, and I created a reference table for the ID's in a similar way, but had not included in my explanation - I wanted to get thoughts on best practice.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources