Anticipating Primary Key ViolationsBy Guest Authors on 23 November 2003 | Tags: INSERT This article comes to us from Karen Gayda. She introduces a script she's written to catch primary key violations before they are inserted into a table. IntroductionYears ago when I was just breaking into the programming field I worked with Paradox databases. While Paradox pales in comparison to a database engine like SQL Server, it did have one interesting feature that has not yet been duplicated by Microsoft. When performing an insert into a table, any key violation records would be shunted to a violation table while the good records would be inserted into the intended table. This feature allowed for graceful error handling and for tracking of bad data. Recently I encountered a situation at one of my clients that required that I gracefully handle duplicate data being sent by customers. The goal was to allow the good data to be processed while capturing the bad data so that the customers could be notified of their data issues thus enabling them to correct transmission problems. Having a Paradox-style key violation table would be ideal to handle this problem. Since SQL Server does not include this functionality I decided I needed to write a generic routine to somewhat mimic Paradox. Stored Procedure ScriptThis script creates a stored procedure that evaluates whether a select statement will cause Primary Key violations if data is inserted into a given target table. Key violation rows are saved to the designated violation table in the target database. (You can download the source code for this script and the required functions here.) Calling this stored procedure prior to actually executing a select statement will cause the key violation keys to be saved to a table. Doing this allows for excluding those rows from being inserted if a where clause is added that checks for the existence of the key violation records. SET QUOTED_IDENTIFIER OFF GO SET ANSI_NULLS ON GO IF EXISTS(SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[dbo].[sp_GetKeyViolations]') AND OBJECTPROPERTY(id, N'IsProcedure') = 1 ) DROP PROCEDURE [dbo].[sp_GetKeyViolations] GO CREATE PROCEDURE dbo.sp_GetKeyViolations @vcSelectStatement varchar (8000), @sysTargetTable sysname, @sysViolationTable sysname, @sysTargetDatabase sysname, @bitEmptyViolationTable bit = 0 AS SET NOCOUNT ON DECLARE @vcSQL varchar(8000), @vcColumnList varchar(300), @sysColName sysname --Get the primary key columns in the target table SET @vcSQL = 'DECLARE crCols CURSOR STATIC FOR SELECT DISTINCT name FROM ' + @sysTargetDatabase + '.dbo.PrimaryKeyColumns(''' + @sysTargetTable + ''')' EXEC(@vcSQL) OPEN crCols --For Each Column in PrimaryKey, concatenate to column list SET @vcColumnList = '' FETCH NEXT FROM crCols INTO @sysColName WHILE(@@FETCH_STATUS = 0) BEGIN SET @vcColumnList = @vcColumnList + @sysColName + ',' FETCH NEXT FROM crCols INTO @sysColName END --Get next column SET @vcColumnList = SUBSTRING(@vcColumnList,1,LEN(@vcColumnList) -1) --remove trailing comma CLOSE crCols --Get selected rows and save to a temporary table IF EXISTS (select * from sysobjects where id = object_id('##tempSourceRecords') and OBJECTPROPERTY(id, N'IsTable') = 1) DROP TABLE ##tempSourceRecords SET @vcSQL = (SELECT Element FROM master.dbo.Split(@vcSelectStatement, 'FROM') WHERE ElementID = 1) + ' INTO ##tempSourceRecords FROM ' + (SELECT Element FROM master.dbo.Split(@vcSelectStatement, 'FROM') WHERE ElementID = 2) PRINT(@vcSQL) EXEC (@vcSQL) --If key violation table does not already exist, create it DECLARE @nvcSQL nvarchar(4000), @nvcParmDef nvarchar(500), @bitExists bit SET @nvcSQL = 'IF EXISTS (select * from '+ @sysTargetDatabase+ '.dbo.sysobjects where name = ''' + @sysViolationTable + ''' AND xtype=''U'') ' + 'SET @bitExistsOUT = 1 ' + 'ELSE ' + 'SET @bitExistsOUT = 0' SET @nvcParmDef = N'@bitExistsOUT bit OUTPUT' EXEC sp_executesql @nvcSQL, @nvcParmDef, @bitExistsOUT=@bitExists OUTPUT IF @bitExists = 0 BEGIN --Create new table based upon supplied select list if needed SET @vcSQL = 'SELECT TOP 0 ' + @vcColumnList + + ' INTO ' + @sysTargetDatabase + '.dbo.'+ @sysViolationTable + ' FROM ##tempSourceRecords ' EXEC (@vcSQL) END --Empty violation table if caller indicates it should be cleared IF @bitEmptyViolationTable = 1 BEGIN SET @vcSQL = 'DELETE FROM ' + @sysTargetDatabase + '.dbo.' + @sysViolationTable PRINT @vcSQL EXEC (@vcSQL) END --Insert duplicate rows from target into key violation table SET @vcSQL = 'INSERT INTO ' + @sysTargetDatabase + '.dbo.' + @sysViolationTable + ' SELECT a.' + REPLACE(@vcColumnList, ',', ',a.') + ' FROM ##tempSourceRecords a' + ' INNER JOIN ' + @sysTargetDatabase + '.dbo.' + @sysTargetTable + ' b ' + ' ON ' OPEN crCols FETCH NEXT FROM crCols INTO @sysColName --get inner join columns to add to SQL insert statement WHILE(@@FETCH_STATUS = 0) BEGIN SET @vcSQL = @vcSQL + 'a.' + @sysColName + '=b.' + @sysColName + ' AND ' FETCH NEXT FROM crCols INTO @sysColName END --Get next column SET @vcSQL = SUBSTRING(@vcSQL,1,LEN(@vcSQL) -4) --remove trailing AND --Add rows that are duplicate in the source select that may cause violation SET @vcSQL = @vcSQL + ' UNION SELECT ' + @vcColumnList + ' FROM ##TempSourceRecords ' + 'GROUP BY ' + @vcColumnList + ' HAVING COUNT(*) > 1' CLOSE crCOLS DEALLOCATE crCols PRINT @vcSQL EXEC (@vcSQL) DROP TABLE ##tempSourceRecords RETURN (0) GO Listing 1 – sp_GetKeyViolations Code Parameters
UsageThis stored procedure should be called prior to actually executing a SELECT statement. For example, run the following statement from the master database (once the procedure and supporting functions have been created): exec sp_GetKeyViolations 'SELECT TOP 10 CustomerID as CustomerID FROM Northwind.dbo.Customers WHERE ContactTitle LIKE ''Sales%''', 'Customers', --target table 'DupCustomers', -- table to store violation rows 'NORTHWIND', -- target database 0 -- do not clear violation table prior to insert Now the duplicate records can be excluded by adding a check to the key violation table as follows: SELECT TOP 10 CustomerID as CustomerID FROM Northwind.dbo.Customers C WHERE ContactTitle LIKE 'Sales%' AND NOT EXISTS( SELECT 1 FROM Northwind.dbo.DupCustomers D WHERE C.CustomerID = D.CustomerID) DependenciesSp_GetKeyViolations calls 2 user-defined functions, dbo.Split and dbo.PrimaryKeyColumns. dbo.Split should be installed in the master database. dbo.PrimaryKeyColumns should be installed in any database that will be used as a target database. If this sp will be used extensively then I recommend adding dbo.PrimaryKeyColumns to the model database to reduce administration. The code for these functions is included in the source code I created sp_GetKeyViolation in the master database. It is written so that it can be called from any DB for which a user has SP execute permissions. ConclusionSince duplicate data is a common problem, this procedure addresses the need to identify bad data without hampering the insert process in a very generic fashion.
|
- Advertisement - |