Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
SouthSideRob
Starting Member
22 Posts |
Posted - 2013-08-01 : 10:37:10
|
Besides my primary key which is just the first field named ID which increments, I also need to figure out what field(s) make a record distinct from all the other fields. Some of my table designs are very wide so its very hard to manually determine the fields that make a record unique (no duplicates). I need to figure this out to avoid inserting duplicate records. Is there a system stored procedure or function that examines a populated table to determine which field(s) make each record distinct? Any thoughts... |
|
James K
Master Smack Fu Yak Hacker
3873 Posts |
Posted - 2013-08-01 : 10:51:49
|
I don't know of anything that will automatically run through all possible combinations of columns and determine if it is unique. For a given set of columns, you can figure out if there are any duplicates using the following query. If the query returns any rows, there are duplicates for col1,col2,col3 combination.Usually it is not a good idea to base the uniqueness by examining existing data. A better approach would be to examine the business rules to determine that. You might want to consult your business clients or other stakeholders to make that determinationSELECT col1,col2,col3FROM YourTableGROUP BY col1,col2, col3HAVING COUNT(*) > 1; |
|
|
SouthSideRob
Starting Member
22 Posts |
Posted - 2013-08-01 : 11:03:12
|
Thanks for the quick response. I don't want to say my clients don't know the business rules but they rely on a state data store and they are not completely sure what makes a record unique as well. I will play with the having query you posted above. Thanks... |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2013-08-01 : 16:35:34
|
Is it something like this you want?-- User supplied parametersDECLARE @SchemaName SYSNAME = N'dbo', @TableName SYSNAME = N'Attendance';/* Solution by SwePeso*/SET NOCOUNT ON;-- Prepare local variablesDECLARE @Sql NVARCHAR(MAX) = N'SELECT @Rows = COUNT_BIG(*) FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';', @Items BIGINT, @CurrID INT = 1, @StopID INT;-- Get count of total rows in table or viewEXEC dbo.sp_executesql @Sql, N'@Rows BIGINT OUT', @Rows = @Items OUT;-- Create local helper tableCREATE TABLE #Columns ( SetID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED, ColumnCount INT NOT NULL, ColumnList NVARCHAR(MAX) NOT NULL, Items BIGINT NOT NULL );-- Get all permutations of columns in table or viewWITH cteSource(theColumn, thePath, theColumnCount)AS ( SELECT QUOTENAME(COLUMN_NAME) AS theColumn, CAST(QUOTENAME(COLUMN_NAME) AS NVARCHAR(MAX)) AS thePath, 1 AS theColumnCount FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = @SchemaName AND TABLE_NAME = @TableName UNION ALL SELECT QUOTENAME(c.COLUMN_NAME) AS theColumn, s.thePath + CAST(', ' + QUOTENAME(c.COLUMN_NAME) AS NVARCHAR(MAX)) AS thePath, s.theColumnCount + 1 AS theColumnCount FROM INFORMATION_SCHEMA.COLUMNS AS c INNER JOIN cteSource AS s ON s.theColumn < QUOTENAME(c.COLUMN_NAME) WHERE c.TABLE_SCHEMA = @SchemaName AND c.TABLE_NAME = @TableName)INSERT #Columns ( ColumnCount, ColumnList, Items )SELECT theColumnCount AS ColumnCount, thePath AS ColumnList, 0 AS ItemsFROM cteSourceUNION ALLSELECT 0 AS ColumnCount, 'Totalt Row Count' AS ColumnList, @Items AS Items;-- Get the columnlist to start withSELECT @StopID = MAX(SetID)FROM #ColumnsWHERE Items = 0;-- Loop through all columnslistsWHILE @CurrID <= @StopID BEGIN -- Print loop information just in case each loop takes some time RAISERROR('Now working with set %d of %d.', 10, 1, @CurrID, @StopID) WITH NOWAIT; -- Build statement for current columnlist SELECT @Sql = N'SELECT @Rows = COUNT_BIG(*) FROM (SELECT DISTINCT ' + ColumnList + N' FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N') AS d;' FROM #Columns WHERE SetID = @CurrID; -- Find out how many unique rows for current columnlist EXEC dbo.sp_executesql @Sql, N'@Rows BIGINT OUT', @Rows = @Items OUT; -- Store unique rows in local helper table UPDATE #Columns SET Items = @Items WHERE SetID = @CurrID; -- Loop to next columnlist SET @CurrID += 1; END-- Display final resultsetSELECT @SchemaName AS TABLE_SCHEMA, @TableName AS TABLE_NAME, ColumnList AS COLUMN_LIST, Items AS ITEMSFROM #ColumnsORDER BY Items DESC, ColumnCount, ColumnList;-- Clean upDROP TABLE #Columns; Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2013-08-01 : 16:38:27
|
Example outputTABLE_SCHEMA TABLE_NAME COLUMN_LIST ITEMSdbo Attendance Totalt Row Count 25dbo Attendance [Category], [Date] 25dbo Attendance [Category], [Date], [DOP_ID] 25dbo Attendance [Category], [Date], [Hours] 25dbo Attendance [Category], [Date], [DOP_ID], [Hours] 25dbo Attendance [Date], [Hours] 19dbo Attendance [Date], [DOP_ID], [Hours] 19dbo Attendance [Date] 14dbo Attendance [Date], [DOP_ID] 14dbo Attendance [Category], [Hours] 9dbo Attendance [Category], [DOP_ID], [Hours] 9dbo Attendance [DOP_ID], [Hours] 6dbo Attendance [Category] 4dbo Attendance [Hours] 4dbo Attendance [Category], [DOP_ID] 4dbo Attendance [DOP_ID] 2 Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2013-08-01 : 16:43:35
|
Example of loop messagesNow working with column list 1 of 15.Now working with column list 2 of 15.Now working with column list 3 of 15.Now working with column list 4 of 15.Now working with column list 5 of 15.Now working with column list 6 of 15.Now working with column list 7 of 15.Now working with column list 8 of 15.Now working with column list 9 of 15.Now working with column list 10 of 15.Now working with column list 11 of 15.Now working with column list 12 of 15.Now working with column list 13 of 15.Now working with column list 14 of 15.Now working with column list 15 of 15. Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA |
|
|
James K
Master Smack Fu Yak Hacker
3873 Posts |
Posted - 2013-08-01 : 17:14:09
|
quote: Originally posted by SwePeso Is it something like this you want?-- User supplied parametersDECLARE @SchemaName SYSNAME = N'dbo', @TableName SYSNAME = N'Attendance';/* Solution by SwePeso*/SET NOCOUNT ON;-- Prepare local variablesDECLARE @Sql NVARCHAR(MAX) = N'SELECT @Rows = COUNT_BIG(*) FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';', @Items BIGINT, @CurrID INT = 1, @StopID INT;-- Get count of total rows in table or viewEXEC dbo.sp_executesql @Sql, N'@Rows BIGINT OUT', @Rows = @Items OUT;-- Create local helper tableCREATE TABLE #Columns ( SetID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED, ColumnCount INT NOT NULL, ColumnList NVARCHAR(MAX) NOT NULL, Items BIGINT NOT NULL );-- Get all permutations of columns in table or viewWITH cteSource(theColumn, thePath, theColumnCount)AS ( SELECT QUOTENAME(COLUMN_NAME) AS theColumn, CAST(QUOTENAME(COLUMN_NAME) AS NVARCHAR(MAX)) AS thePath, 1 AS theColumnCount FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = @SchemaName AND TABLE_NAME = @TableName UNION ALL SELECT QUOTENAME(c.COLUMN_NAME) AS theColumn, s.thePath + CAST(', ' + QUOTENAME(c.COLUMN_NAME) AS NVARCHAR(MAX)) AS thePath, s.theColumnCount + 1 AS theColumnCount FROM INFORMATION_SCHEMA.COLUMNS AS c INNER JOIN cteSource AS s ON s.theColumn < QUOTENAME(c.COLUMN_NAME) WHERE c.TABLE_SCHEMA = @SchemaName AND c.TABLE_NAME = @TableName)INSERT #Columns ( ColumnCount, ColumnList, Items )SELECT theColumnCount AS ColumnCount, thePath AS ColumnList, 0 AS ItemsFROM cteSourceUNION ALLSELECT 0 AS ColumnCount, 'Totalt Row Count' AS ColumnList, @Items AS Items;-- Get the columnlist to start withSELECT @StopID = MAX(SetID)FROM #ColumnsWHERE Items = 0;-- Loop through all columnslistsWHILE @CurrID <= @StopID BEGIN -- Print loop information just in case each loop takes some time RAISERROR('Now working with set %d of %d.', 10, 1, @CurrID, @StopID) WITH NOWAIT; -- Build statement for current columnlist SELECT @Sql = N'SELECT @Rows = COUNT_BIG(*) FROM (SELECT DISTINCT ' + ColumnList + N' FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N') AS d;' FROM #Columns WHERE SetID = @CurrID; -- Find out how many unique rows for current columnlist EXEC dbo.sp_executesql @Sql, N'@Rows BIGINT OUT', @Rows = @Items OUT; -- Store unique rows in local helper table UPDATE #Columns SET Items = @Items WHERE SetID = @CurrID; -- Loop to next columnlist SET @CurrID += 1; END-- Display final resultsetSELECT @SchemaName AS TABLE_SCHEMA, @TableName AS TABLE_NAME, ColumnList AS COLUMN_LIST, Items AS ITEMSFROM #ColumnsORDER BY Items DESC, ColumnCount, ColumnList;-- Clean upDROP TABLE #Columns; Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
For even moderately wide tables (which OP said he has), this can become unweildy. Number of combinations to be examined would be SIGMA[r=1 to n](nCr), which evaluates to (2^n)-1. For a table that has only 20 columns this would result in 1,048,575 combinations to be examined.SouthSideRob, is there some way you can eliminate a lot of your columns and use only a subset - may be 10 or so (1023 combinations) ? If you can, Swepeso's solution should work. Even so, you would be scanning the table 1023 times, which can be painful to watch if you have a large number of rows. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2013-08-01 : 17:58:48
|
Something like this then?-- User supplied parametersDECLARE @SchemaName SYSNAME = N'dbo', @TableName SYSNAME = N'Attendance';/* Solution by SwePeso*/SET NOCOUNT ON;IF OBJECT_ID(@SchemaName + N'.' + @TableName) IS NULL BEGIN RAISERROR('There is no table.', 18, 1) RETURN END-- Prepare local variablesDECLARE @Sql NVARCHAR(MAX) = N'SELECT @Rows = COUNT_BIG(*) FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';', @Items BIGINT, @CurrID INT = 1, @StopID INT;-- Create local copy of columns to work withCREATE TABLE #Columns ( Name NVARCHAR(450) PRIMARY KEY CLUSTERED );-- Get valid columnsINSERT #Columns ( Name )SELECT QUOTENAME(COLUMN_NAME) AS NameFROM INFORMATION_SCHEMA.COLUMNSWHERE TABLE_SCHEMA = @SchemaName AND TABLE_NAME = @TableName --AND IS_NULLABLE = 'NO' -- Use this if you are looking for a primary key AND COALESCE(CHARACTER_MAXIMUM_LENGTH, 0) >= 0 -- Remove (MAX) data types AND DATA_TYPE NOT IN ('FLOAT', 'REAL', 'TEXT', 'IMAGE'); -- Remove inprecise float, real and blobsSET @Items = @@ROWCOUNT;-- If invalid column count, stop hereIF @Items = 0 BEGIN DROP TABLE #Columns; RAISERROR('Too few (0) valid columns to work with.', 16, 1); RETURN ENDELSE IF @Items > 20 BEGIN DROP TABLE #Columns; RAISERROR('Too many (%I64d) valid columns to work with.', 16, 1, @Items); RETURN END-- Get count of total rows in table or viewRAISERROR('Now getting total row count.', 10, 1) WITH NOWAIT;EXEC dbo.sp_executesql @Sql, N'@Rows BIGINT OUT', @Rows = @Items OUT;-- If too few rows, stop hereIF @Items = 0 BEGIN DROP TABLE #Columns; RAISERROR('No rows to work with.', 16, 1); RETURN ENDELSE IF @Items = 1 BEGIN DROP TABLE #Columns; RAISERROR('Only one row to work with.', 16, 1); RETURN END-- Create local helper tableCREATE TABLE #Permutations ( ID INT IDENTITY(0, 1) PRIMARY KEY CLUSTERED, ColumnCount INT NOT NULL, ColumnList NVARCHAR(MAX) NOT NULL, Items BIGINT NOT NULL );-- Insert total row countINSERT #Permutations ( ColumnCount, ColumnList, Items )SELECT 0 AS ColumnCount, 'Totalt Row Count' AS ColumnList, @Items AS Items;-- Print column list informationSELECT @Items = POWER(2, COUNT(*)) - 1FROM #Columns;RAISERROR('Now getting all %I64d columnlist permutations.', 10, 1, @Items) WITH NOWAIT;-- Get all permutations of columns in table or viewWITH cteSource(theColumn, thePath, theColumnCount)AS ( SELECT Name AS theColumn, CAST(Name AS NVARCHAR(MAX)) AS thePath, 1 AS theColumnCount FROM #Columns UNION ALL SELECT c.Name AS theColumn, s.thePath + CAST(', ' + c.Name AS NVARCHAR(MAX)) AS thePath, s.theColumnCount + 1 AS theColumnCount FROM #Columns AS c INNER JOIN cteSource AS s ON s.theColumn < c.Name)INSERT #Permutations ( ColumnCount, ColumnList, Items )SELECT theColumnCount AS ColumnCount, thePath AS ColumnList, 0 AS ItemsFROM cteSource;-- Get the columnlist to start withSELECT @StopID = MAX(ID)FROM #Permutations;-- Loop through all columnslistsWHILE @CurrID <= @StopID BEGIN -- Print loop information RAISERROR('Now working with columnlist %d of %d.', 10, 1, @CurrID, @StopID) WITH NOWAIT; -- Build statement for current columnlist SELECT @Sql = N'SELECT @Rows = COUNT_BIG(*) FROM (SELECT DISTINCT ' + ColumnList + N' FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N') AS d;' FROM #Permutations WHERE ID = @CurrID; -- Find out how many unique rows for current columnlist EXEC dbo.sp_executesql @Sql, N'@Rows BIGINT OUT', @Rows = @Items OUT; -- Store unique rows in local helper table UPDATE #Permutations SET Items = @Items WHERE ID = @CurrID; -- Loop to next columnlist SET @CurrID += 1; END-- Display final resultsetSELECT @SchemaName AS TABLE_SCHEMA, @TableName AS TABLE_NAME, ColumnList AS COLUMN_LIST, Items AS ITEMSFROM #PermutationsORDER BY Items DESC, ColumnCount, ColumnList;-- Clean upDROP TABLE #Columns, #Permutations; Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA |
|
|
MuMu88
Aged Yak Warrior
549 Posts |
Posted - 2013-08-01 : 20:25:17
|
I agree with James K that inferring a unique key from exiting data seems like a dangerous bidness, since you don’t know what data will be appended in the future. Would it be possible to reach your end goal of not appending duplicate data by taking a slightly different approach, namely, prior to inserting any new row, attempt to select the exact row you are planning to insert (minus the id column). If your select returns nothing, you are good to go. |
|
|
|
|
|
|
|