Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Creating a Unique Index

Author  Topic 

SouthSideRob
Starting Member

22 Posts

Posted - 2013-08-01 : 10:37:10
Besides my primary key which is just the first field named ID which increments, I also need to figure out what field(s) make a record distinct from all the other fields. Some of my table designs are very wide so its very hard to manually determine the fields that make a record unique (no duplicates). I need to figure this out to avoid inserting duplicate records. Is there a system stored procedure or function that examines a populated table to determine which field(s) make each record distinct? Any thoughts...

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-08-01 : 10:51:49
I don't know of anything that will automatically run through all possible combinations of columns and determine if it is unique. For a given set of columns, you can figure out if there are any duplicates using the following query. If the query returns any rows, there are duplicates for col1,col2,col3 combination.

Usually it is not a good idea to base the uniqueness by examining existing data. A better approach would be to examine the business rules to determine that. You might want to consult your business clients or other stakeholders to make that determination
SELECT col1,col2,col3
FROM YourTable
GROUP BY col1,col2, col3
HAVING COUNT(*) > 1;
Go to Top of Page

SouthSideRob
Starting Member

22 Posts

Posted - 2013-08-01 : 11:03:12
Thanks for the quick response. I don't want to say my clients don't know the business rules but they rely on a state data store and they are not completely sure what makes a record unique as well. I will play with the having query you posted above. Thanks...
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2013-08-01 : 16:35:34
Is it something like this you want?
-- User supplied parameters
DECLARE @SchemaName SYSNAME = N'dbo',
@TableName SYSNAME = N'Attendance';

/*
Solution by SwePeso
*/

SET NOCOUNT ON;

-- Prepare local variables
DECLARE @Sql NVARCHAR(MAX) = N'SELECT @Rows = COUNT_BIG(*) FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';',
@Items BIGINT,
@CurrID INT = 1,
@StopID INT;

-- Get count of total rows in table or view
EXEC dbo.sp_executesql @Sql,
N'@Rows BIGINT OUT',
@Rows = @Items OUT;

-- Create local helper table
CREATE TABLE #Columns
(
SetID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
ColumnCount INT NOT NULL,
ColumnList NVARCHAR(MAX) NOT NULL,
Items BIGINT NOT NULL
);

-- Get all permutations of columns in table or view
WITH cteSource(theColumn, thePath, theColumnCount)
AS (
SELECT QUOTENAME(COLUMN_NAME) AS theColumn,
CAST(QUOTENAME(COLUMN_NAME) AS NVARCHAR(MAX)) AS thePath,
1 AS theColumnCount
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @SchemaName
AND TABLE_NAME = @TableName

UNION ALL

SELECT QUOTENAME(c.COLUMN_NAME) AS theColumn,
s.thePath + CAST(', ' + QUOTENAME(c.COLUMN_NAME) AS NVARCHAR(MAX)) AS thePath,
s.theColumnCount + 1 AS theColumnCount
FROM INFORMATION_SCHEMA.COLUMNS AS c
INNER JOIN cteSource AS s ON s.theColumn < QUOTENAME(c.COLUMN_NAME)
WHERE c.TABLE_SCHEMA = @SchemaName
AND c.TABLE_NAME = @TableName
)
INSERT #Columns
(
ColumnCount,
ColumnList,
Items
)
SELECT theColumnCount AS ColumnCount,
thePath AS ColumnList,
0 AS Items
FROM cteSource

UNION ALL

SELECT 0 AS ColumnCount,
'Totalt Row Count' AS ColumnList,
@Items AS Items;

-- Get the columnlist to start with
SELECT @StopID = MAX(SetID)
FROM #Columns
WHERE Items = 0;

-- Loop through all columnslists
WHILE @CurrID <= @StopID
BEGIN
-- Print loop information just in case each loop takes some time
RAISERROR('Now working with set %d of %d.', 10, 1, @CurrID, @StopID) WITH NOWAIT;

-- Build statement for current columnlist
SELECT @Sql = N'SELECT @Rows = COUNT_BIG(*) FROM (SELECT DISTINCT ' + ColumnList + N' FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N') AS d;'
FROM #Columns
WHERE SetID = @CurrID;

-- Find out how many unique rows for current columnlist
EXEC dbo.sp_executesql @Sql,
N'@Rows BIGINT OUT',
@Rows = @Items OUT;

-- Store unique rows in local helper table
UPDATE #Columns
SET Items = @Items
WHERE SetID = @CurrID;

-- Loop to next columnlist
SET @CurrID += 1;
END

-- Display final resultset
SELECT @SchemaName AS TABLE_SCHEMA,
@TableName AS TABLE_NAME,
ColumnList AS COLUMN_LIST,
Items AS ITEMS
FROM #Columns
ORDER BY Items DESC,
ColumnCount,
ColumnList;

-- Clean up
DROP TABLE #Columns;



Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2013-08-01 : 16:38:27
Example output
TABLE_SCHEMA	TABLE_NAME	COLUMN_LIST				ITEMS
dbo Attendance Totalt Row Count 25
dbo Attendance [Category], [Date] 25
dbo Attendance [Category], [Date], [DOP_ID] 25
dbo Attendance [Category], [Date], [Hours] 25
dbo Attendance [Category], [Date], [DOP_ID], [Hours] 25
dbo Attendance [Date], [Hours] 19
dbo Attendance [Date], [DOP_ID], [Hours] 19
dbo Attendance [Date] 14
dbo Attendance [Date], [DOP_ID] 14
dbo Attendance [Category], [Hours] 9
dbo Attendance [Category], [DOP_ID], [Hours] 9
dbo Attendance [DOP_ID], [Hours] 6
dbo Attendance [Category] 4
dbo Attendance [Hours] 4
dbo Attendance [Category], [DOP_ID] 4
dbo Attendance [DOP_ID] 2



Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2013-08-01 : 16:43:35
Example of loop messages
Now working with column list 1 of 15.
Now working with column list 2 of 15.
Now working with column list 3 of 15.
Now working with column list 4 of 15.
Now working with column list 5 of 15.
Now working with column list 6 of 15.
Now working with column list 7 of 15.
Now working with column list 8 of 15.
Now working with column list 9 of 15.
Now working with column list 10 of 15.
Now working with column list 11 of 15.
Now working with column list 12 of 15.
Now working with column list 13 of 15.
Now working with column list 14 of 15.
Now working with column list 15 of 15.



Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

James K
Master Smack Fu Yak Hacker

3873 Posts

Posted - 2013-08-01 : 17:14:09
quote:
Originally posted by SwePeso

Is it something like this you want?
-- User supplied parameters
DECLARE @SchemaName SYSNAME = N'dbo',
@TableName SYSNAME = N'Attendance';

/*
Solution by SwePeso
*/

SET NOCOUNT ON;

-- Prepare local variables
DECLARE @Sql NVARCHAR(MAX) = N'SELECT @Rows = COUNT_BIG(*) FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';',
@Items BIGINT,
@CurrID INT = 1,
@StopID INT;

-- Get count of total rows in table or view
EXEC dbo.sp_executesql @Sql,
N'@Rows BIGINT OUT',
@Rows = @Items OUT;

-- Create local helper table
CREATE TABLE #Columns
(
SetID INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
ColumnCount INT NOT NULL,
ColumnList NVARCHAR(MAX) NOT NULL,
Items BIGINT NOT NULL
);

-- Get all permutations of columns in table or view
WITH cteSource(theColumn, thePath, theColumnCount)
AS (
SELECT QUOTENAME(COLUMN_NAME) AS theColumn,
CAST(QUOTENAME(COLUMN_NAME) AS NVARCHAR(MAX)) AS thePath,
1 AS theColumnCount
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @SchemaName
AND TABLE_NAME = @TableName

UNION ALL

SELECT QUOTENAME(c.COLUMN_NAME) AS theColumn,
s.thePath + CAST(', ' + QUOTENAME(c.COLUMN_NAME) AS NVARCHAR(MAX)) AS thePath,
s.theColumnCount + 1 AS theColumnCount
FROM INFORMATION_SCHEMA.COLUMNS AS c
INNER JOIN cteSource AS s ON s.theColumn < QUOTENAME(c.COLUMN_NAME)
WHERE c.TABLE_SCHEMA = @SchemaName
AND c.TABLE_NAME = @TableName
)
INSERT #Columns
(
ColumnCount,
ColumnList,
Items
)
SELECT theColumnCount AS ColumnCount,
thePath AS ColumnList,
0 AS Items
FROM cteSource

UNION ALL

SELECT 0 AS ColumnCount,
'Totalt Row Count' AS ColumnList,
@Items AS Items;

-- Get the columnlist to start with
SELECT @StopID = MAX(SetID)
FROM #Columns
WHERE Items = 0;

-- Loop through all columnslists
WHILE @CurrID <= @StopID
BEGIN
-- Print loop information just in case each loop takes some time
RAISERROR('Now working with set %d of %d.', 10, 1, @CurrID, @StopID) WITH NOWAIT;

-- Build statement for current columnlist
SELECT @Sql = N'SELECT @Rows = COUNT_BIG(*) FROM (SELECT DISTINCT ' + ColumnList + N' FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N') AS d;'
FROM #Columns
WHERE SetID = @CurrID;

-- Find out how many unique rows for current columnlist
EXEC dbo.sp_executesql @Sql,
N'@Rows BIGINT OUT',
@Rows = @Items OUT;

-- Store unique rows in local helper table
UPDATE #Columns
SET Items = @Items
WHERE SetID = @CurrID;

-- Loop to next columnlist
SET @CurrID += 1;
END

-- Display final resultset
SELECT @SchemaName AS TABLE_SCHEMA,
@TableName AS TABLE_NAME,
ColumnList AS COLUMN_LIST,
Items AS ITEMS
FROM #Columns
ORDER BY Items DESC,
ColumnCount,
ColumnList;

-- Clean up
DROP TABLE #Columns;



Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA

For even moderately wide tables (which OP said he has), this can become unweildy.

Number of combinations to be examined would be SIGMA[r=1 to n](nCr), which evaluates to (2^n)-1.

For a table that has only 20 columns this would result in 1,048,575 combinations to be examined.

SouthSideRob, is there some way you can eliminate a lot of your columns and use only a subset - may be 10 or so (1023 combinations) ? If you can, Swepeso's solution should work. Even so, you would be scanning the table 1023 times, which can be painful to watch if you have a large number of rows.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2013-08-01 : 17:58:48
Something like this then?
-- User supplied parameters
DECLARE @SchemaName SYSNAME = N'dbo',
@TableName SYSNAME = N'Attendance';

/*
Solution by SwePeso
*/
SET NOCOUNT ON;

IF OBJECT_ID(@SchemaName + N'.' + @TableName) IS NULL
BEGIN
RAISERROR('There is no table.', 18, 1)
RETURN
END

-- Prepare local variables
DECLARE @Sql NVARCHAR(MAX) = N'SELECT @Rows = COUNT_BIG(*) FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N';',
@Items BIGINT,
@CurrID INT = 1,
@StopID INT;

-- Create local copy of columns to work with
CREATE TABLE #Columns
(
Name NVARCHAR(450) PRIMARY KEY CLUSTERED
);

-- Get valid columns
INSERT #Columns
(
Name
)
SELECT QUOTENAME(COLUMN_NAME) AS Name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @SchemaName
AND TABLE_NAME = @TableName
--AND IS_NULLABLE = 'NO' -- Use this if you are looking for a primary key
AND COALESCE(CHARACTER_MAXIMUM_LENGTH, 0) >= 0 -- Remove (MAX) data types
AND DATA_TYPE NOT IN ('FLOAT', 'REAL', 'TEXT', 'IMAGE'); -- Remove inprecise float, real and blobs

SET @Items = @@ROWCOUNT;

-- If invalid column count, stop here
IF @Items = 0
BEGIN
DROP TABLE #Columns;
RAISERROR('Too few (0) valid columns to work with.', 16, 1);
RETURN
END
ELSE IF @Items > 20
BEGIN
DROP TABLE #Columns;
RAISERROR('Too many (%I64d) valid columns to work with.', 16, 1, @Items);
RETURN
END

-- Get count of total rows in table or view
RAISERROR('Now getting total row count.', 10, 1) WITH NOWAIT;

EXEC dbo.sp_executesql @Sql,
N'@Rows BIGINT OUT',
@Rows = @Items OUT;

-- If too few rows, stop here
IF @Items = 0
BEGIN
DROP TABLE #Columns;
RAISERROR('No rows to work with.', 16, 1);
RETURN
END
ELSE IF @Items = 1
BEGIN
DROP TABLE #Columns;
RAISERROR('Only one row to work with.', 16, 1);
RETURN
END

-- Create local helper table
CREATE TABLE #Permutations
(
ID INT IDENTITY(0, 1) PRIMARY KEY CLUSTERED,
ColumnCount INT NOT NULL,
ColumnList NVARCHAR(MAX) NOT NULL,
Items BIGINT NOT NULL
);

-- Insert total row count
INSERT #Permutations
(
ColumnCount,
ColumnList,
Items
)
SELECT 0 AS ColumnCount,
'Totalt Row Count' AS ColumnList,
@Items AS Items;

-- Print column list information
SELECT @Items = POWER(2, COUNT(*)) - 1
FROM #Columns;

RAISERROR('Now getting all %I64d columnlist permutations.', 10, 1, @Items) WITH NOWAIT;

-- Get all permutations of columns in table or view
WITH cteSource(theColumn, thePath, theColumnCount)
AS (
SELECT Name AS theColumn,
CAST(Name AS NVARCHAR(MAX)) AS thePath,
1 AS theColumnCount
FROM #Columns

UNION ALL

SELECT c.Name AS theColumn,
s.thePath + CAST(', ' + c.Name AS NVARCHAR(MAX)) AS thePath,
s.theColumnCount + 1 AS theColumnCount
FROM #Columns AS c
INNER JOIN cteSource AS s ON s.theColumn < c.Name
)
INSERT #Permutations
(
ColumnCount,
ColumnList,
Items
)
SELECT theColumnCount AS ColumnCount,
thePath AS ColumnList,
0 AS Items
FROM cteSource;

-- Get the columnlist to start with
SELECT @StopID = MAX(ID)
FROM #Permutations;

-- Loop through all columnslists
WHILE @CurrID <= @StopID
BEGIN
-- Print loop information
RAISERROR('Now working with columnlist %d of %d.', 10, 1, @CurrID, @StopID) WITH NOWAIT;

-- Build statement for current columnlist
SELECT @Sql = N'SELECT @Rows = COUNT_BIG(*) FROM (SELECT DISTINCT ' + ColumnList + N' FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N') AS d;'
FROM #Permutations
WHERE ID = @CurrID;

-- Find out how many unique rows for current columnlist
EXEC dbo.sp_executesql @Sql,
N'@Rows BIGINT OUT',
@Rows = @Items OUT;

-- Store unique rows in local helper table
UPDATE #Permutations
SET Items = @Items
WHERE ID = @CurrID;

-- Loop to next columnlist
SET @CurrID += 1;
END

-- Display final resultset
SELECT @SchemaName AS TABLE_SCHEMA,
@TableName AS TABLE_NAME,
ColumnList AS COLUMN_LIST,
Items AS ITEMS
FROM #Permutations
ORDER BY Items DESC,
ColumnCount,
ColumnList;

-- Clean up
DROP TABLE #Columns,
#Permutations;



Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA
Go to Top of Page

MuMu88
Aged Yak Warrior

549 Posts

Posted - 2013-08-01 : 20:25:17
I agree with James K that inferring a unique key from exiting data seems like a dangerous bidness, since you don’t know what data will be appended in the future. Would it be possible to reach your end goal of not appending duplicate data by taking a slightly different approach, namely, prior to inserting any new row, attempt to select the exact row you are planning to insert (minus the id column). If your select returns nothing, you are good to go.
Go to Top of Page
   

- Advertisement -