Comparing Tables

By Bill Graziano on 7 January 2002 | Tags: Administration


We're in the process of rolling out a new version of a SQL Server-based software package at work. I already have numerous scripts that load tables and I was trying to find out which ones would break. I needed an easy way to compare tables. There are some packages out there that do this but budgets are kind of tight where I work right now. And since I thought it would make a good article I decided to write my own.

The INFORMATION_SCHEMA Views

I started with the Information_Schema views. We answered a question that briefly described how to use these views. Garth also wrote an article that used them to create ADO parameters.

For this article we're mainly concerned with the COLUMNS table. Running the following SELECT statement gives a rough idea of how information is stored in this table.

Select 	TABLE_CATALOG = Left(TABLE_CATALOG, 10), 
	TABLE_SCHEMA = Left(TABLE_SCHEMA, 10), 
	TABLE_NAME = Left(TABLE_NAME, 10),
	COLUMN_NAME = Left(COLUMN_NAME, 20)
From 	pubs.Information_Schema.Columns
Where	Table_Name = 'authors'

-- Results

TABLE_CATALOG TABLE_SCHEMA TABLE_NAME COLUMN_NAME          
------------- ------------ ---------- -------------------- 
pubs          dbo          authors    au_id
pubs          dbo          authors    au_lname
pubs          dbo          authors    au_fname
pubs          dbo          authors    phone
pubs          dbo          authors    address
pubs          dbo          authors    city
pubs          dbo          authors    state
pubs          dbo          authors    zip
pubs          dbo          authors    contract

In the actual tables these fields are each 128 characters long and it's a pain to scroll through the results. It's fairly easy to see which columns are the database name, object owner, table name and column name. We're going to work with the following subset of data:

Select 	COLUMN_NAME = Left(COLUMN_NAME, 20), 
	DATA_TYPE = Left(DATA_TYPE, 10), 
	CHARACTER_MAXIMUM_LENGTH as CHAR_MAX_LEN,
	NUMERIC_PRECISION AS NUM_PRECISION, 
	NUMERIC_SCALE AS NUM_SCALE
From 	pubs.Information_Schema.Columns
Where	Table_Name = 'authors'

--Results

COLUMN_NAME          DATA_TYPE  CHAR_MAX_LEN NUM_PRECISION NUM_SCALE   
-------------------- ---------- ------------ ------------- ----------- 
au_id                varchar    11           NULL          NULL
au_lname             varchar    40           NULL          NULL
au_fname             varchar    20           NULL          NULL
phone                char       12           NULL          NULL
address              varchar    40           NULL          NULL
city                 varchar    20           NULL          NULL
state                char       2            NULL          NULL
zip                  char       5            NULL          NULL
contract             bit        NULL         1             0

These columns define the data type of a column. I'd encourage you to run a SELECT * on the table and see all the columns listed.

To test this script I created copy of pubs and called it pubs_mod. Then I made a few small changes to the authors table so I would have a table that had changed. If you look at the SELECT statements above you'll notice that these INFORMATION_SCHEMA tables are inside each database. If I want to compare objects in different databases I'll need to do cross-database joins. A simple query to find the new columns in my authors table looks like this:

Select	COLUMN_NAME
From	pubs_mod.INFORMATION_SCHEMA.Columns
Where	TABLE_NAME = 'authors'
AND	COLUMN_NAME NOT IN (
		SELECT 	COLUMN_NAME
		FROM 	pubs.INFORMATION_SCHEMA.Columns
		Where	TABLE_NAME = 'authors' )

Since I can't put a fully qualified table name in a variable, I can't pass in database name as a parameter to a stored procedure -- unless I write dynamic SQL. Unfortunately lazy DBA's rarely write dynamic SQL and I'm definitely a lazy DBA! So I created this view in master:

Create view vw_schema_columns
as
Select	*
from	pubs.information_schema.columns
UNION ALL
Select	*
from	pubs_mod.information_schema.columns

Since the TABLE_CATALOG column identifies the source database this is a handy way to have all the information I need in one place. As an aside, you should be able to modify the script to use a linked server and compare tables on different servers. I could have created it anywhere but I knew master wasn't going away and that's also where I'm going to put my stored procedure when I'm done with it.

Parsing Names

I want to pass this procedure the names of two tables and have it compare them. They can be in different databases and have different owners. You'd think that I'd have to pass in three parameters per table -- database, owner and table name. There's a handy function is SQL Server called PARSENAME that will really help us out here. PARSENAME takes a name in the form Server.Database.Owner.Table (or Database..Table, etc.) and breaks out the various components. It doesn't check if they actually exist, it just returns the parts. Now I can call my procedure like this:

sp_compare_tables 'authors', 'pubs_mod..authors'

Either table can be as qualified as I need it to be. In my code the default database is the current database and the default owner is dbo. You can easily change the script to change those defaults.

Datatypes

If you look at the SELECT statement above that returned the datatypes you can see what datatypes the various columns are but it certainly doesn't look very nice. I wrote a user defined function to clean them up a little bit. The main part is this SELECT statement:

Select @v1 = CASE RTrim(@DataType)
	WHEN 'char' THEN 'char(' + convert(varchar, @CharLength) + ')'
	WHEN 'nchar' THEN 'char(' + convert(varchar, @CharLength) + ')'
	WHEN 'varchar' THEN 'char(' + convert(varchar, @CharLength) + ')'
	WHEN 'nvarchar' THEN 'char(' + convert(varchar, @CharLength) + ')'
	WHEN 'decimal' THEN 'decimal(' + convert(varchar, @Precision) + ',' + 
		convert(varchar, @Scale) + ')'
	WHEN 'numeric' THEN 'numeric(' + convert(varchar, @Precision) + ',' + 
		convert(varchar, @Scale) + ')'	
	ELSE RTRIM(@DataType)
   END

The only datatypes I really need to handle are char, varchar, decimal and numeric. You can read more about CASE in The Case for CASE. Updating my SELECT statement above with this yields:

Select 	COLUMN_NAME = Left(COLUMN_NAME, 20), 
CONVERT(VARCHAR(64), master.dbo.fn_display_datatype(DATA_TYPE, 
		CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, 
		NUMERIC_SCALE) ) AS DATATYPE_Description
From 	Information_Schema.Columns
Where	Table_Name = 'authors'

-- Results

COLUMN_NAME          DATATYPE_Description           
-------------------- ------------------------------ 
au_id                char(11)
au_lname             char(40)
au_fname             char(20)
phone                char(12)
address              char(40)
city                 char(20)
state                char(2)
zip                  char(5)
contract             bit

I think that looks much cleaner.

Putting it all Together

The final script is a little long to put in the article. You can download it here. You run it like this:

use pubs
go
sp_compare_tables 'authors', 'pubs_mod..authors'
go

-- Results

Change  COLUMN_NAME                      DATATYPE_Description             
------- -------------------------------- ---------------------------------
New     marketingcode                    char(3)
New     c_int                            int
New     c_smalldatetime                  smalldatetime
New     c_float                          float
New     c_nvarchar                       char(32)
New     c_numeric                        numeric(7,2)
New     c_decimal                        decimal(11,9)
New     c_bit                            bit
Removed phone                            char(12)
Changed zip                              char(5) -> char(9)

There are other attributes of columns available in INFORMATION_SCHEMA.Columns that I didn't use. They include COLUMN_DEFAULT, IS_NULLABLE, CHARACTER_SET_NAME, COLLATION_NAME and quite a few others. You should be able to modify this script to handle those if you need them. There is also quite a bit of additional information available in other INFORMATION_SCHEMA tables you can use.

Since I put this in master and prefixed it with "sp_" I can call it from any database. I just need to make sure that all my databases are in my view.


Related Articles

Advanced SQL Server 2008 Extended Events with Examples (25 May 2009)

Introduction to SQL Server 2008 Extended Events (19 May 2009)

Monitoring SQL Server Agent with Powershell (24 March 2009)

SQL Server Version (29 January 2009)

Scheduling Jobs in SQL Server Express - Part 2 (1 December 2008)

Alerts for when Login Failures Strike (14 July 2008)

Using xp_ReadErrorLog in SQL Server 2005 (12 May 2008)

Moving the tempdb database (5 November 2007)

Other Recent Forum Posts

I have a sql query which returns a sum of total amount by user and I am trying to add another column which uses the total and multiply it by a number (22h)

Troubleshooting Deadlocks in SQL Server (10d)

Last Login date and time (11d)

Negative effects of High VLF counts (11d)

Need to return a value that indicates that a record has been added, but not when a record is modified (12d)

Indexex on low cardinality fields (12d)

Error in stored procedure (13d)

Spam post flagging (13d)

- Advertisement -