Using DTS to Automate a Data Import ProcessBy Garth Wells on 11 August 2002 | Tags: DTS In this article I show you how to create a DTS Package that can be used to automate a file download/import process. It's a little longer than the other articles I have written for SQLTeam and will require some work on your part if you want to implement it, but I think you will find it interesting.
I recently re-wrote a database application that gets daily data feeds via text
files that are uploaded to an ftp site. My client's customers generate the text
files from their mainframe systems and upload them to the ftp site for access.
The files are:
Automating the Process with DTSI do not like data loading processes that are dependent on human intervention, so during the re-write I wanted to replace this process with one that could be scheduled and run automatically. I remembered that DTS (Data Transformation Services) has a task that supports ftp, so I decided to investigate this approach. It could not have been easier to implement. DTS: A Brief IntroductionIf you have never worked with DTS, you have missed using one of the most useful tools that come with SQL Server. DTS makes it so easy to get data in/out of SQL Server, that it almost makes up for all the years developers had to put up with the limited functionality of bcp (bulk copy program). If you have never used bcp, consider yourself lucky. The main purpose of DTS is to allow you to define connections to various data sources and destinations and pump the data from one to the other. For example, you can define a text file as a data source, an instance of SQL Server as the destination, and then pump the data from one to the other. DTS supports numerous data sources, all of which can be seen by opening a new package and clicking the Connection option. To open a new package, expand the Server tree of the target SQL Server and click the Data Transformation Services folder. You can now right-click on Local Packages and select New Package. At this point you define the connections and add any other tasks that need to be completed during the transfer process. DTS allows you to include several other tasks that can be executed before and after the data transfer process. For example, the Process task will execute an .exe or .bat file, and the ActiveX Script Task allows you to execute either JScript or VBScript. And of course there is the File Transfer Protocol Task, which is the main focus of this article. The other tasks supported can be seen by clicking Task on the menu. Creating the PiecesI needed a DTS package that would do the following:
Before we can create a package that does this, we need some source data, two destination tables, and a stored procedure. We will use a fixed-width text file that conforms to the following layout for the data source. You can use your favorite text editor to create the file. Column Width Name 40 Phone 10 My sample data looks like this: Kelly Smith 8885551212 Paul Jones 8885551213 Mike Andrews 8885551214 Creating the tables is the next step. You may be wondering why I want to create two destination tables. The first one is a temporary destination that holds the data in the exact same format as the text file, and the second one is the data's final destination--how it appears after being scrubbed. The tables can be created with the following DDL. CREATE TABLE ContactsImport ( CI_Name char(40) NOT NULL, CI_Phone char(10) NOT NULL, CI_Processed char(1) DEFAULT 'N' ) go CREATE TABLE Contacts ( CN_ID int IDENTITY NOT NULL PRIMARY KEY, CN_FName varchar(20) NOT NULL, CN_LName varchar(20) NOT NULL, CN_Phone varchar(10) NOT NULL ) I learned to use the two table approach during my first experience with DTS in SQL Server 7. You can actually scrub the data while it is in transit using VBScript or JScript, but that slowed down the import significantly when large files were processed. The faster approach (in my experience) is to import the data as-is, and process it with a T-SQL inside the database. Of course, when large files are processed you do need to make sure that you have the hard drive space to accommodate the extra copy of the data. The stored procedure used to rename/move the source file and scrub the data is shown here. CREATE PROCEDURE pr_ContactsImport AS -- Part I: Move/rename source file DECLARE @fn char(6), @cmd varchar(100) SET @fn=REPLACE(convert(char(8),getdate(),1),'/','') SET @cmd ='Move c:\articles\contacts.txt c:\articles\archive\'+@fn+'.txt' EXEC master..xp_cmdshell @cmd -- End of Part I: Move/rename source file -- Part II: Scrub the data BEGIN TRANSACTION INSERT Contacts (CN_FName,CN_LName,CN_Phone) SELECT SUBSTRING(CI_Name,1,CHARINDEX(CHAR(32),CI_Name)-1), SUBSTRING(CI_Name,CHARINDEX(CHAR(32),CI_Name)+1,DATALENGTH(CI_Name)), CI_Phone FROM ContactsImport WHERE CI_Processed='N' IF @@Error <> 0 GOTO ErrorHandler UPDATE ContactsImport SET CI_Processed='Y' IF @@Error <> 0 GOTO ErrorHandler COMMIT TRANSACTION ErrorHandler: IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION RETURN -- End of Part II: Scrub the data go To facilitate the explanation I have separated the code into two Parts. The code in Part I moves/renames the file using xp_cmdshell. xp_cmdshell is an extended stored procedure that is used to execute an O/S command. I use the CONVERT and REPLACE functions to create the new file name that conforms to a mmddyy.txt convention and then execute the Move command. The source directory is C:\Articles and the archiving directory is: C:\Articles\Archive. NOTE: A user must be granted Execute permission on xp_cmdshell unless they are a member of the Sysadmin fixed server role. In Part II I scrub the data. The contact name is stored in one column in the data source (text file), but I want to store it in two in the Contacts table. The SUBSTRING and CHARINDEX functions allow me to find the space between the first and last name and parse accordingly. For my testing I executed the above DDL in tempdb. Creating the PackageNow that we have all the pieces in place we can create the package. Start a new package using the instructions in the DTS: A Brief Introduction section, and complete the following:
At this point you can test the task by right-clicking on it and selecting Execute Step. The file will be moved to the destination directory. The next step is to create the Transform Data Task that transfers the data from the text file into the ContactsImport table. Before you can create this task, you must define the source/destination connections. Create the source connection by completing the following:
Now we need to create the connection for the destination. Complete the following to do this.
The source/destination connections are defined, so let's create the Transform Data Task with the following:
When you have multiple tasks within a package they will execute in parallel unless you define the order of execution. In this case we want the File Transfer Protocol Task to execute first, and if that is successful we want to execute the Transform Data Task. We can ensure this happens by completing the following:
The last steps in this process are to archive the source file and scrub the data. We do this by executing the pr_ContactsImport procedure with an Execute SQL Task. Create this task by completing the following:
Make sure that it executes after the Transform Data Task by creating an On Success Workflow like we have done twice before. The package is complete and you can execute it by clicking Package from the menu and selecting Execute. You should also be aware that you can schedule the package to execute automatically. Simply right-click on the package in the Local Packages view and select Schedule Package. For my project I knew that the source files would be uploaded no later than 10 AM, so I scheduled accordingly. After the data is loaded an email is automatically sent that informs the relevant personnel that the process completed successfully. Final ThoughtsDTS is a great tool. If you have never worked with it I strongly encourage you start experimenting with different methods of transferring/transforming data. I am sure having an understanding of the functionality it supports will benefit you in the long run.
Garth
|
- Advertisement - |