Fast CSV Import in PowerShell to SQL Server

By Bill Graziano on 18 March 2014 | Tags: INSERT , bcp/Bulk Insert


I recently had to write some simple scripts for a proof of concept. These scripts use PowerShell to import and export CSV files from SQL Server. The scripts were simple to write and maintain but were slow. It turns out there are a few ways to improve CSV import and export performance in PowerShell.

Before I landed on PowerShell I considered a few other approaches to get this done.

  • SSIS. This gave me all the performance and flexibility around CSV formats that I wanted. But this would have been their first SSIS package and much of the functionality would have been embedded in those SSIS packages. I was looking for something that was a more explicit to read.
  • SQL Server BULK INSERT or BCP. This was more script-able but getting the format file right proved to be a challenge. The import file included quotes around the values but only if there was a comma inside the string.

Simple CSV Import using PowerShell

The dirt simplest way to import a CSV file into SQL Server using PowerShell looks like this:

$ConnectionString = "Data Source=L80\SQL2012; Database=tempdb; Trusted_Connection=True;";
. ".\Out-DataTable.ps1"

$csvDataTable = Import-CSV -Path ".\SimpleCsv.txt" | Out-DataTable
$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $ConnectionString
$bulkCopy.DestinationTableName = "CsvDataReader"
$bulkCopy.WriteToServer($csvDataTable)

The Import-CSV command is handy and flexible. It handled headers rows and optional quotes around columns without problem. This does require the use of Chad Miller's excellent Out-Table script to convert the CSV to a DataTable. That's because the WriteToServer method only accepts DataTables or DataReaders as data sources.

Unfortunately that means it has to pull the entire CSV into memory to build the DataTable. On a 100MB file I was seeing memory sizes over 500MB. It was also taking over 15 minutes to process the file.

Fast CSV Import using PowerShell

Processing this faster means getting the CSV into an IDataReader. That streams the file through memory rather than completely loading it in. I found a few people had written either partial or full classes to handle that. None of them quite did what I needed though. So I coded up a simple DLL to expose a CSV as an IDataReader. Using that in PowerShell looks like this:

[System.Reflection.Assembly]::LoadFrom(".\CsvDataReader.dll") | Out-Null
$reader = New-Object SqlUtilities.CsvDataReader(".\SimpleCsv.txt")

$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $ConnectionString
$bulkCopy.DestinationTableName = "CsvDataReader"

$bulkCopy.WriteToServer($reader)

You can find the fast DataReader for CSV files over on GitHub. You can download just the DLL if you'd like. After you download the DLL don't forget to Unblock it in the File Properties dialog box. It also handles optional quotes around values. It's written to support bulk insert to SQL Server so not all the methods are implemented yet. The column names are currently case-sensitive if you need to do custom field mapping.

This reduced my 15 minute load on the 100MB to 45 seconds.

Simple CSV Export using PowerShell

I also needed to export some CSV files from SQL Server. The simplest PowerShell script to complete that is:

Import-Module “sqlps” -DisableNameChecking
invoke-sqlcmd -query "SELECT * FROM CsvImport" -database Test -serverinstance "L80\SQL2012" | 
    Export-CSV -Path "C:\SimpleCsvOut.txt" -NoTypeInformation }

The Export-CSV command certainly makes this simple. A few things to note about this though:

  • This wasn't terribly slow. It exported 26,000 rows in about 45 seconds. But I thought it could be faster. It will also get worse the more rows you export.
  • It did put quotes around every single value. Including headers.
  • After I imported the SQLPS module I needed to hard-code all the path names. I'm sure there's way around that but I'd moved on after seeing the first two issues.

Faster CSV Export using PowerShell

Rewriting the export to stream a SqlDataReader to StreamWriter looked like this:

$streamWriter = New-Object System.IO.StreamWriter ".\SimpleCsvOut3.txt"
$sqlConn = New-Object System.Data.SqlClient.SqlConnection $ConnectionString
$sqlCmd = New-Object System.Data.SqlClient.SqlCommand
$sqlCmd.Connection = $sqlConn
$sqlCmd.CommandText = "SELECT * FROM Test.dbo.CsvImport"
$sqlConn.Open();
$reader = $sqlCmd.ExecuteReader();

# Initialze the arry the hold the values
$array = @()
for ( $i = 0 ; $i -lt $reader.FieldCount; $i++ ) 
    { $array += @($i) }

# Write Header
$streamWriter.Write($reader.GetName(0))
for ( $i = 1; $i -lt $reader.FieldCount; $i ++) 
{ $streamWriter.Write($("," + $reader.GetName($i))) }

$streamWriter.WriteLine("") # Close the header line

while ($reader.Read())
{
    # get the values;
    $fieldCount = $reader.GetValues($array);

    # add quotes if the values have a comma
    for ($i = 0; $i -lt $array.Length; $i++)
    {
        if ($array[$i].ToString().Contains(","))
        {
            $array[$i] = '"' + $array[$i].ToString() + '"';
        }
    }

    $newRow = [string]::Join(",", $array);

    $streamWriter.WriteLine($newRow)
}
$reader.Close();
$sqlConn.Close();
$streamWriter.Close();

It certainly was a lot more code to write. Most of it is boilerplate and easy to modify when you copy it. The performance only decreased from 45 seconds to 25 seconds on 26,000 rows. If you eliminate the code to wrap quotes around commas it cuts the time from 25 seconds to 12 seconds. I also saw bigger differences on larger data sets.

In summary, PowerShell is a great way to handle CSV files in SQL Server for smaller files. And with a little tweaking it can handle much larger files too.


Related Articles

How to Insert Values into an Identity Column in SQL Server (6 August 2007)

Using Views to Enforce Business Rules (9 April 2007)

Using the OUTPUT Clause to Capture Identity Values on Multi-Row Inserts (14 August 2006)

Anticipating Primary Key Violations (23 November 2003)

Using EXISTS (12 October 2003)

Exporting Data Programatically with bcp and xp_cmdshell (1 July 2001)

Using BULK INSERT to Load a Text File (19 March 2001)

Using SELECT to INSERT records (15 August 2000)

Other Recent Forum Posts

Summarizing NPS Data by Customer & Group (20h)

Using the CASE Clause with a date range for just Month and DAy (1d)

SQL problem to be solved (8d)

Align column in SQL Query output (9d)

I have a sql query which returns a sum of total amount by user and I am trying to add another column which uses the total and multiply it by a number (10d)

Troubleshooting Deadlocks in SQL Server (19d)

Last Login date and time (21d)

Negative effects of High VLF counts (21d)

- Advertisement -