Hey,

So I’ve got this PowerShell script which imports ~5.5m lines across ~150 CSVs into SQL. The script works fine, but it takes approximately 5 hours to complete, which seems crazy long to me. Wondering if anyone has some example times for importing from CSV to SQL via PowerShell that I can compare to? This is the first time I’ve done something like this.

The data itself is a mixture of booleans, decimals and strings; some of the strings are quite long.

The relevant section of the script which does the importing is as follows, in case anyone can see something which might help with the times. Most of the delay seems to come from importing the CSV into memory prior to the Write-SqlTableData command; once it gets to Write-SqlTableData, that part completes fairly quickly.

"Getting row count for table $table"
    $CSVRowCount = Import-CSV $DataCSV | Measure-Object | Select -ExpandProperty Count

    "Deleting database table $table"
    If ($TableExists) {
        $DeleteDB = "USE [$SQLDatabase]
                GO
                IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[$table]') AND type in (N'U'))
                DROP TABLE [dbo].[$table]
                GO"
        Invoke-SQLCmd -Query $DeleteDB -ServerInstance $SQLInstance -Username $SQLUsername -Password $SQLPassword
    }

    ### Performing batch import from CSV to SQL
    $SQLImportBatch = 50000
    $LoopsReq = [math]::ceiling($CSVRowCount / $SQLImportBatch)
    For ($i=0; $i -lt $LoopsReq; $i++) {
        ,(Import-Csv -Path $DataCSV | Select -Skip ([Int]($i * $SQLImportBatch)) -First $SQLImportBatch) | Write-SqlTableData -ServerInstance $SQLInstance -DatabaseName $SQLDatabase -SchemaName "dbo" -TableName $Table -Force -ConnectionTimeout 14400 -Timeout 0
    }

The server this runs on is running off spinning disks, which I suspect is a large contributor to the slowness. Other than that, CPU sits around 20-40% and memory around 75-85%.

Cheers!

It will likely be IO, as you noted: it has to read the data in, convert it and write it to the SQL tables, and that means a lot of little IO operations, which are not great for performance.

Large sequential writes are best and small random ones are worst, on any media, but HDDs will suffer the most.

I can’t add anything about the import process or time, but I do think you are right about the issue being the HDDs.

Your script is looping one line at a time and most likely calling the INSERT INTO statement 5 million times. A better option is to use the BULK INSERT command, as described in BULK INSERT (Transact-SQL) - SQL Server | Microsoft Learn.
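For your data it would look something along the lines of the below (the CSV path is a placeholder, the file has to be readable from the SQL Server itself, and a plain comma terminator won’t cope with quoted strings that contain commas; FORMAT = 'CSV' on SQL 2017+ helps there):

    # BULK INSERT runs on the SQL Server, so the path must be visible to the server;
    # FIRSTROW = 2 skips the CSV header row
    $BulkInsert = "BULK INSERT [dbo].[$table]
                   FROM 'C:\Data\MyFile.csv'
                   WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', BATCHSIZE = 50000, TABLOCK)"
    Invoke-SQLCmd -Query $BulkInsert -ServerInstance $SQLInstance -Database $SQLDatabase -Username $SQLUsername -Password $SQLPassword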

Another option is to use the BCP utility. Check Bulk Import and Export of Data (SQL Server) - SQL Server | Microsoft Learn
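A rough example of the kind of call, assuming a comma-delimited file with a header row (server, credentials and file names are placeholders):

    bcp MyDatabase.dbo.MyTable in "C:\Data\MyFile.csv" -S SQLSERVER01 -U sqluser -P sqlpassword -c -t, -F 2 -b 50000

-c is character mode, -t sets the field terminator, -F 2 skips the header row and -b controls the commit batch size.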

Oo, didn’t know about the BCP utility, will need to give that a try! With regards to looping one line at a time though, I thought the purpose of preceding the data-gathering command (e.g. Import-Csv) with a comma and enclosing it in brackets and then piping it was to bulk-send that data to the Write-SqlTableData command? If I exclude the comma and brackets the script takes waaaay longer to complete.
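For reference, this is the difference I mean (my understanding of it anyway, using the same parameters as the script above):

    # Without the unary comma each row trickles down the pipeline one object at a time,
    # so Write-SqlTableData deals with them individually:
    Import-Csv -Path $DataCSV | Write-SqlTableData -ServerInstance $SQLInstance -DatabaseName $SQLDatabase -SchemaName "dbo" -TableName $Table -Force

    # With the comma and brackets the whole collection arrives as a single pipeline object,
    # so it gets written in one go:
    ,(Import-Csv -Path $DataCSV) | Write-SqlTableData -ServerInstance $SQLInstance -DatabaseName $SQLDatabase -SchemaName "dbo" -TableName $Table -Force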

BCP is a good way to load CSVs, however you can also use the import function in the SSMS tool. It won’t be as fast as BCP, but it should not take 5 hours :slight_smile:

Well I’ve had quite the journey figuring out BCP. Ran some quick tests:
Moving from HDDs to SSDs resulted in ~50% shorter run time
Using BCP over existing method resulted in ~40% shorter runtime
And just for the fun of it, removing the comma and brackets from the Import-Csv statement resulted in ~16000% longer runtime
So BCP is definitely the way to go. Unfortunately the data I’m working with is such a mess I don’t think I’ll be able to use it in this instance. Still working through all the problems I’m encountering, but the current one that has me stumped is that some of the date data is in the format “DD/MM/yyyy HH:mm:ss” and neither BCP nor SQL MERGE is happy with it.

You can load it into a varchar or char column in a temp/staging table, then convert it to datetime using a SQL function.
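For dd/MM/yyyy specifically, the British/French style (103) should handle it, something like this (table and column names are just examples):

    SELECT TRY_CONVERT(datetime, LastModified, 103) AS LastModified
    FROM dbo.Staging_MyTable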

Yeah, I’m already bringing the data into a staging table; I just haven’t figured out a way to incorporate that conversion into my merge statement so it can programmatically convert it across the ~150 tables when required, without having to individually skim each line of each table looking for that date format, or having to check each table for that value and implement an alternative merge statement for each one.
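The rough direction I’ve been poking at is something like the below, but the staging table naming and the date check are made up / loose heuristics, and I haven’t got it working across everything yet:

    # 1. find the varchar columns in the staging table
    $ColQuery = "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS
                 WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'Staging_$table'
                 AND DATA_TYPE IN ('varchar', 'nvarchar')"
    $Columns = Invoke-SQLCmd -Query $ColQuery -ServerInstance $SQLInstance -Database $SQLDatabase -Username $SQLUsername -Password $SQLPassword

    # 2. for each one, check whether its values parse as dd/MM/yyyy (style 103); if they do,
    #    wrap the column in TRY_CONVERT when building the source SELECT for the merge
    $SelectList = ForEach ($Col in $Columns) {
        $Name = $Col.COLUMN_NAME
        $CheckQuery = "SELECT TOP 1 1 AS IsDate FROM dbo.[Staging_$table]
                       WHERE [$Name] IS NOT NULL AND TRY_CONVERT(datetime, [$Name], 103) IS NOT NULL"
        If (Invoke-SQLCmd -Query $CheckQuery -ServerInstance $SQLInstance -Database $SQLDatabase -Username $SQLUsername -Password $SQLPassword) {
            "TRY_CONVERT(datetime, [$Name], 103) AS [$Name]"
        } Else {
            "[$Name]"
        }
    }
    $SourceSelect = "SELECT " + ($SelectList -join ", ") + " FROM dbo.[Staging_$table]"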

To speed up the process, consider using BULK INSERT or OPENROWSET(BULK…) instead of Write-SqlTableData, as they handle large datasets more efficiently. If PowerShell must be used, avoid Import-Csv and use StreamReader to read and insert data line by line, reducing memory overhead.
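A minimal shape of the OPENROWSET(BULK…) route, assuming SQL Server 2017+ for FORMAT = 'CSV' and a format file describing the CSV’s columns (paths and table name are placeholders):

    INSERT INTO dbo.MyTable
    SELECT *
    FROM OPENROWSET(
        BULK 'C:\Data\MyFile.csv',
        FORMATFILE = 'C:\Data\MyFile.fmt',
        FORMAT = 'CSV',
        FIRSTROW = 2
    ) AS src;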

There are also alternatives like Skyvia, which does efficient bulk CSV imports into SQL Server with batch processing to handle large datasets faster.

I’ve found that what really slows down large operations is when the chunks you are handling are large enough to exceed the buffer caching available for the data currently being processed. When they do, it has to juggle data in and out of working memory, which can really slow down an operation. I’d be curious whether, if you lowered your value for $SQLImportBatch from 50000 down to 10000 or even lower, it didn’t go a lot quicker.
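You could wrap your existing loop in Measure-Command to compare runs, something like this (clear out the target table and recalculate $LoopsReq between runs so the timings are comparable):

    $Elapsed = Measure-Command {
        For ($i=0; $i -lt $LoopsReq; $i++) {
            ,(Import-Csv -Path $DataCSV | Select -Skip ([Int]($i * $SQLImportBatch)) -First $SQLImportBatch) | Write-SqlTableData -ServerInstance $SQLInstance -DatabaseName $SQLDatabase -SchemaName "dbo" -TableName $Table -Force
        }
    }
    "Batch size $SQLImportBatch took $([math]::Round($Elapsed.TotalMinutes, 1)) minutes"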

@craigrrr you’re right, reducing the batch size does improve the times. I ran a bunch of tests and found somewhere around 2000 (iirc) was quickest; lowering it from there started to increase the time. Unfortunately I’ve lost my notes on this now.

Ultimately, what drastically improved the script was implementing delta updates. A lot of the tables have a last-modified field, and I confirmed with the database maintainer that deleted entries don’t actually get deleted from the DB but just get a deleted flag assigned to them, so I can simply import any changed or new lines, safe in the knowledge that I won’t be missing any removals.
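For anyone curious, the guts of it boil down to something like this; the LastModified column name is just an example and it assumes the column in the SQL table is already a datetime:

    # grab the newest LastModified value already in the table
    $LastRun = Invoke-SQLCmd -Query "SELECT ISNULL(MAX(LastModified), '1900-01-01') AS LastModified FROM [$SQLDatabase].dbo.[$table]" -ServerInstance $SQLInstance -Username $SQLUsername -Password $SQLPassword

    # only keep CSV rows that are newer than anything already imported
    $Delta = Import-Csv -Path $DataCSV | Where-Object {
        [datetime]::ParseExact($_.LastModified, "dd/MM/yyyy HH:mm:ss", [System.Globalization.CultureInfo]::InvariantCulture) -gt $LastRun.LastModified
    }

    If ($Delta) {
        # new/changed rows go into the staging table, then the merge sorts out insert vs update
        ,@($Delta) | Write-SqlTableData -ServerInstance $SQLInstance -DatabaseName $SQLDatabase -SchemaName "dbo" -TableName "Staging_$table" -Force
    }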
