Brief overview and best practices on how to take an Excel (.xlsx) file and convert it into a comma-separated values (CSV or .csv) file. This is useful if you’re trying to upload files to a database or other system that does not support .xlsx or similar formats. You can simply convert them to .csv so the system can ingest them in a compatible format.

This tutorial covers how to do this, along with some basic best practices.

Though this is pretty basic IT knowledge, it can serve as a how-to for beginners and/or a living document you can refer people to. Lastly, even though it’s basic file conversion, there are some tips on what to watch out for when converting from Excel to CSV.

Step 1: Start with a Saved or Working Excel File

So we can work from the same file at the outset and control for potential outliers/corruptions, I’m uploading this .xlsx file. It contains my top 10 most misused or non-existent words that no one should ever fall victim to.

It’s only ten rows across two columns so it’s not a lot of data. You don’t have to use this if you don’t want to, but if you want a test file, this one will work just fine.

Regardless, the steps below will work for any standard Excel doc.

example_file_to_convert.xlsx (8.7 KB)

Step 2: Click “File” on the Ribbon

Once you have the file above (or another Excel file) open, go to the “File” tab on the uppermost ribbon. Unless configured otherwise, it is usually in the upper left-hand corner of an open Excel document.

With the “File” dropdown menu open, click “Save As”. A pop-up menu will appear once you have done so. The attached image is a version of what this would look like on a Mac. PCs will be very similar.

Step 3: Save the Excel (.xlsx) File into a CSV (.csv)

Through a “File Format” option, the menu will ask what type of file you want to save your current .xlsx as. The full list of conversion options varies by operating system, but there are a few common types that are pretty consistent across the board.

The highlighted option in the attached image is the most general of the .csv file options. Others (e.g. MS-DOS) are more specific in their use-case. Unless you have a known reason for picking another variety, pick the .csv option with no extra qualifier in its name, as shown in the image.

Author’s Note:
You’ll notice that the photo shows me saving this to my Applications folder. This is not an advisable location for storing files of this type. I am only doing this so that I’m not showing you other files. Choose the file location that makes the most sense for your use-case.

Step 4: Best Practices and Notes

Once you’ve selected your file type and location you are technically done because your file has been converted, but there are a few “gotchas” to consider when making conversions.

  1. .csv does not support multiple tabs like .xlsx does (basic conversion consideration)

If you have an Excel file with multiple tabs, you’ll need to convert each tab to its own .xlsx file and convert each to a .csv separately. Typically, you’ll get a warning of potential data loss if you’re using the “Save As” wizard, but the error message is not always explicit as to what you’re going to lose.
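If you’d rather script the "one tab, one .csv" rule than click through it, here is a minimal Python sketch. It assumes the tabs have already been read into memory as a dictionary of sheet name to rows (a library such as openpyxl can do that part); the function name `write_sheets_to_csv` and the sample data are made up for illustration and are not the contents of the example file.

```python
import csv
from pathlib import Path


def write_sheets_to_csv(sheets, out_dir):
    """Write each sheet (name -> list of rows) to its own CSV file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in sheets.items():
        path = out_dir / f"{name}.csv"
        # newline="" lets the csv module control line endings itself
        with path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written


# Hypothetical workbook contents, already loaded into memory
workbook = {
    "words": [["Word", "Correct Usage"], ["irregardless", "regardless"]],
    "notes": [["Note"], ["Second tab"]],
}
```

Calling `write_sheets_to_csv(workbook, "converted")` would produce `converted/words.csv` and `converted/notes.csv`, one file per tab.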

  2. The default file opener for .csv files in most systems is Excel (slightly more advanced consideration)

This is because Excel breaks files into rows and columns better than most (if not all) user-friendly programs. However, it is not the only way these types of files can be opened. In fact, if you’re having trouble importing a .csv into a database, I would advise that you open it up in your favorite text editor instead. Excel will often mask problems by quietly correcting them behind the scenes, making it look like nothing is wrong; a text editor lets you see the root cause of a potential error.

The image attached to this step shows the example file in .csv format. Wherever you put the file you converted in previous steps, right click it, go to “Open With”, and you’ll see your default program as well as a list of other options. If you open the file with a text editor (e.g. Notepad, Notepad++, TextEdit, Sublime, Atom, etc…), you’ll be able to view the file as the “computer sees it”. This will help you navigate upload errors in a database.
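If a text editor isn’t handy, the same "as the computer sees it" view can be sketched in a few lines of Python. This is only an illustration; `raw_lines` is a name I made up, and it simply prints the stored bytes so that hidden characters such as a byte-order mark or `\r\n` line endings become visible.

```python
def raw_lines(path, limit=5):
    """Return the first `limit` lines of a file exactly as stored, as repr strings."""
    lines = []
    # Binary mode avoids any newline or encoding translation
    with open(path, "rb") as handle:
        for index, line in enumerate(handle):
            if index >= limit:
                break
            lines.append(repr(line))
    return lines
```

For example, `print("\n".join(raw_lines("example_file_to_convert.csv")))` would show each line with its commas, quotes, and line endings spelled out, assuming that is what you named your converted file.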

  3. File header best practice (advanced consideration)

In the file in step one, you’ll notice that the second column, column B, has spaces in its header. This is not advised if you want to upload a .csv file into another system and carry the headers over. This is because it can cause two potential problems in a database:

a) If it uploads with a header, you will have to wrap what is now your column name in double quotes every time you call it in a query. It’s not an impediment to uploading all the time, but it can be really annoying if you have to write it that way over and over again. Also, it will prevent auto-complete if your database program supports it.

b) It could cause an upload error in some systems because they’re trying to prevent the troubles in “a” above, and it could prevent you from uploading the file entirely if you do not remove or fix the header.

To prevent this, simply snake_case or CamelCase the header so there are no spaces in its name.
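For files with many columns, renaming headers by hand gets tedious. Below is a rough Python sketch of a header cleaner; the function name `to_snake_case` and the sample headers are my own for illustration, and the rules (split on lower-to-upper boundaries, replace anything non-alphanumeric with an underscore) are just one reasonable convention.

```python
import re


def to_snake_case(header):
    """Convert a header such as 'Correct Usage' or 'CamelCase' to snake_case."""
    # Insert an underscore at lower-to-upper boundaries: 'CamelCase' -> 'Camel_Case'
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", header.strip())
    # Collapse every run of non-alphanumeric characters into one underscore
    text = re.sub(r"[^A-Za-z0-9]+", "_", text)
    return text.strip("_").lower()


# Hypothetical headers, not the ones in the example file
cleaned = [to_snake_case(h) for h in ["Correct Usage", "CamelCase", " Word (misused) "]]
# cleaned == ['correct_usage', 'camel_case', 'word_misused']
```

Check the result for collisions before uploading, since two different headers could clean up to the same name.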

Whew!

I know that is a lot of text for a seemingly simple exercise, but I cannot tell you how many times I’ve either (a) had to teach someone how to simply convert a file to a CSV, or (b) had issues uploading a file into a database due to elementary issues.

There’s a lot more to this, including non-UTF-8 characters, data types, casting, fixing .csv upload errors through vim or bash editing… All topics for another time. For now, that’s a wrap.

Hope you found this helpful!


Surprising how many end users literally need hand holding over the simplest things.

One note I would like to add: if you’re building an application that uploads a .csv, and a .csv only, this conversion won’t matter on a Windows machine because of how that OS handles file conversions. In short, Windows handles MIME types in ways that make uploading .csv files difficult because you cannot truly convert them out of .xlsx.

Explanation of the problem in a GitHub project:

The tools are starting to get better at handling file types more expertly.
Example application with file_type selection for better handling of variance:
http://oss.sheetjs.com/js-xlsx/

Even after doing this I’ve run into systems that won’t support the generated CSV file. Excel is very, very smart when it comes to CSV files, supporting line breaks and other special characters in the various fields. Most CSV file importers will fail on these.
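To make the line-break problem concrete, here is a small Python sketch using only the standard `csv` module. The sample rows are invented. It compares an importer that naively splits on line breaks with a CSV-aware parser that respects quoted fields.

```python
import csv
import io

# Invented sample data: the second record has a line break inside a field
rows = [["id", "comment"], ["1", "first line\nsecond line"], ["2", "plain"]]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# A naive importer that splits on line breaks sees an extra, broken record
naive_records = text.splitlines()

# A CSV-aware parser keeps the quoted line break inside the field
parsed = list(csv.reader(io.StringIO(text, newline="")))
```

Here `naive_records` has four entries for three real records, while `parsed` matches `rows` exactly. An importer that behaves like the naive version is the kind that chokes on Excel's output.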

Great post. Also for those admins that for some reason need a CSV converted to Excel I highly recommend PSExcel.

Just think back of how hard all this was 10 years ago.

Couldn’t you also use PowerShell with a “ConvertTo-Csv” cmdlet? I would think for a batch job that could be a better option.

@Devon1987

For sure you could! For batching, this would most definitely be a better option. Also, if you’re using a *nix machine, you could just use a Perl command to change all the file names with any given string.

Not really a tutorial for that, and more of the elementary variety, but you’re 100% right.

Good post. I’ve had to look up how to convert to csv a few times as I don’t often have to convert files. It’s funny how many times I run into errors and they’re never the same error.

I appreciate articles like these – even simple workflows should have solid documentation, and this definitely qualifies. Nice write up.

Problem is, csv isn’t a standard.

Excel has certain default behaviour that many people consider to be a standard, but it ain’t…

Furicle:
There’s no official CSV standard, but there are common ways that CSV files handle escaped fields. It’s generally accepted that if you have a field with commas in it, the field can be enclosed in double quotes. That’s how Excel handles it, and that’s how most databases handle it (sometimes it has to be enabled). For example, when doing a MySQL LOAD DATA INFILE import, you can specify the clause

OPTIONALLY ENCLOSED BY '"'

(that’s a single quote, a double quote, and a single quote), meaning you can enclose fields in double quotes. And there are further ways to escape double quotes that are generally considered best practices, if not an outright standard. It’s only not a standard because it basically pre-dates everyone trying to patent their garbage formats and seek licensing fees.
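As a quick illustration of those quoting conventions, Python's standard `csv` module follows the same Excel-style rules by default: a field is wrapped in double quotes only when it needs it, and an embedded double quote is escaped by doubling it. The helper name `to_csv_line` and the sample values are made up for this sketch.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one record using the csv module's default (Excel-style) quoting."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(fields)
    return buffer.getvalue()


line = to_csv_line(["Smith, John", 'said "hi"', "plain"])
# line == '"Smith, John","said ""hi""",plain\r\n'

# Reading it back recovers the original three fields
fields = next(csv.reader([line]))
```

Only the two fields with a comma or a double quote get enclosed, which is the "optionally enclosed" behaviour described above.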

Has anyone skipped using the application and just coded for batching up a folder of files using PowerShell to convert them? TechNet had this doozy listed out there … just swap out location as needed.

$ErrorActionPreference = 'Stop'

Function Convert-CsvInBatch
{
    [CmdletBinding()]
    Param
    (
        [Parameter(Mandatory=$true)][String]$Folder
    )
    $ExcelFiles = Get-ChildItem -Path $Folder -Filter *.xlsx -Recurse

    # Start Excel through COM and suppress overwrite and data-loss prompts
    $excelApp = New-Object -ComObject Excel.Application
    $excelApp.DisplayAlerts = $false

    try {
        $ExcelFiles | ForEach-Object {
            $workbook = $excelApp.Workbooks.Open($_.FullName)
            $csvFilePath = $_.FullName -replace "\.xlsx$", ".csv"
            # 6 is the numeric value of XlFileFormat.xlCSV; only the active sheet is saved
            $workbook.SaveAs($csvFilePath, 6)
            # Close without saving changes back to the .xlsx
            $workbook.Close($false)
        }
    }
    finally {
        # Quit Excel and release the COM object, even if a conversion failed
        $excelApp.Quit()
        [System.Runtime.Interopservices.Marshal]::ReleaseComObject($excelApp) | Out-Null
    }
}

# Prepare the folder path which contains all Excel files
$FolderPath = "D:\var\projects\OCOS\var\excelchange"

Convert-CsvInBatch -Folder $FolderPath

There’s another article out there on a similar site with alternate code as well … Extract and convert all Excel worksheets to CSV files with PowerShell

I once (back in the 90’s) had to manipulate a “database” file containing every model of car in existence, at least every one that the company, an insurance broker, would cover. I think it was 35-50k lines of data. The original file came from a mainframe and was in a kind of loosely organised csv-style format and they needed it in Excel since that’s what the office was using. Every time I changed something and did a sort on the table, I’d start it running and then go to lunch, getting back about 1 hour later to find it had either just completed or was near to finishing. I’m glad we have better tools and processors these days. Thanks for the tips, I have had to teach this on many occasions, but still learned a little about importing csvs into other applications.

Sub QuoteCommaExport()
    ' Dimension all variables.
    Dim FileNum As Integer
    Dim ColumnCount As Long
    Dim RowCount As Long
    Dim sFName As Variant

    If Selection.Rows.Count = 1 Then
        MsgBox "Select rows to export"
        Exit Sub
    End If

    ' Force file type during save.
    sFName = Application.GetSaveAsFilename(ActiveWorkbook.Path & Application.PathSeparator & "textfile.csv", "CSV files (*.csv), *.csv")

    If sFName = False Then
        MsgBox "cancelled"
        Exit Sub
    End If

    ' Obtain next free file handle number.
    FileNum = FreeFile()

    ' Turn error checking off.
    On Error Resume Next

    ' Attempt to open destination file for output.
    Open sFName For Output As #FileNum

    ' If an error occurs report it and end.
    If Err.Number <> 0 Then
        MsgBox "Cannot open filename " & sFName
        Exit Sub
    End If

    ' Turn error checking on.
    On Error GoTo 0

    ' Loop for each row in selection.
    For RowCount = 1 To Selection.Rows.Count

        ' Loop for each column in selection.
        For ColumnCount = 1 To Selection.Columns.Count

            ' Write current cell's text in quotation marks, doubling any embedded quotes.
            Print #FileNum, """" & Replace(Selection.Cells(RowCount, _
                ColumnCount).Text, """", """""") & """";

            ' Check if cell is in last column.
            If ColumnCount = Selection.Columns.Count Then
                ' If so, end the current line.
                Print #FileNum,
            Else
                ' Otherwise, write a comma.
                Print #FileNum, ",";
            End If
        ' Start next iteration of ColumnCount loop.
        Next ColumnCount

    ' Start next iteration of RowCount loop.
    Next RowCount

    ' Close destination file.
    Close #FileNum
    MsgBox "Exported " & Selection.Rows.Count & " rows"
End Sub

There may not be an official standard from a standards body like ISO, but there IS a de facto standard, it’s pretty well documented here.

Line breaks are the biggest problem.

There is ONE major caveat to this. I work in a data team and customers send us files in CSV with commas as part of the data…idiots!

So I have set my system to use the PIPE character… |

I did this by going here

CONTROL PANEL → REGION → ADDITIONAL SETTINGS → LIST SEPARATOR

And changing that to a PIPE. Now, every single CSV I create is PIPE SEPARATED not comma!

If you have changed yours then your CSVs won’t be CSVs in the truest sense.
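If you are exchanging pipe-separated files with someone, both sides need to agree on the delimiter. Here is a minimal Python sketch of writing and reading such a file with the standard `csv` module; the function names and sample rows are invented for illustration.

```python
import csv
import io


def rows_to_delimited(rows, delimiter="|"):
    """Serialize rows with the given delimiter instead of a comma."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, delimiter=delimiter).writerows(rows)
    return buffer.getvalue()


def delimited_to_rows(text, delimiter="|"):
    """Parse delimited text back into rows; the delimiter must match the writer's."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Invented sample rows: the comma in the name no longer needs quoting
rows = [["name", "city"], ["Smith, John", "Leeds"]]
text = rows_to_delimited(rows)
# text == 'name|city\r\nSmith, John|Leeds\r\n'
```

Reading `text` back with `delimited_to_rows(text)` returns the original rows, while parsing it as a comma-separated file would split "Smith, John|Leeds" in the wrong place. A field that itself contains a pipe still gets quoted, so the problem moves rather than disappears.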

All this PS code is great and if you have to convert several files it might even be faster, but I’ve always just opened the file and saved as CSV. There are 4 CSV formats so for maximum compatibility I usually use CSV (MS-DOS). You end up with a basic text file with no formatting.

Great article for those still learning Excel and CSV processing.

Thank you for the detailed description. I work with databases. Until now I have transferred data manually and then formatted it. I will try the described method.