Handling non-standard CSV encoding and dialect

Most CSV data today is encoded in one of the standard Unicode formats, such as UTF-8, that Python typically uses by default. Occasionally, however, you may come across a data file with an older or more obscure encoding. To read and process such a file properly, you need to specify the encoding in the call to the open() function that creates the file object. The pandas.read_csv() function also lets you specify a non-standard encoding through its encoding parameter. I've put a link to the encodings accepted by Python in the Links and Further Reading document in the external resources.
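As a minimal sketch of the idea, the example below writes a small file in Latin-1 and then reads it back by naming the encoding in open(). The file name and the sample rows are made up for illustration, and Latin-1 simply stands in for whatever encoding your file actually uses.

```python
import csv
import os
import tempfile

# Hypothetical sample data containing non-ASCII characters.
rows = [["name", "city"], ["José", "Málaga"], ["Zoë", "Zürich"]]

# Write the sample file in Latin-1 so there is a non-UTF-8 file to read.
path = os.path.join(tempfile.mkdtemp(), "latin1_sample.csv")
with open(path, "w", encoding="latin-1", newline="") as f:
    csv.writer(f).writerows(rows)

# Read it back, naming the encoding explicitly in the call to open().
with open(path, encoding="latin-1", newline="") as f:
    decoded_rows = list(csv.reader(f))

print(decoded_rows)
```

With pandas, the equivalent would be to pass the same name to the encoding parameter, as in pandas.read_csv(path, encoding="latin-1"). If the wrong encoding is given, Python may raise a UnicodeDecodeError or silently produce garbled characters, depending on the bytes involved.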

There may also be variations in the delimiter (the character used to separate values), the newline character used to indicate the end of a line, and a few other formatting attributes. These variations are collectively referred to as the CSV dialect. Both the pandas.read_csv() function and the csv.reader() function have parameters that let you specify the formatting attributes of the CSV file you are reading, and csv.writer() accepts the same formatting parameters for a file you are writing. I've put links to the documentation for both in the Links and Further Reading document in the external resources.
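To illustrate, here is a small sketch that reads semicolon-delimited data with single-quoted fields and writes it back out as tab-separated text. The sample data is invented for this example, and an in-memory io.StringIO object stands in for a real file.

```python
import csv
import io

# Hypothetical data: semicolon-delimited, with single quotes around fields.
raw = "name;quote\n'Ada';'Hello; world'\n'Grace';'Ship it'\n"

# Describe the dialect to csv.reader() through its formatting parameters.
reader = csv.reader(io.StringIO(raw), delimiter=";", quotechar="'")
parsed = list(reader)

# Write the same rows in a different dialect: tab-delimited, "\n" line endings.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerows(parsed)
tab_separated = out.getvalue()

print(parsed)
print(tab_separated)
```

Note that the semicolon inside 'Hello; world' is kept as part of the value because it falls inside the quote characters. pandas.read_csv() offers comparable parameters, such as sep, quotechar, and lineterminator.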
