Reading feather files

The Feather format is a binary file format for storing data that is built on Apache Arrow, an in-memory columnar data format. It was developed by Wes McKinney, the creator of pandas, and Hadley Wickham, chief scientist at RStudio, as an initiative for a data-sharing infrastructure across Python and R. Because feather files serialize data column by column, read and write operations are efficient, typically far faster than with CSV and JSON files, where storage is record-wise.

Feather files have the following features:

  • Fast I/O operations.
  • Feather files can be read and written in languages other than R or Python, such as Julia and Scala.
  • They are compatible with pandas datatypes such as Datetime and Categorical.

Feather currently supports the following datatypes:

  • All numeric datatypes
  • Logical
  • Timestamps
  • Categorical
  • UTF-8 encoded strings
  • Binary

Since Feather is a simplified version of Arrow, it comes with several caveats. The following are some limitations of using a feather file:

  • Feather files are not recommended for long-term data storage, as the stability of the format between versions cannot be guaranteed.
  • Any index or multi-index other than the default index is not supported in the Feather format.
  • Some pandas data types, such as Period, are not supported.
  • Duplicates in column names are not supported.

Reading a feather file in pandas is done like so:

import pandas as pd
pd.read_feather("sample.feather")

This results in the following output:

Output of read_feather