Data sources

We will use freely available historical data from market, fundamental, and alternative sources. Chapter 2, Market and Fundamental Data, and Chapter 3, Alternative Data for Finance, cover the characteristics of these data sources and how to access them, and introduce key providers that we will use throughout the book. The companion GitHub repository (see below) contains instructions on how to obtain or create some of the datasets that we will use throughout, and includes some of the smaller datasets.

Sample data sources that we will obtain and work with include, but are not limited to:

  • NASDAQ ITCH order book data
  • Electronic Data Gathering, Analysis, and Retrieval (EDGAR) SEC filings
  • Earnings call transcripts from Seeking Alpha
  • Quandl daily prices and other data points for over 3,000 US stocks
  • Various macro fundamental data from the Federal Reserve and others
  • Large Yelp business reviews and Twitter datasets
  • Image data on oil tankers

Some of the datasets are several GB in size (for example, the NASDAQ ITCH and SEC filings data). The notebooks indicate when that is the case.

