Preface

In the last few years, we have seen spectacular growth in the field of data science. Almost every day there is some kind of new development, for example, a research paper announcing a new or improved machine learning or deep learning algorithm, or a new library for one of the most popular programming languages.

In the past, many of those advances did not make it to mainstream media, but that is changing rapidly. Recent examples include the AlphaGo program beating the 18-time world champion at Go, deep learning being used to generate realistic faces of humans that never existed, and the beautiful digital art created from a text caption using models such as DALL-E 2 or Stable Diffusion.

Another example of a recent and spectacular development is OpenAI’s ChatGPT. It is a language model with which we can engage in natural-sounding conversations. The model is able to keep track of past questions and follow up on them, admit its mistakes, or reject inappropriate requests. What is more, it is not restricted to natural language: we can also ask it to write actual code snippets in various programming languages.

Aside from those newsworthy achievements, AI has been adopted in virtually every industry over the last few decades. We can see it all around us, for example, in the recommendations we get on Netflix or in the emails offering an extra discount from an online shop that we have not used recently. As such, businesses all over the world employ AI to gain a competitive edge in the following ways:

  • Making better, data-driven decisions
  • Increasing their profits by efficient targeting or spot-on recommendations
  • Reducing customer churn by early identification of customers at risk
  • Automating repetitive tasks that AI can complete much faster (and potentially more accurately) than a human employee

The very same AI revolution is affecting the financial industry. In a 2020 article, Forbes reported that “70% of all financial services firms are using machine learning to predict cash flow events, fine-tune credit scores and detect fraud”. Additionally, various aspects of data science are also used for algorithmic trading, robo-advisory services, personalized banking, process automation, and more.

This book presents a recipe-based guide on how to solve various tasks within the financial domain using modern Python libraries. As such, we try to reduce the amount of code that needs to be written by leveraging mature and “battle-tested” libraries used by professionals in many industries. While the book assumes some prior knowledge and does not explain all the concepts from the theoretical point of view, it provides relevant references that allow the readers to dive deeper into the topics.

In this preface, you will find an outline of what you can expect from the book, how the content is organized, and what you need to meet your goals while having hands-on fun on the way. I hope you will enjoy it!

Who this book is for

This book is intended for data analysts, financial analysts, data scientists, or ML engineers who want to learn how to implement a broad range of tasks in a financial context. The book assumes that the readers have some understanding of financial markets and trading strategies. They should also be comfortable with using Python and its popular libraries oriented towards data science (for example, pandas, numpy, and scikit-learn).

The book will help readers to correctly use advanced approaches to data analysis within the financial domain, avoid potential pitfalls and common mistakes, and reach correct conclusions for the problems they might be trying to solve. Additionally, as the data science and financial fields are dynamically changing and expanding, the book contains references to academic papers and other relevant resources to broaden the understanding of the covered topics.

What this book covers

Chapter 1, Acquiring Financial Data, covers a few of the most popular sources of high-quality financial data, including Yahoo Finance, Nasdaq Data Link, Intrinio, and Alpha Vantage. It focuses on leveraging dedicated Python libraries and processing data for further analysis.

Chapter 2, Data Preprocessing, describes various techniques used to preprocess data, that is, the crucial steps between obtaining the data and using it for building machine learning models or investigating trading strategies. It covers topics such as converting prices to returns, adjusting them for inflation, imputing missing values, and aggregating trade data into various kinds of bars.

Chapter 3, Visualizing Financial Time Series, focuses on visualizing time series data, financial and otherwise. By plotting the data, we can visually identify some patterns, such as trends, seasonality, and changepoints, which we can further confirm using statistical tests. The insights gathered at this point can lead to making better decisions while choosing the modeling approach.

Chapter 4, Exploring Financial Time Series Data, shows how to use various algorithms and statistical tests to automatically identify potential issues with time series data, such as the existence of outliers. Additionally, it covers analyzing data for the existence of trends or other patterns such as mean-reversion. Lastly, it explores the stylized facts of asset returns. Together, those concepts are crucial while working with financial data, as we want to make sure that the models/strategies we are building can accurately capture the dynamics of asset returns.

Chapter 5, Technical Analysis and Building Interactive Dashboards, explains the basics of technical analysis in Python by showing how to calculate some of the most popular indicators and automatically recognize patterns in candlestick data. It also demonstrates how to create a Streamlit-based web app, which enables us to visualize and inspect the predefined TA indicators in an interactive fashion.

Chapter 6, Time Series Analysis and Forecasting, introduces the basics of time series modeling. It starts by looking into the building blocks of time series and how to separate them using various decomposition methods. Then, it covers the concept of stationarity, how to test for it, and how to achieve it in case the original series is not stationary. Lastly, it shows how to use two of the most widely used statistical approaches to time series modeling—the exponential smoothing methods and ARIMA class models.

Chapter 7, Machine Learning-Based Approaches to Time Series Forecasting, starts by explaining different ways of validating time series models. Then, it provides an overview of feature engineering approaches. It also introduces a tool for automatic feature extraction which generates hundreds or thousands of features with a few lines of code. Furthermore, it explains the concept of reduced regression and how to use Meta’s popular Prophet algorithm. The chapter concludes with an introduction to one of the popular AutoML frameworks for time series forecasting.

Chapter 8, Multi-Factor Models, covers estimating various factor models, starting with the simplest one-factor model (CAPM) and then extending it to the more advanced three-, four-, and five-factor models.

Chapter 9, Modeling Volatility with GARCH Class Models, focuses on volatility and the concept of conditional heteroskedasticity. It shows how to use univariate and multivariate GARCH models, which are among the most popular approaches to modeling and forecasting volatility.

Chapter 10, Monte Carlo Simulations in Finance, explains how to use Monte Carlo methods for various tasks, such as simulating stock prices, pricing derivatives with no closed-form solution (American/Exotic options), or estimating the uncertainty of a portfolio (for example, by calculating Value-at-Risk and Expected Shortfall).

Chapter 11, Asset Allocation, starts by explaining the most basic asset allocation strategy and, on its basis, shows how to evaluate the performance of portfolios. Then, it shows three different approaches to obtaining the efficient frontier. Lastly, it explores Hierarchical Risk Parity, which is a novel approach to asset allocation based on the combination of graph theory and machine learning.

Chapter 12, Backtesting Trading Strategies, presents how to run backtests of various trading strategies using two approaches (vectorized and event-driven) with the help of popular Python libraries. To do so, it uses a few examples of strategies built on the basis of popular technical indicators or mean-variance portfolio optimization.

Chapter 13, Applied Machine Learning: Identifying Credit Default, shows how to approach a real-life machine learning task of predicting loan defaults. It covers the entire scope of a machine learning project, from gathering and cleaning data to building and tuning a classifier. An important takeaway from this chapter is understanding the general approach to machine learning projects, which can then be applied to many different tasks, be it churn prediction or estimating the price of new real estate in a neighborhood.

Chapter 14, Advanced Concepts for Machine Learning Projects, continues from the workflow introduced in the preceding chapter and demonstrates possible extensions to the MVP stage of ML projects. It starts with presenting more advanced classifiers. Then, it covers alternative approaches to encoding categorical features and describes a few methods of dealing with imbalanced data.

Furthermore, it shows how to create stacked ensembles of ML models and leverage Bayesian hyperparameter tuning to improve upon exhaustive grid search. It also explores various approaches to calculating feature importance and using it to select the most informative predictors. Lastly, it touches upon the rapidly developing field of explainable AI.

Chapter 15, Deep Learning in Finance, describes how to apply some of the recent neural network architectures to two possible use cases in the financial domain—predicting credit card default (a classification task) and forecasting time series.

To get the most out of this book

In this book, we attempt to give the readers a high-level overview of various techniques used in the financial domain, while focusing on the practical applications of these methods. That is why we put special emphasis on showing how to use various popular Python libraries to make the work of an analyst or data scientist much easier and less prone to errors.

As the best way to learn anything is by doing, we highly encourage the readers to experiment with the code samples provided (the code can be found in the accompanying GitHub repository), apply the techniques to different datasets, and explore possible extensions (some of them mentioned in the See also sections of the recipes).

For a deeper dive into the theoretical foundations, we provide references for further reading. Those also include even more advanced techniques that are outside of the scope of this book.

Download the example code files

The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Python-for-Finance-Cookbook-2E. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/JnpTe.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, names of Python libraries, database table names, folder names, filenames, file extensions, and pathnames. For example: “We can also use the get_by_id function to download a particular CPI series.”

A block of code is set as follows:

def realized_volatility(x): 
    return np.sqrt(np.sum(x**2))

Any command-line input or output is written as follows:

Downloaded 2769 rows of data.

Bold: Indicates a new term or an important word. For example: “Volume bars are an attempt at overcoming this problem.”

Information boxes appear like this.

Tips and tricks appear like this.

Furthermore, at the very beginning of each Jupyter Notebook (available on the book’s GitHub repository), we run a few cells that import and set up plotting with matplotlib. For brevity’s sake, we will not mention this later on in the book. So at any time, assume that the following commands were executed.

First, we (optionally) increase the resolution of the generated figures using the following snippet:

%config InlineBackend.figure_format = "retina"

Then we execute the second snippet:

import matplotlib.pyplot as plt
import seaborn as sns
 
import warnings
from pandas.core.common import SettingWithCopyWarning
warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.simplefilter(action="ignore", category=SettingWithCopyWarning)
 
# feel free to modify, for example, change the context to "notebook"
sns.set_theme(context="talk", style="whitegrid",
              palette="colorblind", color_codes=True,
              rc={"figure.figsize": [12, 8]})

In this cell, we import matplotlib, warnings, and seaborn. Then, we disable some of the warnings and set up the style of the plots. In some chapters, we might modify these settings for better readability of the figures (especially in black and white).
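The import location of SettingWithCopyWarning has differed between pandas releases, so the setup cell may raise an ImportError depending on the installed version. The following is a minimal sketch of a more tolerant variant. The helper name find_warning_class and the list of candidate modules are assumptions made for illustration and are not part of the book's repository.

```python
import importlib
import warnings


def find_warning_class(name, module_names):
    """Return the first Warning subclass called `name` found in `module_names`.

    Hypothetical helper (not from the book's repository): returns None if no
    candidate module can be imported or none of them defines the class.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # the module (or the whole library) is not available
        candidate = getattr(module, name, None)
        if isinstance(candidate, type) and issubclass(candidate, Warning):
            return candidate
    return None


# The candidate locations are an assumption; adjust them to your pandas version.
copy_warning = find_warning_class(
    "SettingWithCopyWarning", ["pandas.errors", "pandas.core.common"]
)

warnings.simplefilter(action="ignore", category=FutureWarning)
if copy_warning is not None:
    # Only silence the warning when the installed pandas actually defines it.
    warnings.simplefilter(action="ignore", category=copy_warning)
```

If neither location provides the class, the sketch simply skips the second filter instead of failing, so the rest of the notebook setup can still run.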

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book’s title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you reported this to us. Please visit http://www.packtpub.com/submit-errata, click Submit Errata, and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Share your thoughts

Once you’ve read Python for Finance Cookbook - Second Edition, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

Follow these simple steps to get the benefits:

  1. Scan the QR code or visit the link below:

https://packt.link/free-ebook/9781803243191

  2. Submit your proof of purchase
  3. That’s it! We’ll send your free PDF and other benefits to your email directly