Title Page Copyright and Credits Hands-On Machine Learning for Algorithmic Trading About Packt Why subscribe? Packt.com Contributors About the author About the reviewers Packt is searching for authors like you Preface Who this book is for What this book covers To get the most out of this book Download the example code files Download the color images Conventions used Get in touch Reviews Machine Learning for Trading How to read this book What to expect Who should read this book How the book is organized Part 1 – the framework – from data to strategy design Part 2 – ML fundamentals Part 3 – natural language processing Part 4 – deep and reinforcement learning What you need to succeed Data sources GitHub repository Python libraries The rise of ML in the investment industry From electronic to high-frequency trading Factor investing and smart beta funds Algorithmic pioneers outperform humans at scale ML-driven funds attract $1 trillion AUM The emergence of quantamental funds Investments in strategic capabilities ML and alternative data Crowdsourcing of trading algorithms Design and execution of a trading strategy Sourcing and managing data Alpha factor research and evaluation Portfolio optimization and risk management Strategy backtesting ML and algorithmic trading strategies Use Cases of ML for Trading Data mining for feature extraction Supervised learning for alpha factor creation and aggregation Asset allocation Testing trade ideas Reinforcement learning Summary Market and Fundamental Data How to work with market data Market microstructure Marketplaces Types of orders Working with order book data The FIX protocol Nasdaq TotalView-ITCH Order Book data Parsing binary ITCH messages Reconstructing trades and the order book Regularizing tick data Tick bars Time bars Volume bars Dollar bars API access to market data Remote data access using pandas Reading HTML tables pandas-datareader for market data The Investor Exchange Quantopian Zipline Quandl Other market-data
providers How to work with fundamental data Financial statement data Automated processing – XBRL Building a fundamental data time series Extracting the financial statements and notes dataset Retrieving all quarterly Apple filings Building a price/earnings time series Other fundamental data sources pandas_datareader – macro and industry data Efficient data storage with pandas Summary Alternative Data for Finance The alternative data revolution Sources of alternative data Individuals Business processes Sensors Satellites Geolocation data Evaluating alternative datasets Evaluation criteria Quality of the signal content Asset classes Investment style Risk premiums Alpha content and quality Quality of the data Legal and reputational risks Exclusivity Time horizon Frequency Reliability Technical aspects Latency Format The market for alternative data Data providers and use cases Social sentiment data Dataminr StockTwits RavenPack Satellite data Geolocation data Email receipt data Working with alternative data Scraping OpenTable data Extracting data from HTML using requests and BeautifulSoup Introducing Selenium – using browser automation Building a dataset of restaurant bookings One step further – Scrapy and splash Earnings call transcripts Parsing HTML using regular expressions Summary Alpha Factor Research Engineering alpha factors Important factor categories Momentum and sentiment factors Rationale Key metrics Value factors Rationale Key metrics Volatility and size factors Rationale Key metrics Quality factors Rationale Key metrics How to transform data into factors Useful pandas and NumPy methods Loading the data Resampling from daily to monthly frequency Computing momentum factors Using lagged returns and different holding periods Compute factor betas Built-in Quantopian factors TA-Lib Seeking signals – how to use zipline The architecture – event-driven trading simulation A single alpha factor from market data Combining factors from diverse data sources Separating 
signal and noise – how to use alphalens Creating forward returns and factor quantiles Predictive performance by factor quantiles The information coefficient Factor turnover Alpha factor resources Alternative algorithmic trading libraries Summary Strategy Evaluation How to build and test a portfolio with zipline Scheduled trading and portfolio rebalancing How to measure performance with pyfolio The Sharpe ratio The fundamental law of active management In and out-of-sample performance with pyfolio Getting pyfolio input from alphalens Getting pyfolio input from a zipline backtest Walk-forward testing out-of-sample returns Summary performance statistics Drawdown periods and factor exposure Modeling event risk How to avoid the pitfalls of backtesting Data challenges Look-ahead bias Survivorship bias Outlier control Unrepresentative period Implementation issues Mark-to-market performance Trading costs Timing of trades Data-snooping and backtest-overfitting The minimum backtest length and the deflated SR Optimal stopping for backtests How to manage portfolio risk and return Mean-variance optimization How it works The efficient frontier in Python Challenges and shortcomings Alternatives to mean-variance optimization The 1/n portfolio The minimum-variance portfolio Global Portfolio Optimization - The Black-Litterman approach How to size your bets – the Kelly rule The optimal size of a bet Optimal investment – single asset Optimal investment – multiple assets Risk parity Risk factor investment Hierarchical risk parity Summary The Machine Learning Process Learning from data Supervised learning Unsupervised learning Applications Cluster algorithms Dimensionality reduction Reinforcement learning The machine learning workflow Basic walkthrough – k-nearest neighbors Frame the problem – goals and metrics Prediction versus inference Causal inference Regression problems Classification problems Receiver operating characteristics and the area under the curve Precision-recall curves 
Collecting and preparing the data Explore, extract, and engineer features Using information theory to evaluate features Selecting an ML algorithm Design and tune the model The bias-variance trade-off Underfitting versus overfitting Managing the trade-off Learning curves How to use cross-validation for model selection How to implement cross-validation in Python Basic train-test split Cross-validation Using a hold-out test set KFold iterator Leave-one-out CV Leave-P-Out CV ShuffleSplit Parameter tuning with scikit-learn Validation curves with yellowbrick Learning curves Parameter tuning using GridSearchCV and pipeline Challenges with cross-validation in finance Time series cross-validation with sklearn Purging, embargoing, and combinatorial CV Summary Linear Models Linear regression for inference and prediction The multiple linear regression model How to formulate the model How to train the model Least squares Maximum likelihood estimation Gradient descent The Gauss-Markov theorem How to conduct statistical inference How to diagnose and remedy problems Goodness of fit Heteroskedasticity Serial correlation Multicollinearity How to run linear regression in practice OLS with statsmodels Stochastic gradient descent with sklearn How to build a linear factor model From the CAPM to the Fama-French five-factor model Obtaining the risk factors Fama-MacBeth regression Shrinkage methods: regularization for linear regression How to hedge against overfitting How ridge regression works How lasso regression works How to use linear regression to predict returns Prepare the data Universe creation and time horizon Target return computation Alpha factor selection and transformation Data cleaning – missing data Data exploration Dummy encoding of categorical variables Creating forward returns Linear OLS regression using statsmodels Diagnostic statistics Linear OLS regression using sklearn Custom time series cross-validation Select features and target Cross-validating the model Test
results – information coefficient and RMSE Ridge regression using sklearn Tuning the regularization parameters using cross-validation Cross-validation results and ridge coefficient paths Top 10 coefficients Lasso regression using sklearn Cross-validated information coefficient and Lasso Path Linear classification The logistic regression model Objective function The logistic function Maximum likelihood estimation How to conduct inference with statsmodels How to use logistic regression for prediction How to predict price movements using sklearn Summary Time Series Models Analytical tools for diagnostics and feature extraction How to decompose time series patterns How to compute rolling window statistics Moving averages and exponential smoothing How to measure autocorrelation How to diagnose and achieve stationarity Time series transformations How to diagnose and address unit roots Unit root tests How to apply time series transformations Univariate time series models How to build autoregressive models How to identify the number of lags How to diagnose model fit How to build moving average models How to identify the number of lags The relationship between AR and MA models How to build ARIMA models and extensions How to identify the number of AR and MA terms Adding features – ARMAX Adding seasonal differencing – SARIMAX How to forecast macro fundamentals How to use time series models to forecast volatility The autoregressive conditional heteroskedasticity (ARCH) model Generalizing ARCH – the GARCH model Selecting the lag order How to build a volatility-forecasting model Multivariate time series models Systems of equations The vector autoregressive (VAR) model How to use the VAR model for macro fundamentals forecasts Cointegration – time series with a common trend Testing for cointegration How to use cointegration for a pairs-trading strategy Summary Bayesian Machine Learning How Bayesian machine learning works How to update assumptions from empirical evidence Exact 
inference: Maximum a Posteriori estimation How to select priors How to keep inference simple – conjugate priors How to dynamically estimate the probabilities of asset price moves Approximate inference: stochastic versus deterministic approaches Sampling-based stochastic inference Markov chain Monte Carlo sampling Gibbs sampling Metropolis-Hastings sampling Hamiltonian Monte Carlo – going NUTS Variational Inference Automatic Differentiation Variational Inference (ADVI) Probabilistic programming with PyMC3 Bayesian machine learning with Theano The PyMC3 workflow Model definition – Bayesian logistic regression Visualization and plate notation The Generalized Linear Models module MAP inference Approximate inference – MCMC Credible intervals Approximate inference – variational Bayes Model diagnostics Convergence Posterior Predictive Checks Prediction Practical applications Bayesian Sharpe ratio and performance comparison Model definition Performance comparison Bayesian time series models Stochastic volatility models Summary Decision Trees and Random Forests Decision trees How trees learn and apply decision rules How to use decision trees in practice How to prepare the data How to code a custom cross-validation class How to build a regression tree How to build a classification tree How to optimize for node purity How to train a classification tree How to visualize a decision tree How to evaluate decision tree predictions Feature importance Overfitting and regularization How to regularize a decision tree Decision tree pruning How to tune the hyperparameters GridSearchCV for decision trees How to inspect the tree structure Learning curves Strengths and weaknesses of decision trees Random forests Ensemble models How bagging lowers model variance Bagged decision trees How to build a random forest How to train and tune a random forest Feature importance for random forests Out-of-bag testing Pros and cons of random forests Summary Gradient Boosting Machines Adaptive boosting
The AdaBoost algorithm AdaBoost with sklearn Gradient boosting machines How to train and tune GBM models Ensemble size and early stopping Shrinkage and learning rate Subsampling and stochastic gradient boosting How to use gradient boosting with sklearn How to tune parameters with GridSearchCV Parameter impact on test scores How to test on the holdout set Fast scalable GBM implementations How algorithmic innovations drive performance Second-order loss function approximation Simplified split-finding algorithms Depth-wise versus leaf-wise growth GPU-based training DART – dropout for trees Treatment of categorical features Additional features and optimizations How to use XGBoost, LightGBM, and CatBoost How to create binary data formats How to tune hyperparameters Objectives and loss functions Learning parameters Regularization Randomized grid search How to evaluate the results Cross-validation results across models How to interpret GBM results Feature importance Partial dependence plots SHapley Additive exPlanations How to summarize SHAP values by feature How to use force plots to explain a prediction How to analyze feature interaction Summary Unsupervised Learning Dimensionality reduction Linear and non-linear algorithms The curse of dimensionality Linear dimensionality reduction Principal Component Analysis Visualizing PCA in 2D The assumptions made by PCA How the PCA algorithm works PCA based on the covariance matrix PCA using Singular Value Decomposition PCA with sklearn Independent Component Analysis ICA assumptions The ICA algorithm ICA with sklearn PCA for algorithmic trading Data-driven risk factors Eigen portfolios Manifold learning t-SNE UMAP Clustering k-Means clustering Evaluating cluster quality Hierarchical clustering Visualization – dendrograms Density-based clustering DBSCAN Hierarchical DBSCAN Gaussian mixture models The expectation-maximization algorithm Hierarchical risk parity Summary Working with Text Data How to extract features from text data 
Challenges of NLP The NLP workflow Parsing and tokenizing text data Linguistic annotation Semantic annotation Labeling Use cases From text to tokens – the NLP pipeline NLP pipeline with spaCy and textacy Parsing, tokenizing, and annotating a sentence Batch-processing documents Sentence boundary detection Named entity recognition N-grams spaCy's streaming API Multi-language NLP NLP with TextBlob Stemming Sentiment polarity and subjectivity From tokens to numbers – the document-term matrix The BoW model Measuring the similarity of documents Document-term matrix with sklearn Using CountVectorizer Visualizing vocabulary distribution Finding the most similar documents TfidfTransformer and TfidfVectorizer The effect of smoothing How to summarize news articles using TfidfVectorizer Text Preprocessing - review Text classification and sentiment analysis The Naive Bayes classifier Bayes' theorem refresher The conditional independence assumption News article classification Training and evaluating multinomial Naive Bayes classifier Sentiment analysis Twitter data Multinomial Naive Bayes Comparison with TextBlob sentiment scores Business reviews – the Yelp dataset challenge Benchmark accuracy Multinomial Naive Bayes model One-versus-all logistic regression Combining text and numerical features Multinomial logistic regression Gradient-boosting machine Summary Topic Modeling Learning latent topics: goals and approaches From linear algebra to hierarchical probabilistic models Latent semantic indexing How to implement LSI using sklearn Pros and cons Probabilistic latent semantic analysis How to implement pLSA using sklearn Latent Dirichlet allocation How LDA works The Dirichlet distribution The generative model Reverse-engineering the process How to evaluate LDA topics Perplexity Topic coherence How to implement LDA using sklearn How to visualize LDA results using pyLDAvis How to implement LDA using gensim Topic modeling for earnings calls Data preprocessing Model training and
evaluation Running experiments Topic modeling for Yelp business reviews Summary Word Embeddings How word embeddings encode semantics How neural language models learn usage in context The Word2vec model – learn embeddings at scale Model objective – simplifying the softmax Automatic phrase detection How to evaluate embeddings – vector arithmetic and analogies How to use pre-trained word vectors GloVe – global vectors for word representation How to train your own word vector embeddings The Skip-Gram architecture in Keras Noise-contrastive estimation The model components Visualizing embeddings using TensorBoard Word vectors from SEC filings using gensim Preprocessing Automatic phrase detection Model training Model evaluation Performance impact of parameter settings Sentiment analysis with Doc2vec Training Doc2vec on Yelp sentiment data Create input data Bonus – Word2vec for translation Summary Next Steps Key takeaways and lessons learned Data is the single most important ingredient Quality control Data integration Domain expertise helps unlock value in data Feature engineering and alpha factor research ML is a toolkit for solving problems with data Model diagnostics help speed up optimization Making do without a free lunch Managing the bias-variance trade-off Define targeted model objectives The optimization verification test Beware of backtest overfitting How to gain insights from black-box models ML for trading in practice Data management technologies Database systems Big Data technologies – Hadoop and Spark ML tools Online trading platforms Quantopian QuantConnect QuantRocket Conclusion Other Books You May Enjoy Leave a review - let other readers know what you think