by Luiz Felipe Martins, Magnus Vilhelm Persson, Ivan Idris, Martin Czygan, Phuong V
Python: End-to-end Data Analysis
Table of Contents
Credits
Preface
What this learning path covers
What you need for this learning path
Who this learning path is for
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Module 1
1. Introducing Data Analysis and Libraries
Data analysis and processing
An overview of the libraries in data analysis
Python libraries in data analysis
NumPy
Pandas
Matplotlib
PyMongo
The scikit-learn library
Summary
2. NumPy Arrays and Vectorized Computation
NumPy arrays
Data types
Array creation
Indexing and slicing
Fancy indexing
Numerical operations on arrays
Array functions
Data processing using arrays
Loading and saving data
Saving an array
Loading an array
Linear algebra with NumPy
NumPy random numbers
Summary
3. Data Analysis with Pandas
An overview of the Pandas package
The Pandas data structure
Series
The DataFrame
The essential basic functionality
Reindexing and altering labels
Head and tail
Binary operations
Functional statistics
Function application
Sorting
Indexing and selecting data
Computational tools
Working with missing data
Advanced uses of Pandas for data analysis
Hierarchical indexing
The Panel data
Summary
4. Data Visualization
The matplotlib API primer
Line properties
Figures and subplots
Exploring plot types
Scatter plots
Bar plots
Contour plots
Histogram plots
Legends and annotations
Plotting functions with Pandas
Additional Python data visualization tools
Bokeh
MayaVi
Summary
5. Time Series
Time series primer
Working with date and time objects
Resampling time series
Downsampling time series data
Upsampling time series data
Time zone handling
Timedeltas
Time series plotting
Summary
6. Interacting with Databases
Interacting with data in text format
Reading data from text format
Writing data to text format
Interacting with data in binary format
HDF5
Interacting with data in MongoDB
Interacting with data in Redis
The simple value
List
Set
Ordered set
Summary
7. Data Analysis Application Examples
Data munging
Cleaning data
Filtering
Merging data
Reshaping data
Data aggregation
Grouping data
Summary
8. Machine Learning Models with scikit-learn
An overview of machine learning models
The scikit-learn modules for different models
Data representation in scikit-learn
Supervised learning – classification and regression
Unsupervised learning – clustering and dimensionality reduction
Measuring prediction performance
Summary
2. Module 2
1. Laying the Foundation for Reproducible Data Analysis
Introduction
Setting up Anaconda
Getting ready
How to do it...
There's more...
See also
Installing the Data Science Toolbox
Getting ready
How to do it...
How it works...
See also
Creating a virtual environment with virtualenv and virtualenvwrapper
Getting ready
How to do it...
See also
Sandboxing Python applications with Docker images
Getting ready
How to do it...
How it works...
See also
Keeping track of package versions and history in IPython Notebook
Getting ready
How to do it...
How it works...
See also
Configuring IPython
Getting ready
How to do it...
See also
Learning to log for robust error checking
Getting ready
How to do it...
How it works...
See also
Unit testing your code
Getting ready
How to do it...
How it works...
See also
Configuring pandas
Getting ready
How to do it...
Configuring matplotlib
Getting ready
How to do it...
How it works...
See also
Seeding random number generators and NumPy print options
Getting ready
How to do it...
See also
Standardizing reports, code style, and data access
Getting ready
How to do it...
See also
2. Creating Attractive Data Visualizations
Introduction
Graphing Anscombe's quartet
How to do it...
See also
Choosing seaborn color palettes
How to do it...
See also
Choosing matplotlib color maps
How to do it...
See also
Interacting with IPython Notebook widgets
How to do it...
See also
Viewing a matrix of scatterplots
How to do it...
Visualizing with d3.js via mpld3
Getting ready
How to do it...
Creating heatmaps
Getting ready
How to do it...
See also
Combining box plots and kernel density plots with violin plots
How to do it...
See also
Visualizing network graphs with hive plots
Getting ready
How to do it...
Displaying geographical maps
Getting ready
How to do it...
Using ggplot2-like plots
Getting ready
How to do it...
Highlighting data points with influence plots
How to do it...
See also
3. Statistical Data Analysis and Probability
Introduction
Fitting data to the exponential distribution
How to do it...
How it works…
See also
Fitting aggregated data to the gamma distribution
How to do it...
See also
Fitting aggregated counts to the Poisson distribution
How to do it...
See also
Determining bias
How to do it...
See also
Estimating kernel density
How to do it...
See also
Determining confidence intervals for mean, variance, and standard deviation
How to do it...
See also
Sampling with probability weights
How to do it...
See also
Exploring extreme values
How to do it...
See also
Correlating variables with Pearson's correlation
How to do it...
See also
Correlating variables with the Spearman rank correlation
How to do it...
See also
Correlating a binary and a continuous variable with the point biserial correlation
How to do it...
See also
Evaluating relations between variables with ANOVA
How to do it...
See also
4. Dealing with Data and Numerical Issues
Introduction
Clipping and filtering outliers
How to do it...
See also
Winsorizing data
How to do it...
See also
Measuring central tendency of noisy data
How to do it...
See also
Normalizing with the Box-Cox transformation
How to do it...
How it works
See also
Transforming data with the power ladder
How to do it...
Transforming data with logarithms
How to do it...
Rebinning data
How to do it...
Applying logit() to transform proportions
How to do it...
Fitting a robust linear model
How to do it...
See also
Taking variance into account with weighted least squares
How to do it...
See also
Using arbitrary precision for optimization
Getting ready
How to do it...
See also
Using arbitrary precision for linear algebra
Getting ready
How to do it...
See also
5. Web Mining, Databases, and Big Data
Introduction
Simulating web browsing
Getting ready
How to do it…
See also
Scraping the Web
Getting ready
How to do it…
Dealing with non-ASCII text and HTML entities
Getting ready
How to do it…
See also
Implementing association tables
Getting ready
How to do it…
Setting up database migration scripts
Getting ready
How to do it…
See also
Adding a table column to an existing table
Getting ready
How to do it…
Adding indices after table creation
Getting ready
How to do it…
How it works…
See also
Setting up a test web server
Getting ready
How to do it…
Implementing a star schema with fact and dimension tables
How to do it…
See also
Using HDFS
Getting ready
How to do it…
See also
Setting up Spark
Getting ready
How to do it…
See also
Clustering data with Spark
Getting ready
How to do it…
How it works…
There's more…
See also
6. Signal Processing and Timeseries
Introduction
Spectral analysis with periodograms
How to do it...
See also
Estimating power spectral density with the Welch method
How to do it...
See also
Analyzing peaks
How to do it...
See also
Measuring phase synchronization
How to do it...
See also
Exponential smoothing
How to do it...
See also
Evaluating smoothing
How to do it...
See also
Using the Lomb-Scargle periodogram
How to do it...
See also
Analyzing the frequency spectrum of audio
How to do it...
See also
Analyzing signals with the discrete cosine transform
How to do it...
See also
Block bootstrapping time series data
How to do it...
See also
Moving block bootstrapping time series data
How to do it...
See also
Applying the discrete wavelet transform
Getting started
How to do it...
See also
7. Selecting Stocks with Financial Data Analysis
Introduction
Computing simple and log returns
How to do it...
See also
Ranking stocks with the Sharpe ratio and liquidity
How to do it...
See also
Ranking stocks with the Calmar and Sortino ratios
How to do it...
See also
Analyzing returns statistics
How to do it...
Correlating individual stocks with the broader market
How to do it...
Exploring risk and return
How to do it...
See also
Examining the market with the non-parametric runs test
How to do it...
See also
Testing for random walks
How to do it...
See also
Determining market efficiency with autoregressive models
How to do it...
See also
Creating tables for a stock prices database
How to do it...
Populating the stock prices database
How to do it...
Optimizing an equal weights two-asset portfolio
How to do it...
See also
8. Text Mining and Social Network Analysis
Introduction
Creating a categorized corpus
Getting ready
How to do it...
See also
Tokenizing news articles in sentences and words
Getting ready
How to do it...
See also
Stemming, lemmatizing, filtering, and TF-IDF scores
Getting ready
How to do it...
How it works
See also
Recognizing named entities
Getting ready
How to do it...
How it works
See also
Extracting topics with non-negative matrix factorization
How to do it...
How it works
See also
Implementing a basic terms database
How to do it...
How it works
See also
Computing social network density
Getting ready
How to do it...
See also
Calculating social network closeness centrality
Getting ready
How to do it...
See also
Determining the betweenness centrality
Getting ready
How to do it...
See also
Estimating the average clustering coefficient
Getting ready
How to do it...
See also
Calculating the assortativity coefficient of a graph
Getting ready
How to do it...
See also
Getting the clique number of a graph
Getting ready
How to do it...
See also
Creating a document graph with cosine similarity
How to do it...
See also
9. Ensemble Learning and Dimensionality Reduction
Introduction
Recursively eliminating features
How to do it...
How it works
See also
Applying principal component analysis for dimension reduction
How to do it...
See also
Applying linear discriminant analysis for dimension reduction
How to do it...
See also
Stacking and majority voting for multiple models
How to do it...
See also
Learning with random forests
How to do it...
There's more…
See also
Fitting noisy data with the RANSAC algorithm
How to do it...
See also
Bagging to improve results
How to do it...
See also
Boosting for better learning
How to do it...
See also
Nesting cross-validation
How to do it...
See also
Reusing models with joblib
How to do it...
See also
Hierarchically clustering data
How to do it...
See also
Taking a Theano tour
Getting ready
How to do it...
See also
10. Evaluating Classifiers, Regressors, and Clusters
Introduction
Getting classification straight with the confusion matrix
How to do it...
How it works
See also
Computing precision, recall, and F1-score
How to do it...
See also
Examining a receiver operating characteristic and the area under a curve
How to do it...
See also
Visualizing the goodness of fit
How to do it...
See also
Computing MSE and median absolute error
How to do it...
See also
Evaluating clusters with the mean silhouette coefficient
How to do it...
See also
Comparing results with a dummy classifier
How to do it...
See also
Determining MAPE and MPE
How to do it...
See also
Comparing with a dummy regressor
How to do it...
See also
Calculating the mean absolute error and the residual sum of squares
How to do it...
See also
Examining the kappa of classification
How to do it...
How it works
See also
Taking a look at the Matthews correlation coefficient
How to do it...
See also
11. Analyzing Images
Introduction
Setting up OpenCV
Getting ready
How to do it...
How it works
There's more
Applying Scale-Invariant Feature Transform (SIFT)
Getting ready
How to do it...
See also
Detecting features with SURF
Getting ready
How to do it...
See also
Quantizing colors
Getting ready
How to do it...
See also
Denoising images
Getting ready
How to do it...
See also
Extracting patches from an image
Getting ready
How to do it...
See also
Detecting faces with Haar cascades
Getting ready
How to do it...
See also
Searching for bright stars
Getting ready
How to do it...
See also
Extracting metadata from images
Getting ready
How to do it...
See also
Extracting texture features from images
Getting ready
How to do it...
See also
Applying hierarchical clustering on images
How to do it...
See also
Segmenting images with spectral clustering
How to do it...
See also
12. Parallelism and Performance
Introduction
Just-in-time compiling with Numba
Getting ready
How to do it...
How it works
See also
Speeding up numerical expressions with Numexpr
How to do it...
How it works
See also
Running multiple threads with the threading module
How to do it...
See also
Launching multiple tasks with the concurrent.futures module
How to do it...
See also
Accessing resources asynchronously with the asyncio module
How to do it...
See also
Distributed processing with execnet
Getting ready
How to do it...
See also
Profiling memory usage
Getting ready
How to do it...
See also
Calculating the mean, variance, skewness, and kurtosis on the fly
Getting ready
How to do it...
See also
Caching with a least recently used cache
Getting ready
How to do it...
See also
Caching HTTP requests
Getting ready
How to do it...
See also
Streaming counting with the Count-min sketch
How to do it...
See also
Harnessing the power of the GPU with OpenCL
Getting ready
How to do it...
See also
A. Glossary
B. Function Reference
IPython
Matplotlib
NumPy
pandas
Scikit-learn
SciPy
Seaborn
Statsmodels
C. Online Resources
IPython notebooks and open data
Mathematics and statistics
Presentations
D. Tips and Tricks for Command-Line and Miscellaneous Tools
IPython notebooks
Command-line tools
The alias command
Command-line history
Reproducible sessions
Docker tips
3. Module 3
1. Tools of the Trade
Before you start
Using the notebook interface
Imports
An example using the Pandas library
Summary
2. Exploring Data
The General Social Survey
Obtaining the data
Reading the data
Univariate data
Histograms
Making things pretty
Characterization
Concept of statistical inference
Numeric summaries and boxplots
Relationships between variables – scatterplots
Summary
3. Learning About Models
Models and experiments
The cumulative distribution function
Working with distributions
The probability density function
Where do models come from?
Multivariate distributions
Summary
4. Regression
Introducing linear regression
Getting the dataset
Testing with linear regression
Multivariate regression
Adding economic indicators
Taking a step back
Logistic regression
Some notes
Summary
5. Clustering
Introduction to cluster finding
Starting out simple – John Snow on cholera
K-means clustering
Suicide rate versus GDP versus absolute latitude
Hierarchical clustering analysis
Reading in and reducing the data
Hierarchical cluster algorithm
Summary
6. Bayesian Methods
The Bayesian method
Credible versus confidence intervals
Bayes formula
Python packages
U.S. air travel safety record
Getting the NTSB database
Binning the data
Bayesian analysis of the data
Binning by month
Plotting coordinates
Cartopy
Mpl toolkits – basemap
Climate change - CO2 in the atmosphere
Getting the data
Creating and sampling the model
Summary
7. Supervised and Unsupervised Learning
Introduction to machine learning
Scikit-learn
Linear regression
Climate data
Checking with Bayesian analysis and OLS
Clustering
Seeds classification
Visualizing the data
Feature selection
Classifying the data
The SVC linear kernel
The SVC Radial Basis Function
The SVC polynomial
K-Nearest Neighbour
Random Forest
Choosing your classifier
Summary
8. Time Series Analysis
Introduction
Pandas and time series data
Indexing and slicing
Resampling, smoothing, and other estimates
Stationarity
Patterns and components
Decomposing components
Differencing
Time series models
Autoregressive – AR
Moving average – MA
Selecting p and q
Automatic function
The (Partial) AutoCorrelation Function
Autoregressive Integrated Moving Average – ARIMA
Summary
E. More on Jupyter Notebook and matplotlib Styles
Jupyter Notebook
Useful keyboard shortcuts
Command mode shortcuts
Edit mode shortcuts
Markdown cells
Notebook Python extensions
Installing the extensions
Codefolding
Collapsible headings
Help panel
Initialization cells
NbExtensions menu item
Ruler
Skip-traceback
Table of contents
Other Jupyter Notebook tips
External connections
Export
Additional file types
Matplotlib styles
Useful resources
General resources
Packages
Data repositories
Visualization of data
Summary
A. Bibliography
Index