Learning Haskell Data Analysis
by James Church
Table of Contents
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Tools of the Trade
Welcome to Haskell and data analysis!
Why Haskell?
Getting ready
Installing the Haskell platform on Linux
The software used in addition to Haskell
SQLite3
Gnuplot
LAPACK
Nearly essential tools of the trade
Version control software – Git
Tmux
Our first Haskell program
Interactive Haskell
An introductory problem
Summary
2. Getting Our Feet Wet
Type is king – the implications of strict types in Haskell
Computing the mean of a list
Computing the sum of a list
Computing the length of a list
Attempting to compute the mean results in an error
Introducing the Fractional class
The fromIntegral and realToFrac functions
Creating our average function
The genericLength function
Metadata is just as important as data
Working with csv files
Preparing our environment
Describing our needs
Crafting our solution
Finding the column index of the specified column
The Maybe and Either monads
Applying a function to a specified column
Converting csv files to the SQLite3 format
Preparing our environment
Describing our needs
Inspecting column information
Crafting our functions
Summary
3. Cleaning Our Datasets
Structured versus unstructured datasets
How data analysis differs from pattern recognition
Creating your own structured data
Counting the number of fields in each record
Filtering data using regular expressions
Creating a simplified version of grep in Haskell
Exhibit A – a horrible customer database
Searching fields based on a regular expression
Locating empty fields in a csv file based on a regular expression
Crafting a regular expression to match dates
Summary
4. Plotting
Plotting data with EasyPlot
Simplifying access to data in SQLite3
Plotting data from a SQLite3 database
Exploring the EasyPlot library
Plotting a subset of a dataset
Plotting data passed through a function
Plotting multiple datasets
Plotting a moving average
Plotting a scatterplot
Summary
5. Hypothesis Testing
Data in a coin
Hypothesis test
Establishing the magic coin test
Understanding data variance
Probability mass function
Determining our test interval
Establishing the parameters of the experiment
Introducing System.Random
Performing the experiment
Does a home-field advantage really exist?
Converting the data to SQLite3
Exploring the data
Plotting what looks interesting
Returning to our test
The standard deviation
The standard error
The confidence interval
An introduction to the Erf module
Using Erf to test the claim
A discussion of the test
Summary
6. Correlation and Regression Analysis
The terminology of correlation and regression
The expectation of a variable
The variance of a variable
Normalizing a variable
The covariance of two variables
Finding the Pearson r correlation coefficient
Finding the Pearson r² correlation coefficient
Translating what we've learned to Haskell
Study – is there a connection between scoring and winning?
A consideration before we dive in – do any games end in a tie?
Compiling the essential data
Searching for outliers
Plot – runs per game versus the win percentage of each team
Performing correlation analysis
Regression analysis
The regression equation line
Estimating the regression equation
Translate the formulas to Haskell
Returning to the baseball analysis
Plotting the baseball analysis with the regression line
The pitfalls of regression analysis
Summary
7. Naive Bayes Classification of Twitter Data
An introduction to Naive Bayes classification
Prior knowledge
Likelihood
Evidence
Putting the parts of the Bayes theorem together
Creating a Twitter application
Communicating with Twitter
Creating a database to collect tweets
A frequency study of tweets
Cleaning our tweets
Creating our feature vectors
Writing the code for the Bayes theorem
Creating a Naive Bayes classifier with multiple features
Testing our classifier
Summary
8. Building a Recommendation Engine
Analyzing the frequency of words in tweets
A note on the importance of removing stop words
Working with multivariate data
Describing bivariate and multivariate data
Eigenvalues and eigenvectors
The airplane analogy
Preparing our environment
Performing linear algebra in Haskell
Computing the covariance matrix of a dataset
Discovering eigenvalues and eigenvectors in Haskell
Principal Component Analysis in Haskell
Building a recommendation engine
Finding the nearest neighbors
Testing our recommendation engine
Summary
A. Regular Expressions in Haskell
A crash course in regular expressions
The three repetition modifiers
Anchors
The dot
Character classes
Groups
Alternations
A note on regular expressions
Index