Book Description

Practical Data Science with R, Second Edition is a task-based tutorial that leads readers through dozens of useful data analysis practices using the R language. By concentrating on the most important tasks you’ll face on the job, this friendly guide is accessible to both business analysts and data scientists. Because data is only useful if it can be understood, you’ll also find fantastic tips for organizing and presenting data in tables, as well as snappy visualizations.

Table of Contents

  1. Copyright
  2. Brief Table of Contents
  3. Table of Contents
  4. Praise for the First Edition
  5. Foreword
  6. Preface
  7. Acknowledgments
  8. About This Book
  9. About the Authors
  10. About the Foreword Authors
  11. About the Cover Illustration
  12. Part 1. Introduction to data science
    1. Chapter 1. The data science process
      1. 1.1. The roles in a data science project
      2. 1.2. Stages of a data science project
      3. 1.3. Setting expectations
      4. Summary
    2. Chapter 2. Starting with R and data
      1. 2.1. Starting with R
      2. 2.2. Working with data from files
      3. 2.3. Working with relational databases
      4. Summary
    3. Chapter 3. Exploring data
      1. 3.1. Using summary statistics to spot problems
      2. 3.2. Spotting problems using graphics and visualization
      3. Summary
    4. Chapter 4. Managing data
      1. 4.1. Cleaning data
      2. 4.2. Data transformations
      3. 4.3. Sampling for modeling and validation
      4. Summary
    5. Chapter 5. Data engineering and data shaping
      1. 5.1. Data selection
      2. 5.2. Basic data transforms
      3. 5.3. Aggregating transforms
      4. 5.4. Multitable data transforms
      5. 5.5. Reshaping transforms
      6. Summary
  13. Part 2. Modeling methods
    1. Chapter 6. Choosing and evaluating models
      1. 6.1. Mapping problems to machine learning tasks
      2. 6.2. Evaluating models
      3. 6.3. Local interpretable model-agnostic explanations (LIME) for explaining model predictions
      4. Summary
    2. Chapter 7. Linear and logistic regression
      1. 7.1. Using linear regression
      2. 7.2. Using logistic regression
      3. 7.3. Regularization
      4. Summary
    3. Chapter 8. Advanced data preparation
      1. 8.1. The purpose of the vtreat package
      2. 8.2. KDD and KDD Cup 2009
      3. 8.3. Basic data preparation for classification
      4. 8.4. Advanced data preparation for classification
      5. 8.5. Preparing data for regression modeling
      6. 8.6. Mastering the vtreat package
      7. Summary
    4. Chapter 9. Unsupervised methods
      1. 9.1. Cluster analysis
      2. 9.2. Association rules
      3. Summary
    5. Chapter 10. Exploring advanced methods
      1. 10.1. Tree-based methods
      2. 10.2. Using generalized additive models (GAMs) to learn non-monotone relationships
      3. 10.3. Solving “inseparable” problems using support vector machines
      4. Summary
  14. Part 3. Working in the real world
    1. Chapter 11. Documentation and deployment
      1. 11.1. Predicting buzz
      2. 11.2. Using R markdown to produce milestone documentation
      3. 11.3. Using comments and version control for running documentation
      4. 11.4. Deploying models
      5. Summary
    2. Chapter 12. Producing effective presentations
      1. 12.1. Presenting your results to the project sponsor
      2. 12.2. Presenting your model to end users
      3. 12.3. Presenting your work to other data scientists
      4. Summary
  15. Appendix A. Starting with R and other tools
    1. A.1. Installing the tools
    2. A.2. Starting with R
    3. A.3. Using databases with R
    4. A.4. The takeaway
  16. Appendix B. Important statistical concepts
    1. B.1. Distributions
    2. B.2. Statistical theory
    3. B.3. Examples of the statistical view of data
    4. B.4. The takeaway
  17. Appendix C. Bibliography
  18. Practical Data Science with R
  19. Index
  20. List of Figures
  21. List of Tables
  22. List of Listings