Introduction

The power of big data and data science is revolutionizing the world. From the modern business enterprise to the lifestyle choices of today’s digital citizen, data science insights are driving changes and improvements in every arena. Although data science may be a new topic to many, it’s a skill that any individual who wants to stay relevant in her career field and industry needs to know.

This book is a reference manual to guide you through the broad areas encompassed by big data and data science. If you’re looking to learn a little about a lot of what’s happening across the entire space, this book is for you. If you’re an organizational manager who seeks to understand how data science and big data implementations could improve your business, this book is for you. If you’re a technical analyst, or even a developer, who wants a reference book for a quick catch-up on how machine learning and programming methods work in the data science space, this book is for you.

But, if you are looking for hands-on training in deep and very specific areas that are involved in actually implementing data science and big data initiatives, this is not the book for you. Look elsewhere because this book focuses on providing a brief and broad primer on all the areas encompassed by data science and big data. To keep the book at the For Dummies level, I do not go too deeply or specifically into any one area. Plenty of online courses are available to support people who want to spend the time and energy exploring these narrow crevices. I suggest that people follow up this book by taking courses in areas that are of specific interest to them.

Although other books dealing with data science tend to focus heavily on using Microsoft Excel to learn basic data science techniques, Data Science For Dummies goes deeper by introducing the R statistical programming language, Python, D3.js, SQL, Excel, and a plethora of open-source applications that you can use to get started in practicing data science. Some books on data science are needlessly wordy, with their authors going in circles trying to get to the point. Not so here. Unlike books authored by stuffy-toned academic types, I’ve written this book in friendly, approachable language, because data science is a friendly and approachable subject!

To be honest, until now, the data science realm has been dominated by a few select data science wizards who tend to present the topic in a manner that’s unnecessarily technical and intimidating. Basic data science isn’t that confusing or difficult to understand. Data science is simply the practice of using a set of analytical techniques and methodologies to derive and communicate valuable and actionable insights from raw data. The purpose of data science is to optimize processes and to support improved data-informed decision making, thereby generating an increase in value, whether value is represented by number of lives saved, number of dollars retained, or percentage of revenues increased. In Data Science For Dummies, I introduce a broad array of concepts and approaches that you can use when extracting valuable insights from your data.

Many times, data scientists get so caught up analyzing the bark of the trees that they simply forget to look for their way out of the forest. This common pitfall is one that you should avoid at all costs. I’ve worked hard to make sure that this book presents the core purpose of each data science technique and the goals you can accomplish by using it.

About This Book

In keeping with the For Dummies brand, this book is organized in a modular, easy-to-access format that allows you to use the book as a practical guidebook and ad hoc reference. In other words, you don’t need to read it through, from cover to cover. Just take what you want and leave the rest. I’ve taken great care to use real-world examples that illustrate data science concepts that may otherwise be overly abstract.

Web addresses and programming code appear in monofont. If you’re reading a digital version of this book on a device connected to the Internet, you can click a web address to visit that website, like this: www.dummies.com.

Foolish Assumptions

In writing this book, I’ve assumed that readers are at least technically minded enough to have mastered advanced tasks in Microsoft Excel — pivot tables, grouping, sorting, plotting, and the like. Having strong skills in algebra, basic statistics, or even business calculus helps as well. Foolish or not, it’s my high hope that all readers have a subject-matter expertise to which they can apply the skills presented in this book. Because data scientists must be capable of intuitively understanding the implications and applications of the data insights they derive, subject-matter expertise is a major component of data science.

Icons Used in This Book

As you make your way through this book, you’ll see the following icons in the margins:

The Tip icon marks tips (duh!) and shortcuts that you can use to make subject mastery easier.

Remember icons mark the information that’s especially important to know. To siphon off the most important information in each chapter, just skim the material represented by these icons.

The Technical Stuff icon marks information of a highly technical nature that you can normally skip.

The Warning icon tells you to watch out! It marks important information that may save you headaches.

Beyond the Book

This book includes the following external resources:

  • Data Science Cheat Sheet: This book comes with a handy Cheat Sheet that lists helpful shortcuts as well as abbreviated definitions for essential processes and concepts described in the book. You can use it as a quick-and-easy reference when doing data science. To get this Cheat Sheet, simply go to www.dummies.com and search for Data Science Cheat Sheet in the Search box.
  • Data Science Tutorial Datasets: This book has a few tutorials that rely on external datasets. You can download all datasets for these tutorials from the GitHub repository for this book at https://github.com/BigDataGal/Data-Science-for-Dummies.

Where to Go from Here

Just to reemphasize the point, this book’s modular design allows you to pick up and start reading anywhere you want. Although you don’t need to read from cover to cover, a few good starter chapters are Chapters 1, 2, and 9.
