Preface

Data is everywhere, but for most people it’s largely unusable for these key reasons:

  • Some data is stored in databases hidden behind a coding language that the majority of the workforce have never been taught.

  • Some data lurks on individuals’ computers stored away from those who might find it useful.

  • Some data is held in formats that only the developer of the system that created it could read.

So why should you care about this? Frankly, that data holds the answers to questions you have and questions you have yet to even ask. Self-service data preparation is a skill that will change what data analyses you can undertake, reduce the time you need to complete data projects, and fundamentally improve the quality of your analyses.

This book aims to teach you how to access this data and turn it into information to answer those questions using one of the most user-intuitive tools on the market—Tableau Prep Builder. Released in April 2018 to support Tableau Desktop, Server, and Online users, Tableau Prep Builder enables you to turn messy data into a format where it can be analyzed in Tableau’s software. Tableau Desktop, Server, and Online are software platforms that make data easy to explore and analyze visually.

Previously, the largest gap in Tableau’s flow from source system to delivering insightful analysis was manipulating the data into a format that’s easy to use. Tableau, like the majority of business intelligence tools, requires the data to be “clean” and structured into rows and columns. Many analysts used to take on this manual work themselves, hence the need to automate this task and spend that valuable time on the actual analysis instead.

Tableau Prep Builder allows users to easily clean, manipulate, and output data sets that are ready for analysis. Not only that, but Tableau has also embedded a number of its visual analytical approaches within the software so that users can often find the answers to their questions in Prep Builder without having to export the data at all.

Why I Wrote This Book

If everything in Tableau Prep is so intuitive, why do you need to learn how to use it from this book? Simply put, using the tool is only one part of the challenge of preparing data. The other parts consist of:

  • Understanding why to prepare data at all

  • Connecting to all the data you require

  • Understanding how different data types affect the cleaning operations you will need to perform

  • Breaking down the challenge of preparing data to plan your approach

  • Ensuring the proper changes are being made during the cleaning and manipulation operations

  • Combining multiple data sets

  • Determining how and where to output your resulting data

As with all software, it takes a bit of time to learn how to use each function, so this book is filled with screenshots and walkthroughs of the more complicated techniques. A lot of the knowledge shared here will help you undertake your own data preparation projects in any data preparation tool. These techniques will empower you to tackle data sets that previously wouldn’t have been accessible to you. That is why I wrote this book: to empower you to use data, or more data, to improve your decision-making.

In my career, I have been on both sides of the data preparation cycle: as the receiver and as the provider of the outputs. As the receiver, I was often frustrated with the time it took to get my hands on the information I needed. The information I did receive was often not in the form I needed, or it was missing key pieces of data that became necessary only after I had put in the original request. As the provider, I always took care to understand the problem, the underlying reason why someone wanted the data, so I could deliver the best solution rather than just what they asked for. I was also conscious that the longer I spent on each request, the longer the queue grew of others waiting to get their own views of different data sets. This is why I began to teach users how to get to the data themselves. Obviously, not everyone can spend the time to develop SQL querying skills (don’t worry if you don’t know what this means) just to access tables they don’t yet understand why they need. Tableau Prep Builder allows you to complete your own preparation with only a few hours of training rather than the days or weeks it would take to get going with SQL.

By reading this book and taking the time to practice the skills it covers, you should feel empowered and equipped to complete your own data preparation and deliver better analytical answers faster than ever before.

Who This Book Is For

This book is for people across the whole spectrum of working with data, such as those who are:

  • New to data and the workplace. Data is a major part of most jobs now, so if you’re fresh out of school or university, learning the skills this book will cover should prepare you for the future.

  • New to data but an experienced professional. Supplementing your experience with the knowledge you’ll gain from this book can create some amazing results. Without that experience, the data can be meaningless and lack context for you. This book will give you the skills to add data to back up your professional experience.

  • Experienced in visual analytics but not data preparation. Tableau Desktop has empowered many people to conduct their own visual analytics rather than waiting for IT and reporting teams to build reports for them. Tableau Prep Builder is doing exactly the same for data preparation. This book will boost your visual analysis skills and allow you to access data sets that previously seemed out of reach.

  • Experienced data prepper. OK, that’s not your official job title, but it’s what you are in my eyes. You might be using Excel, SQL, or another scripting language. Thanks to automation and simplification, Tableau Prep Builder will enable you to work faster than you can with your current methods and tools.

  • Colleagues of the experienced data prepper. Being familiar with Tableau Prep Builder will allow you to take on the simpler, more repeatable tasks of experienced data preppers so they can concentrate on the harder challenges. They will be your oracle on how to develop, so you can help them and yourself at the same time.

How This Book Is Organized

There are seven parts within the book. They have been organized to progressively build the skills and knowledge you require, while also acting as easy reference points when you need to jog your memory. After Chapter 1 looks more deeply at why self-service data preparation is important, chapters are arranged as follows:

Part I (Chapters 2–6)
After introducing Prep Builder, this part explores how to plan your data preparation and goals for the resulting data set. The final two chapters look at connecting to both data files and databases.
Part II (Chapters 7–10)
Understanding what data you are using and preparing is key. These chapters help you know what to look for when preparing data and introduce some of the functions you can use to work with data fields.
Part III (Chapters 11–18)
Once you have an understanding of your data fields, this part helps you analyze the shape and profile of your data set. You’ll also learn about the transformational steps within Prep Builder.
Part IV (Chapters 19–21)
After all your hard work, it’s time to output the data for analysis. This part covers how to output your data from your preparation flow to either a file or a database. This part also covers the other Tableau Prep product, Prep Conductor, which allows you to automate your workflow as well as share your flows with others.
Part V (Chapters 22–34)
Getting to this point means that you know the basics of how to produce a simple flow. Data preparation often contains other challenges, however. To help you tackle those, this part introduces you to many more of the cleaning functions built into Prep Builder.
Part VI (Chapters 35–41)
Knowing all the relevant techniques is one thing, but knowing when to use them is quite another. Therefore, this part describes how to use the techniques you’ve learned in real-world scenarios and offers considerations for when you’re faced with more difficult ones.
Part VII (Chapters 42–49)
This part centers on making your data and flow available to others by managing and documenting the output as well as focusing on the result.

These chapters will give you the knowledge and foundation to prepare your own data for analysis. But like anything in life, practice will hone your skills. To that end, some of the chapters feature data sets, examples, and challenges from Preppin’ Data so you can practice the techniques the chapter has covered. Jonathan Allenby and I designed Preppin’ Data as a weekly challenge to allow people with a range of experience to practice their data preparation skills. These exercises are purely optional, but practicing a technique makes it much more likely that you’ll understand how to apply it when you next need to. Each exercise explains its intention and requirements, just as if it were a request from someone you know. The Input and Output data sets allow you to try to meet the challenge set in the exercise. Solutions are available on the blog, but there is no right or wrong solution as long as you have delivered the output requested. Finally, Preppin’ Data often references a company called Chin & Beard Suds Co., a mock soap retailer that Jonathan and I use as an example in our exercises. This allows us to use terrible soap-based puns, for which we are unapologetic. The Preppin’ Data site has had 80,000+ hits, 260+ participants, and 2,000+ challenge solutions submitted. We’d love for you to join this community of data preppers.

Acknowledgments

This book would not have been possible without a number of phenomenal people that I get to call peers, colleagues, and friends. First, the one Excel user in my life that I can’t bring into the modern data age, my partner of 15 years, Toni Feather. A lot of her pragmatism exists in these pages, and by writing this, I might finally get her to use different data preparation tools.

A huge thank-you goes to my friends and colleagues at The Information Lab and The Data School in London. Without these brilliant minds and passionate people, this book would never have happened. Four years of consulting experience with the team produced a lot of the use cases you will read about in the pages to come. Tom Brown, Craig Bloodworth, and Robin Kennedy—thank you for making a truly amazing environment to learn and develop in. The Data School consultants deserve special praise too; having the luxury of teaching them over the years has allowed me to refine the “messages” that need to be conveyed, so they have massively shaped the content through the questions they ask every single day. The book began with an idea shared with Dan Farmer (one of the fantastic content editors) who helped form the early skeleton that I then fleshed out. Thank you for helping me shape this thing, Dan.

I became more focused on data preparation when one of the trainee consultants at The Data School, Jonathan Allenby, asked whether there was any way to put into practice the teachings I had just given on Tableau Prep. That prompted the creation of Preppin’ Data, and the success of our blog and the level of demand for instruction on data preparation led to this book.

Those who have shaped the actual content deserve massive praise as they have helped me turn my normal teaching content into this printed form. Angela Rufino at O’Reilly has been a fantastic content editor and made sure everything made sense even to new data preppers. The technical content editors—Jonathan Drummey, Ryan Sleeper, Kimberly Bolch, and Luke Stoughton—have all added a lot to this book. Their feedback has gone beyond just editing and ensured this book will deliver value to everyone reading it.

Finally, thanks to you for reading this book. By adding more data-driven decisions to your personal and work life, you will be improving the world for yourself and those around you. I have the luxury of working with lots of sectors, and the work of those around me inspires me every day. We can make this world a much better place with better use of information and insight—you are now part of those efforts to help others.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Constant width italic

Shows text that should be replaced with user-supplied values or by values determined by context.

Tip

This element signifies a tip or suggestion.

Note

This element signifies a general note.

Warning

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/5k_uH.

If you have a technical question or a problem using the code examples, please send email to .

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Tableau Prep: Up & Running by Carl Allchin (O’Reilly). Copyright 2020 Carl Allchin, 978-1-492-07962-0.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at .

O’Reilly Online Learning

Note

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://oreilly.com/catalog/9781492079613.

Email to comment or ask technical questions about this book.

For news and information about our books and courses, visit http://oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia
