Part III. Transform

The second part of the book was a deep dive into data visualization. In this part of the book, you’ll learn about the most important types of variables that you’ll encounter inside a data frame and learn the tools you can use to work with them.

Our data science model, with transform highlighted in blue.
Figure III-1. The options for data transformation depend heavily on the type of data involved, the subject of this part of the book.

You can read these chapters as you need them; they’re designed to be largely standalone so that they can be read out of order.

  • Chapter 12 teaches you about logical vectors. These are the simplest types of vectors, but they are extremely powerful. You’ll learn how to create them with numeric comparisons, how to combine them with Boolean algebra, how to use them in summaries, and how to use them for condition transformations.

  • Chapter 13 dives into tools for vectors of numbers, the powerhouse of data science. You’ll learn more about counting and a bunch of important transformation and summary functions.

  • Chapter 14 gives you the tools to work with strings: you’ll slice them, you’ll dice them, and you’ll stick them back together again. This chapter mostly focuses on the stringr package, but you’ll also learn some more tidyr functions devoted to extracting data from character strings.

  • Chapter 15 introduces you to regular expressions, a powerful tool for manipulating strings. This chapter will take you from thinking that a cat walked over your keyboard to reading and writing complex string patterns.

  • Chapter 16 introduces factors: the data type that R uses to store categorical data. You use a factor when a variable has a fixed set of possible values, or when you want to use a nonalphabetical ordering of a string.

  • Chapter 17 gives you the key tools for working with dates and date-times. Unfortunately, the more you learn about date-times, the more complicated they seem to get, but with the help of the lubridate package, you’ll learn to how to overcome the most common challenges.

  • Chapter 18 discusses missing values in depth. We’ve discussed them a couple of times in isolation, but now it’s time to discuss them holistically, helping you come to grips with the difference between implicit and explicit missing values and how and why you might convert between them.

  • Chapter 19 finishes up this part of the book by giving you the tools to join two (or more) data frames together. Learning about joins will force you to grapple with the idea of keys and think about how you identify each row in a dataset.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset