Working with Large Datasets

The approaches to data wrangling covered so far in this book work well for datasets that are small enough to be read into memory in full. Once datasets reach a certain size, however, a different approach may be required. For even larger collections of data, data mining techniques may be more appropriate.

In this chapter, I will discuss approaches to working with datasets that are small enough to be processed on a single computer but too big to be read into memory all at once. I will also discuss computer memory and introduce databases as a means of storing data. This chapter includes the following sections:

  • Logistical overview
  • Understanding computer memory
  • Understanding databases
  • Introducing MongoDB
  • Interfacing with MongoDB from Python