Understanding databases

In addition to using plenty of memory, large amounts of data can also take a long time to process. In some cases, it may make sense to process a large dataset in a single pass, reading from one input file and writing to another output file. Data wrangling, however, is often an iterative process that involves going back and forth between analyzing the data and modifying it. Iterating like this on a large dataset with a Python script can be difficult. This is where database management systems can be helpful.

A database is an organized collection of data. In contrast to plain files, databases are typically structured so that each document (another word for a data entry) is indexed, which makes it faster to retrieve specific documents or groups of documents. A database management system is software for interfacing with a database to do the following:

  • Retrieve data from a database
  • Modify data in a database
  • Write data to a database
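
The three operations above can be sketched in a few lines of Python. This is only an illustration and not the system introduced later in the text: it assumes SQLite, through Python's built-in sqlite3 module, as a stand-in database management system, and the readings table, its columns, and the sample values are all made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a database
# management system. The table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (id INTEGER PRIMARY KEY, city TEXT, temp_c REAL)"
)

# Write data to the database.
conn.executemany(
    "INSERT INTO readings (city, temp_c) VALUES (?, ?)",
    [("Oslo", 4.0), ("Cairo", 31.5), ("Lima", 19.0)],
)

# Modify data in the database.
conn.execute("UPDATE readings SET temp_c = ? WHERE city = ?", (5.5, "Oslo"))

# Retrieve data from the database: only the matching rows are returned.
rows = conn.execute(
    "SELECT city, temp_c FROM readings WHERE temp_c < ? ORDER BY city",
    (20.0,),
).fetchall()
print(rows)

conn.commit()
conn.close()
```

Each step is a single statement handed to the database, so the script never has to load and rewrite the whole dataset itself in order to change or look up a few entries.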

Database management systems define a language for analyzing and modifying data that does not require you to read all of the data into memory at once, or to write a separate program. This can make database management systems an excellent tool for working with large datasets. In the next section, I will introduce a database management system called MongoDB and do a basic demonstration of some of its features.
