Introducing MongoDB

MongoDB is a NoSQL database, meaning that its data model is not tabular, unlike the data model of a relational database. Data in MongoDB is stored as documents with a structure analogous to JSON, each document consisting of key-value pairs.

Once you have MongoDB set up on your computer and you have the MongoDB server running, you can import your data into a database using the mongoimport terminal command. The mongoimport command will take data from a static file, parse the data, and place the data into a database. The documentation for mongoimport is available at the following link: https://docs.mongodb.com/manual/reference/program/mongoimport/.

There are a few parameters that need to be specified along with the mongoimport command. The first of these is the name of the input file, which is written after the --file parameter. Run the command in a terminal from the directory containing fake_weather_data.csv, so that the filename can be specified as follows:

$ mongoimport --file fake_weather_data.csv

Next, since you are importing data from a CSV file, you will need to specify --type csv in order to indicate a CSV file. In addition, the --headerline parameter should be specified to indicate that the field names to be used are those specified in the first line of the CSV file.
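To make the role of the header line concrete, the following is a minimal Python sketch of the same idea using only the standard library. It is not how mongoimport is implemented. The two-row sample and the temperature column are hypothetical; only is_cloudy and is_sunny are field names taken from the dataset used in this section. The values are converted to integers by hand to mirror the integer fields that end up in the database:

```python
import csv
import io

# Hypothetical sample; the real fake_weather_data.csv has its own columns and rows.
SAMPLE_CSV = "temperature,is_cloudy,is_sunny\n21,0,1\n14,1,0\n"


def rows_to_documents(csv_text):
    """Turn CSV text into a list of dicts keyed by the names in the header line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # DictReader yields strings, so convert each value to an integer by hand.
    return [{key: int(value) for key, value in row.items()} for row in reader]


documents = rows_to_documents(SAMPLE_CSV)
print(documents[0])
```

Each row becomes one dictionary whose keys come from the first line of the file, which is the role that --headerline plays during the import.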

Lastly, documents in MongoDB are organized under two levels:

  • MongoDB documents belong to a collection within MongoDB
  • MongoDB collections belong to a database

As such, you will need to specify a name for the database with the --db parameter and a name for the collection with the --collection parameter. I will use weather for the database name and records for the collection name.
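One way to picture the two levels is as nested Python structures. This is only a mental model, not a description of how MongoDB stores data on disk, and the two sample documents are made up for illustration:

```python
# A plain-Python picture of the hierarchy: database -> collection -> documents.
server = {
    "weather": {                                # database
        "records": [                            # collection
            {"is_cloudy": 0, "is_sunny": 1},    # document (made-up values)
            {"is_cloudy": 1, "is_sunny": 0},    # document (made-up values)
        ]
    }
}

# Reaching a document means naming the database first, then the collection.
collection = server["weather"]["records"]
print(len(collection))
```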

The following is the finished command to import the CSV data into MongoDB. Note that the import will take some time to complete (possibly a few hours, depending on your system), so you may want to do something else while it finishes. Also note that you will need plenty of available storage on your hard drive (15 GB or so) in order to complete the import:

$ mongoimport --file fake_weather_data.csv --type csv --headerline --db weather --collection records

Once the import is finished, the data is stored on your computer's hard drive in database files that you can access through MongoDB. To work with the data, you can open a mongo shell from the terminal by entering the following command:

$ mongo

Inside the Mongo shell, you can write commands to interact directly with the database. To start with, you can use the following command to indicate the database which you would like to interact with: 

> use weather

After selecting the database, the following will count the number of documents in the records collection:

> db.records.count()

The following will retrieve and display a single document:

> db.records.findOne()

The following screenshot is the result of running the previous commands in the mongo shell on my computer:

MongoDB has a powerful language for querying and modifying documents in a collection, so I've included a link to the MongoDB documentation in the external resources for further reading. Rather than attempt to cover everything here, I will do a simple demonstration to change is_cloudy and is_sunny from integers to logical data types.

It is possible to update multiple documents in a collection using the update() function. The first argument to the update() function is a filter that selects which data should be updated. Commands in MongoDB are written a bit like Python dictionaries. The simplest filter is one in which a particular field has a particular value, and is written as follows:

 {<fieldname>:<value>}

In order to select all of the documents where is_sunny is 1, you can use the following as the first argument to the update() function:

{ is_sunny : 1 }
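Since these filters resemble Python dictionaries, an equality filter can be imitated in a few lines of plain Python. The following is a simplified sketch and not MongoDB's actual matching logic. The matches() helper and the sample documents are hypothetical, and only exact equality on top-level fields is handled:

```python
def matches(document, query):
    """Return True if every field named in query has the same value in document."""
    for field, wanted in query.items():
        if field not in document:
            return False
        actual = document[field]
        # Python treats True == 1, but MongoDB keeps booleans and numbers apart,
        # so a boolean is only allowed to match another boolean.
        if isinstance(actual, bool) != isinstance(wanted, bool):
            return False
        if actual != wanted:
            return False
    return True


# Made-up documents standing in for the records collection.
records = [
    {"is_cloudy": 0, "is_sunny": 1},
    {"is_cloudy": 1, "is_sunny": 0},
    {"is_cloudy": 0, "is_sunny": 1},
]

sunny = [doc for doc in records if matches(doc, {"is_sunny": 1})]
print(len(sunny))
```

In this sketch an empty filter matches every document, because there is no field that could fail the comparison.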

The next argument to the update() function is a structure that specifies the update that should take place. This takes the following form:

{ <update operator> : { <field> : <value> } }

The update operator to set a new value is $set. The second argument, which specifies that the is_sunny field should be set to true, is written as follows:

{ $set : { is_sunny : true } }

A third argument should be passed to the update() function that specifies that the operation should update multiple documents and not just one. Putting it all together, the following command will change the values of the is_sunny field from 1 to true where the value was 1 to begin with:

> db.records.update( { is_sunny : 1}, { $set : { is_sunny : true} }, { multi : true } )
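To show how the three arguments work together, the following is a rough plain-Python imitation of the command, run against an ordinary list of dictionaries. It is a sketch under simplifying assumptions and not the way MongoDB works internally: the update() and is_match() functions are hypothetical helpers, only the $set operator and equality filters are handled, and the sample documents are made up:

```python
def is_match(document, query):
    """Equality filter; booleans and numbers are kept apart, as in MongoDB."""
    return all(
        field in document
        and isinstance(document[field], bool) == isinstance(wanted, bool)
        and document[field] == wanted
        for field, wanted in query.items()
    )


def update(collection, query, change, multi=False):
    """Apply a {"$set": {...}} change to matching documents; return how many changed."""
    modified = 0
    for document in collection:
        if is_match(document, query):
            document.update(change["$set"])
            modified += 1
            if not multi:
                break  # without multi, stop after the first matching document
    return modified


# Made-up documents standing in for the records collection.
records = [
    {"is_cloudy": 0, "is_sunny": 1},
    {"is_cloudy": 1, "is_sunny": 0},
    {"is_cloudy": 0, "is_sunny": 1},
]

modified = update(records, {"is_sunny": 1}, {"$set": {"is_sunny": True}}, multi=True)
print(modified)
print(records)
```

With multi left at its default of False, the loop stops after the first matching document, which is why the third argument is needed to change every matching record.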

The update command will also take some time to finish. When it does finish, however, you can verify that the value of is_sunny is either 0 or true by running db.records.find(). The following is the output on my computer:

If you wanted to, you could easily repeat this step to change 0 to false for the is_cloudy and is_sunny fields. This step isn't really necessary, though, as it will just take more time. After you are done with the demonstration, and with any personal exploration, you can remove the data from the database by running the following:

> db.records.remove({})
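The empty filter {} is what makes this command remove everything: a filter with no conditions matches every document. The following plain-Python sketch illustrates that reasoning. The remove() function is a hypothetical stand-in that handles only equality filters, and the sample documents are made up:

```python
def remove(collection, query):
    """Delete, in place, every document whose fields equal those in query."""
    kept = [
        doc for doc in collection
        # all() over an empty query is True, so {} matches every document.
        if not all(field in doc and doc[field] == value
                   for field, value in query.items())
    ]
    removed = len(collection) - len(kept)
    collection[:] = kept
    return removed


# Made-up documents standing in for the records collection.
records = [
    {"is_cloudy": 0, "is_sunny": True},
    {"is_cloudy": 1, "is_sunny": 0},
]

removed = remove(records, {})
print(removed, records)
```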

While MongoDB is quite powerful, it does not have the full capability of a programming language. In the next section, I will demonstrate how to interface MongoDB with Python to achieve the same result. Using Python to import data to MongoDB gives you the ability to process data as it is placed in the database. 
