In Chapter 2, we discussed MongoDB CRUD operations, embedded documents, and arrays. In this chapter, we cover the following topics.
Data models.
Data model relationship between documents.
Modeling tree structures.
Aggregation operations.
SQL aggregation terms and corresponding MongoDB aggregation operations.
Data Models
MongoDB provides two designs for data modeling:
Embedded data models.
Normalized data models.
Embedded Data Models
In MongoDB, you can embed related data in a single document. This schema design is known as a denormalized model. Consider the example shown in Figure 3-1.
This embedded document model allows applications to store related pieces of information in the same record. As a result, the application requires fewer queries and updates to complete common operations.
We can use embedded documents to represent both one-to-one relationships (a “contains” relationship between two entities) and one-to-many relationships (when many documents are viewed in the context of one parent).
Embedded documents provide better results in these cases:
For read operations.
When we need to retrieve related data in a single database operation.
Embedded data models allow related data to be updated in a single atomic operation. Data in an embedded document can be accessed using dot notation.
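Dot notation addresses a field inside an embedded document by joining the field names with periods, as in "address.city". The following is a minimal sketch in plain JavaScript of how such a path resolves against a document. The getPath helper and the sample student document are illustrative assumptions and are not part of MongoDB. In the mongo shell, a comparable query would take a form such as db.students.find({ "address.city": "New York" }), where the students collection name is also assumed.

```javascript
// Hypothetical student document with an embedded address (illustration only).
const student = {
  _id: "James",
  name: "James William",
  address: { street: "123 Hill Street", city: "New York", state: "US" }
};

// Hypothetical helper: resolve a dot-notation path such as "address.city".
// Returns undefined when any segment along the path is missing.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const city = getPath(student, "address.city");
console.log(city);
```

Returning undefined for a missing segment keeps the helper from throwing on documents that lack the embedded field.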
Normalized Data Models
Normalized data models describe relationships using references, as illustrated in Figure 3-2.
Normalized data models can best be used in the following circumstances:
When embedding would result in duplication of data.
To represent complex many-to-many relationships.
To model large hierarchical data sets.
Normalized data models generally do not read as efficiently as embedded models, because resolving references can require additional queries.
Data Model Relationship Between Documents
Let’s explore data models that use embedded documents and references.
Recipe 3-1. Data Model Using an Embedded Document
In this recipe, we are going to discuss a data model using an embedded document.
Problem
You want to create a data model for a one-to-one relationship.
Solution
Use an embedded document.
How It Works
Let’s follow the steps in this section to design a data model for a one-to-one relationship.
Step 1: One-to-One Relationships
Consider this example.
{
   _id: "James",
   name: "James William"
}
{
   student_id: "James",
   street: "123 Hill Street",
   city: "New York",
   state: "US",
}
Here, we have a student and address relationship in which an address belongs to the student. If we frequently retrieve the address data together with the name, then referencing requires multiple queries to resolve the references. In this scenario, we can embed the address data in the student data to provide a better data model, as shown here.
{
   _id: "James",
   name: "James William",
   address: {
      street: "123 Hill Street",
      city: "New York",
      state: "US",
   }
}
With this data model, we can retrieve complete student information with one query.
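The difference between the two models can be sketched in plain JavaScript, using arrays as in-memory stand-ins for collections. The collection names, the loadWithReference and loadEmbedded helpers, and the lookup counts in the comments are assumptions for illustration; a real deployment would issue queries against MongoDB instead.

```javascript
// In-memory stand-ins for two collections (hypothetical names and data).
const students = [{ _id: "James", name: "James William" }];
const addresses = [
  { student_id: "James", street: "123 Hill Street", city: "New York", state: "US" }
];

// Referenced model: two lookups are needed to assemble the full record.
function loadWithReference(id) {
  const student = students.find((s) => s._id === id);          // lookup 1
  const address = addresses.find((a) => a.student_id === id);  // lookup 2
  return { ...student, address };
}

// Embedded model: the address travels inside the student document.
const embeddedStudents = [
  {
    _id: "James",
    name: "James William",
    address: { street: "123 Hill Street", city: "New York", state: "US" }
  }
];

function loadEmbedded(id) {
  return embeddedStudents.find((s) => s._id === id);           // single lookup
}

console.log(loadWithReference("James").address.city);
console.log(loadEmbedded("James").address.city);
```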
Step 2: One-to-Many Relationships
Consider this example.
{
   _id: "James",
   name: "James William"
}
{
   student_id: "James",
   street: "123 Hill Street",
   city: "New York",
   state: "US",
}
{
   student_id: "James",
   street: "234 Thomas Street",
   city: "New Jersey",
   state: "US",
}
Here, we have a student with multiple addresses. If we frequently retrieve the address data together with the name, then referencing requires multiple queries to resolve the references. In this scenario, a good way to design the schema is to embed the address data in the student data, as shown here.
{
   _id: "James",
   name: "James William",
   address: [{
      street: "123 Hill Street",
      city: "New York",
      state: "US",
   },
   {
      street: "234 Thomas Street",
      city: "New Jersey",
      state: "US",
   }]
}
This data model allows us to retrieve complete student information with one query.
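When the embedded field is an array of documents, a dot-notation condition such as { "address.city": "New Jersey" } matches a document if any element of the array satisfies it. The sketch below mimics that behavior in plain JavaScript; the findByAddressCity helper and the in-memory students array are assumptions for illustration only.

```javascript
// In-memory stand-in for a students collection (hypothetical).
const students = [
  {
    _id: "James",
    name: "James William",
    address: [
      { street: "123 Hill Street", city: "New York", state: "US" },
      { street: "234 Thomas Street", city: "New Jersey", state: "US" }
    ]
  }
];

// Hypothetical helper: keep documents in which at least one embedded
// address has the requested city.
function findByAddressCity(docs, city) {
  return docs.filter((doc) => doc.address.some((a) => a.city === city));
}

const matches = findByAddressCity(students, "New Jersey");
console.log(matches.map((doc) => doc._id));
```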
Recipe 3-2. Data Model Using Document References
In this recipe, we are going to discuss a data model using document references.
Problem
You want to create a data model for a one-to-many relationship.
Solution
Use a document reference.
How It Works
Let’s follow the steps in this section to design a data model for a one-to-many relationship.
Step 1: One-to-Many Relationships
Consider the following data model that maps a publisher and book relationship.
Here, the publisher document is embedded inside the book document, which leads to repetition of the publisher data model.
In this scenario, we can use document references to avoid repetition of data. With document references, the growth of the relationship determines where to store the references. If the number of books per publisher is small, then we can store the book references inside the publisher document as shown here.
If the number of books per publisher is unbounded, this data model would lead to mutable, growing arrays. We can avoid this situation by storing the publisher reference inside the book document as shown here.
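As a hedged illustration of the two placements just described, the following plain JavaScript sketch models both shapes with in-memory objects. All names and values (Example Press, the book titles, the publisher_id field, the booksFor helper) are hypothetical and chosen only to show where the reference lives.

```javascript
// Option A: few books per publisher, so the publisher holds the book references.
const publisherWithBooks = {
  _id: "pub1",
  name: "Example Press",
  books: [101, 102]
};

// Option B: unbounded books per publisher, so each book holds the
// publisher reference and the publisher document stays small.
const publishers = [{ _id: "pub1", name: "Example Press" }];
const books = [
  { _id: 101, title: "First Title", publisher_id: "pub1" },
  { _id: 102, title: "Second Title", publisher_id: "pub1" }
];

// Resolve option B: collect every book that points at a given publisher.
function booksFor(publisherId) {
  return books.filter((book) => book.publisher_id === publisherId);
}

const titles = booksFor("pub1").map((book) => book.title);
console.log(titles);
```

In option B, adding a book never modifies the publisher document, which is what avoids the growing array.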
The next command retrieves the immediate children of the parent.
db.author.find( { parent: "Subhashini" } )
Here is the output.
> db.author.find( { parent: "Subhashini" } )
{ "_id" : "Books", "parent" : "Subhashini" }
{ "_id" : "Article", "parent" : "Subhashini" }
>
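The lookup above can be mirrored in plain JavaScript to show what a parent reference makes cheap. The sketch assumes an in-memory authorTree array and a root node whose parent is null; the childrenOf and parentOf helpers are hypothetical names.

```javascript
// In-memory stand-in for the author collection; the root document is assumed.
const authorTree = [
  { _id: "Subhashini", parent: null },
  { _id: "Books", parent: "Subhashini" },
  { _id: "Article", parent: "Subhashini" }
];

// Immediate children: every node whose parent field names the given node.
function childrenOf(nodes, parentId) {
  return nodes.filter((node) => node.parent === parentId).map((node) => node._id);
}

// Parent: read directly from the node's own document.
function parentOf(nodes, id) {
  const node = nodes.find((n) => n._id === id);
  return node ? node.parent : undefined;
}

console.log(childrenOf(authorTree, "Subhashini"));
console.log(parentOf(authorTree, "Books"));
```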
Step 2: Tree Structure with Child References
The child references pattern stores each tree node in a document; in addition to the tree node, the document stores in an array the _id value(s) of the node’s children.
Consider the following author tree model with child references.
Child references are a good choice to work with tree storage when there are no subtree operations.
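A minimal sketch of the child references pattern, again in plain JavaScript with an in-memory array, is shown next. The node names follow the author tree used in this recipe, while the children field contents and the helper functions are assumptions. In the mongo shell, finding the parent of a node would correspond to a query of the form db.author.find({ children: "Books" }).

```javascript
// In-memory stand-in for the author collection using child references.
const authorTree = [
  { _id: "Subhashini", children: ["Books", "Article"] },
  { _id: "Books", children: [] },
  { _id: "Article", children: [] }
];

// Immediate children: read the array stored in the node's own document.
function childrenOf(nodes, id) {
  const node = nodes.find((n) => n._id === id);
  return node ? node.children : [];
}

// Parent: the document whose children array contains the node.
function parentOf(nodes, id) {
  const parent = nodes.find((n) => n.children.includes(id));
  return parent ? parent._id : null;
}

console.log(childrenOf(authorTree, "Subhashini"));
console.log(parentOf(authorTree, "Article"));
```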
Step 3: Tree Structure with an Array of Ancestors
The array of ancestors pattern stores each tree node in a document; in addition to the tree node, the document stores in an array the _id value(s) of the node’s ancestors or path.
Consider this author tree model with an array of ancestors.
db.author.insert( { _id: "A Framework For Extracting Information From Web Using VTD-XML", ancestors: [ "Subhashini", "Article" ], parent: "Article" } )
This pattern provides an efficient solution to find all descendants and the ancestors of a node. The array of ancestors pattern is a good choice for working with subtrees.
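The following plain JavaScript sketch shows why the ancestors array makes subtree queries simple: every descendant of a node carries that node's _id in its ancestors array. The in-memory authorTree, the root and Article documents, and the helper names are assumptions. In the mongo shell, the descendants lookup would correspond to a query of the form db.author.find({ ancestors: "Subhashini" }).

```javascript
// In-memory stand-in for the author collection; the first two documents are assumed.
const authorTree = [
  { _id: "Subhashini", ancestors: [], parent: null },
  { _id: "Article", ancestors: ["Subhashini"], parent: "Subhashini" },
  {
    _id: "A Framework For Extracting Information From Web Using VTD-XML",
    ancestors: ["Subhashini", "Article"],
    parent: "Article"
  }
];

// Descendants: every node that lists the given node among its ancestors.
function descendantsOf(nodes, id) {
  return nodes.filter((n) => n.ancestors.includes(id)).map((n) => n._id);
}

// Ancestors (the path from the root): read directly from the node's document.
function ancestorsOf(nodes, id) {
  const node = nodes.find((n) => n._id === id);
  return node ? node.ancestors : [];
}

console.log(descendantsOf(authorTree, "Subhashini"));
console.log(ancestorsOf(authorTree, "Article"));
```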
Aggregation
Aggregation operations group values from multiple documents and can perform a variety of operations on the grouped values to return a single result. MongoDB provides the following aggregation operations:
Aggregation pipeline.
Map-reduce function.
Single-purpose aggregation methods.
Aggregation Pipeline
The aggregation pipeline is a framework for data aggregation, modeled on the concept of data processing pipelines. Each stage of a pipeline performs an operation on its input and passes its output as the input to the next stage. Documents enter a multistage pipeline that transforms them into an aggregated result.
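The staged flow can be sketched in plain JavaScript by treating each stage as a function from an array of documents to an array of documents. The orders data, the field names custID, amount, and status, and the match, groupSum, and runPipeline helpers are all hypothetical. In the mongo shell, a comparable pipeline would take the form db.orders.aggregate([ { $match: { status: "A" } }, { $group: { _id: "$custID", total: { $sum: "$amount" } } } ]).

```javascript
// Hypothetical orders documents (illustration only).
const orders = [
  { custID: "A", amount: 50, status: "A" },
  { custID: "A", amount: 100, status: "A" },
  { custID: "B", amount: 200, status: "A" },
  { custID: "B", amount: 25, status: "P" }
];

// A match-like stage: keep only documents that satisfy the predicate.
const match = (predicate) => (docs) => docs.filter(predicate);

// A group-like stage: sum valueField per distinct value of keyField.
const groupSum = (keyField, valueField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[valueField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, total }));
};

// Run the stages in order, feeding each stage's output to the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const result = runPipeline(orders, [
  match((order) => order.status === "A"),
  groupSum("custID", "amount")
]);
console.log(result);
```

Placing the match stage first reduces the number of documents the grouping stage has to process.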
Recipe 3-4. Aggregation Pipeline
In this recipe, we are going to discuss how the aggregation pipeline works.
Problem
You want to work with aggregation functions.
Solution
Use this method.
db.collection.aggregate()
How It Works
Let’s follow the steps in this section to work with the aggregation pipeline.
Step 1: Aggregation Pipeline
Create the following orders collection to perform aggregation.
MongoDB also provides map-reduce to perform aggregation operations. There are two phases in map-reduce: a map stage that processes each document and outputs one or more objects and a reduce stage that combines the output of the map operation.
Custom JavaScript functions are used to perform the map and reduce operations. Map-reduce is generally less efficient and more complex than the aggregation pipeline.
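The two phases can be illustrated with a simplified plain JavaScript sketch. This is not MongoDB's implementation: here the map function receives the document and an emit callback as arguments, whereas in the mongo shell the map function reads the document through this and calls a built-in emit. The orders data, the field names, and the mapReduce helper are hypothetical.

```javascript
// Hypothetical orders documents (illustration only).
const orders = [
  { custID: "A", amount: 50 },
  { custID: "A", amount: 100 },
  { custID: "B", amount: 200 },
  { custID: "B", amount: 25 }
];

// Simplified map-reduce over an in-memory array.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  // emit collects every value produced for a key.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Map phase: process each document; it may emit zero or more pairs.
  docs.forEach((doc) => mapFn(doc, emit));
  // Reduce phase: combine the values collected for each key.
  return [...groups].map(([key, values]) => ({ _id: key, value: reduceFn(key, values) }));
}

const totals = mapReduce(
  orders,
  (doc, emit) => emit(doc.custID, doc.amount),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(totals);
```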
Recipe 3-5. Map-Reduce
In this recipe, we are going to discuss how to perform aggregation operations using map-reduce.
Problem
You want to work with aggregation operations using map-reduce.
Solution
Use custom JavaScript functions.
How It Works
Let’s follow the steps in this section to work with map-reduce.
Step 1: Map-Reduce
Create the following orders collection to perform aggregation operations.
MongoDB also provides single-purpose aggregation operations such as db.collection.count() and db.collection.distinct(). These operations aggregate documents from a single collection and provide simple access to common aggregation processes.
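What these two operations compute can be sketched in plain JavaScript over an in-memory array. The orders data, the field names, and the count and distinct helpers below are hypothetical stand-ins. In the mongo shell, the comparable calls would take forms such as db.orders.count({ status: "A" }) and db.orders.distinct("custID").

```javascript
// Hypothetical orders documents (illustration only).
const orders = [
  { custID: "A", amount: 50, status: "A" },
  { custID: "A", amount: 100, status: "A" },
  { custID: "B", amount: 200, status: "A" },
  { custID: "B", amount: 25, status: "P" }
];

// Count the documents that satisfy an optional predicate.
function count(docs, predicate = () => true) {
  return docs.filter(predicate).length;
}

// Return the distinct values of one field, in first-seen order.
function distinct(docs, field) {
  return [...new Set(docs.map((doc) => doc[field]))];
}

console.log(count(orders));
console.log(count(orders, (order) => order.status === "A"));
console.log(distinct(orders, "custID"));
```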
Recipe 3-6. Single-Purpose Aggregation Operations
In this recipe, we are going to discuss how to use single-purpose aggregation operations.
Problem
You want to work with single-purpose aggregation operations.
Solution
Use these commands.
db.collection.count()
db.collection.distinct()
How It Works
Let’s follow the steps in this section to work with single-purpose aggregation operations.
Step 1: Single-Purpose Aggregation Operations
Create the following orders collection to perform single-purpose aggregation operations.