Book: Web Application Development with MEAN
Authors: Amos Q. Haviv, Adrian Mejia, Robert Onodi
Updated: 2025-04-04 19:44:51

Chapter 4. Introduction to MongoDB

MongoDB is an exciting new breed of database. The leader of the NoSQL movement is emerging as one of the most useful database solutions in the world. Designed with web applications in mind, Mongo's high throughput, unique BSON data model, and easily scalable architecture provide web developers with better tools to store their persistent data. But the move from relational databases to NoSQL solutions can be an overwhelming task, one that becomes much easier once you understand MongoDB's design goals. In this chapter, we'll cover the following topics:

Understanding the NoSQL movement and MongoDB design goals
MongoDB BSON data structure
MongoDB collections and documents
MongoDB query language
Working with the MongoDB shell

Introduction to NoSQL

In the past couple of years, web application development usually required the usage of a relational database to store persistent data. Most developers are already pretty comfortable with using one of the many SQL solutions. So, the approach of storing a normalized data model using a mature relational database became the standard. Object-relational mappers started to crop up, giving developers proper solutions to marshal their data between the different parts of their application. But as the Web grew larger, more scaling problems were presented to a larger base of developers. To solve this problem, the community created a variety of key-value storage solutions that were designed for better availability, simple querying, and horizontal scaling. This new kind of data store became more and more robust, offering many of the features of the relational databases. During this evolution, different storage design patterns emerged, including key-value storage, column storage, object storage, and the most popular one, document storage.

In a common relational database, your data is stored in different tables, often connected using a primary to foreign key relation. Your program will later reconstruct the model using various SQL statements to arrange the data in some kind of hierarchical object representation. Document-oriented databases handle data differently. Instead of using tables, they store hierarchical documents in standard formats, such as JSON and XML.

To understand this better, let's have a look at an example of a typical blog post. To construct this blog post model using a SQL solution, you'll probably have to use at least two tables. The first one would contain post information while the second would contain post comments. A sample table structure can be seen in the following diagram:

In your application, you'll use an object-relational mapping library or direct SQL statements to select the blog post record and the post comments records to create your blog post object. However, in a document-based database, the blog post will be stored completely as a single document that can later be queried. For instance, in a database that stores documents in a JSON format, your blog post document would probably look like the following code snippet:

{
  "title": "First Blog Post",
  "comments": [

  ]
}

This demonstrates the main difference between document-based databases and relational databases. So, while working with relational databases, your data is stored in different tables, with your application assembling objects using table records. Storing your data as holistic documents will allow faster read operations since your application won't have to rebuild the objects with every read. Furthermore, document-oriented databases have other advantages.
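To make the contrast concrete, here is a minimal sketch in plain JavaScript, with no database involved. The postsTable and commentsTable arrays, their column names, and the loadPost helper are hypothetical stand-ins for the two SQL tables described above. The point is only that the relational version reassembles the object on every read, while the document version is already in its final shape.

```javascript
// Hypothetical rows, standing in for the two relational tables.
const postsTable = [{ id: 1, title: "First Blog Post" }];
const commentsTable = [
  { id: 10, postId: 1, text: "Nice post" },
  { id: 11, postId: 1, text: "Thanks for sharing" },
];

// Relational style: rebuild the blog post object from both tables on every read.
function loadPost(postId) {
  const post = postsTable.find((row) => row.id === postId);
  if (!post) return null;
  const comments = commentsTable
    .filter((row) => row.postId === postId)
    .map((row) => ({ text: row.text }));
  return { title: post.title, comments };
}

// Document style: the same post is stored, and read back, as a single unit.
const postDocument = {
  title: "First Blog Post",
  comments: [{ text: "Nice post" }, { text: "Thanks for sharing" }],
};

const assembledPost = loadPost(1);
console.log(JSON.stringify(assembledPost) === JSON.stringify(postDocument));
```

Both paths end with the same object; the difference is how much work each read has to do to get there.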

While developing your application, you often encounter another problem: model changes. Let's assume you want to add a new property to each blog post. So, you go ahead and change your posts table and then go to your application data layer and add that property to your blog post object. But as your application already contains several blog posts, all existing blog post objects will have to change as well, which means that you'll have to cover your code with extra validation procedures. However, document-based databases are often schemaless, which means you can store different objects in a single collection of objects without changing anything in your database. Although this may sound like a call-for-trouble for some experienced developers, the freedom of schemaless storage has several advantages.

For example, think about an e-commerce application that sells used furniture. Think about your products table for a moment: a chair and a closet might have some common features, such as the type of wood, but a customer might also be interested in the number of doors the closet has. Storing the closet and chair objects in the same table means they could be stored in either a table with a large number of empty columns or using the more practical entity-attribute-value pattern, where another table is used to store key-value attributes. However, using schemaless storage will allow you to define different properties for different objects in the same collection, while still enabling you to query this collection using common properties, such as wood type. This means your application, and not the database, will be in charge of enforcing the data structure, which can help you speed up your development process.
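The furniture example can be sketched with an ordinary JavaScript array standing in for a collection. The product names and the woodType, legs, and doors fields are made up for illustration; nothing here talks to MongoDB.

```javascript
// Hypothetical product documents with different shapes in one collection.
const products = [
  { name: "Oak chair", woodType: "oak", legs: 4 },
  { name: "Oak closet", woodType: "oak", doors: 2 },
  { name: "Pine closet", woodType: "pine", doors: 3 },
];

// Query by a property every document shares.
const oakProducts = products.filter((product) => product.woodType === "oak");

// Query by a property only some documents define; the rest are simply skipped.
const multiDoorProducts = products.filter(
  (product) => typeof product.doors === "number" && product.doors >= 2
);

console.log(oakProducts.map((product) => product.name));
console.log(multiDoorProducts.map((product) => product.name));
```

No empty columns and no separate attribute table are needed; the chair just doesn't carry a doors property.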

While there are many NoSQL solutions that solve various development issues, usually around caching and scale, the document-oriented databases are rapidly becoming the leaders of the movement. The document-oriented database's ease of use, along with its standalone persistent storage offering, even threatens to replace the traditional SQL solutions in some use cases. And although there are a few document-oriented databases, none are as popular as MongoDB.

Introducing MongoDB

Back in 2007, Dwight Merriman and Eliot Horowitz formed a company named 10gen to create a better platform to host web applications. The idea was to create a hosted service that would allow developers to focus on building their applications rather than handling hardware management and infrastructure scaling. Soon, they discovered the community wasn't keen on giving up so much control over their applications' infrastructure. As a result, they released the different parts of the platform as open source projects.

One such project was a document-based database solution called MongoDB. Derived from the word humongous, MongoDB was able to support complex data storage, while maintaining the high-performance approach of other NoSQL stores. The community cheerfully adopted this new paradigm, making MongoDB one of the fastest-growing databases in the world. With more than 150 contributors and over 10,000 commits, it also became one of the most popular open source projects.

MongoDB's main goal was to create a new type of database that combined the robustness of a relational database with the fast throughput of distributed key-value data stores. With the scalable platform in mind, it had to support simple horizontal scaling while sustaining the durability of traditional databases. Another key design goal was to support web application development in the form of standard JSON outputs. These two design goals turned out to be MongoDB's greatest advantages over other solutions as these aligned perfectly with other trends in web development, such as the almost ubiquitous use of cloud virtualization hosting or the shift towards horizontal, instead of vertical, scaling.

First dismissed as another NoSQL storage layer over the more viable relational database, MongoDB evolved way beyond the platform where it was born. Its ecosystem grew to support most of the popular programming platforms, with the various community-backed drivers. Along with this, many other tools were formed including different MongoDB clients, profiling and optimization tools, administration and maintenance utilities, as well as a couple of VC-backed hosting services. Even major companies such as eBay and The New York Times began to use MongoDB data storage in their production environment. To understand why developers prefer MongoDB, it's time we dive into some of its key features.

Key features of MongoDB

MongoDB has some key features that helped it become so popular. As we mentioned before, the goal was to create a new breed of database that combined traditional database features with the high performance of NoSQL stores. As a result, most of its key features were created to evolve beyond the limitations of other NoSQL solutions while integrating some of the abilities of relational databases. In this section, you'll learn why MongoDB can become your preferred database when approaching modern web application development.

The BSON format

One of the greatest features of MongoDB is its JSON-like storage format named BSON. Standing for Binary JSON, the BSON format is a binary-encoded serialization of JSON-like documents, and it is designed to be more efficient in size and speed, allowing MongoDB's high read/write throughput.

Like JSON, BSON documents are a simple data structure representation of objects and arrays in a key-value format. A document consists of a list of elements, each with a string typed field name and a typed field value. These documents support all of the JSON specific data types along with other data types, such as the Date type.

Another big advantage of the BSON format is the use of the _id field as primary key. The _id field value will usually be a unique identifier type, named ObjectId, that is either generated by the application driver or by the mongod service. In the event the driver fails to provide a _id field with a unique ObjectId, the mongod service will add it automatically using:

A 4-byte value representing the seconds since the Unix epoch
A 3-byte machine identifier
A 2-byte process ID
A 3-byte counter, starting with a random value

So, a BSON representation of the blog post object from the previous example would look like the following code snippet:

{
  "_id": ObjectId("52d02240e4b01d67d71ad577"),
  "title": "First Blog Post",
  "comments": [
  ...
  ]
}
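As a rough illustration, the ObjectId from this snippet can be unpacked by hand according to the four-part layout listed earlier. The parseObjectId helper is hypothetical and is not part of any MongoDB driver. Note also that newer MongoDB releases replace the machine identifier and process ID with a single 5-byte random value, so the two middle fields only carry meaning for identifiers generated under the older layout.

```javascript
// Split a 24-character ObjectId hex string into the four parts listed earlier.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte timestamp
  return {
    seconds,
    createdAt: new Date(seconds * 1000),
    machineId: hex.slice(8, 14), // 3-byte machine identifier
    processId: parseInt(hex.slice(14, 18), 16), // 2-byte process ID
    counter: parseInt(hex.slice(18, 24), 16), // 3-byte counter
  };
}

const parts = parseObjectId("52d02240e4b01d67d71ad577");
console.log(parts.createdAt.toISOString(), parts.machineId, parts.processId);
```

Because the timestamp leads the identifier, ObjectId values sort roughly by creation time.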

The BSON format enables MongoDB to internally index and map document properties and even nested documents, allowing it to scan the collection efficiently and more importantly, to match objects to complex query expressions.

MongoDB ad hoc queries

One of the other MongoDB design goals was to expand the abilities of ordinary key-value stores. The main issue of common key-value stores is their limited query capabilities, which usually means your data is only queryable using the key field, and more complex queries are mostly predefined. To solve this issue, MongoDB drew its inspiration from the relational databases' dynamic query language.

Supporting ad hoc queries means that the database will respond to dynamically structured queries out of the box without the need to predefine each query. It is able to do this by indexing BSON documents and using a unique query language. Let's have a look at the following SQL statement example:

SELECT * FROM Posts WHERE Title LIKE '%mongo%';

This simple statement is asking the database for all the post records with a title containing the word mongo. Replicating this query in MongoDB will be as follows:

db.posts.find({ title:/mongo/ });

Running this command in the MongoDB shell will return all the posts whose title field contains the word mongo. You'll learn more about the MongoDB query language later in this chapter, but for now it is important to remember that it is almost as query-able as your traditional relational database. The MongoDB query language is great, but it raises the question of how efficiently these queries run when the database gets larger. Like relational databases, MongoDB solves this issue using a mechanism called indexing.

MongoDB indexing

Indexes are a special data structure that enables the database engine to resolve queries efficiently. Without an index, a query sent to the database forces the engine to scan through the entire collection of documents to find those that match the query statement. This way, the database engine processes a large amount of unnecessary data, resulting in poor performance.

To speed up the scan, the database engine can use a predefined index, which maps document fields and can tell the engine which documents are compatible with the query statement. To understand how indexes work, let's say we want to retrieve all the posts that have more than 10 comments. Suppose our document is defined as follows:

{
  "_id": ObjectId("52d02240e4b01d67d71ad577"),
  "title": "First Blog Post",
  "comments": [

  ],
  "commentsCount": 12
}

So, a MongoDB query that requests documents with more than 10 comments would be as follows:

db.posts.find({ commentsCount: { $gt: 10 } });

To execute this query, MongoDB would have to go through all the posts and check whether each post has a commentsCount larger than 10. But if a commentsCount index was defined, MongoDB would only have to consult the index to learn which documents have a commentsCount larger than 10 before retrieving them. The following diagram illustrates how a commentsCount index would work:
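In code, the same idea can be sketched with a sorted array standing in for the B-tree that MongoDB actually maintains. The posts array and the scanGreaterThan and indexGreaterThan helpers are hypothetical; this is only a model of why an ordered index avoids touching every document.

```javascript
// Hypothetical posts; only the indexed field matters for this sketch.
const posts = [
  { _id: 1, commentsCount: 12 },
  { _id: 2, commentsCount: 3 },
  { _id: 3, commentsCount: 25 },
  { _id: 4, commentsCount: 10 },
];

// Full scan: every document is inspected.
function scanGreaterThan(docs, threshold) {
  return docs
    .filter((doc) => doc.commentsCount > threshold)
    .map((doc) => doc._id);
}

// Index: [value, _id] pairs kept sorted by value.
const commentsCountIndex = posts
  .map((doc) => [doc.commentsCount, doc._id])
  .sort((a, b) => a[0] - b[0]);

// Binary search for the first entry above the threshold, then read the tail.
function indexGreaterThan(index, threshold) {
  let low = 0;
  let high = index.length;
  while (low < high) {
    const mid = (low + high) >> 1;
    if (index[mid][0] > threshold) high = mid;
    else low = mid + 1;
  }
  return index.slice(low).map(([, id]) => id);
}

console.log(scanGreaterThan(posts, 10), indexGreaterThan(commentsCountIndex, 10));
```

Both functions return the same document IDs, but the indexed lookup needs only a logarithmic number of comparisons to find its starting point.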

MongoDB replica set

To provide data redundancy and improved availability, MongoDB uses an architecture called a replica set. Replicating a database helps protect your data, recover from hardware failure, and increase read capacity. A replica set is a set of MongoDB services that host the same dataset. One service is used as the primary and the other services are called secondaries. All of the set instances support read operations, but only the primary instance is in charge of write operations. When a write operation occurs, the primary informs the secondaries about the change and makes sure they have applied it to their copies of the dataset. The following diagram illustrates a common replica set:

The workflow of a replica set with primary and two secondaries

Another robust feature of the MongoDB replica set is its automatic failover. When one of the set members can't reach the primary instance for more than 10 seconds, the replica set will automatically elect and promote a secondary instance as the new primary. When the old primary comes back online, it will rejoin the replica set as a secondary instance.
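The write path and the failover rule can be modeled with a deliberately simplified sketch. Every name here (createReplicaSet, write, checkFailover, lastSeen) is hypothetical, the clock is passed in by hand, and the "election" just picks the reachable secondary holding the most data. Real MongoDB replication is asynchronous and its elections involve voting among members, so treat this as a toy model of the behavior described above rather than of the actual protocol.

```javascript
// A toy replica set: the first member starts as primary, the rest as secondaries.
const ELECTION_TIMEOUT_MS = 10000; // the 10-second threshold mentioned above

function createReplicaSet(names) {
  return names.map((name, i) => ({
    name,
    role: i === 0 ? "primary" : "secondary",
    data: [],
    lastSeen: 0, // last time (in ms) this member was reachable
  }));
}

// Writes go to the primary, which then passes them on to every secondary.
function write(members, doc) {
  const primary = members.find((m) => m.role === "primary");
  primary.data.push(doc);
  members
    .filter((m) => m.role === "secondary")
    .forEach((m) => m.data.push(doc));
}

// If the primary has been silent for too long, promote a reachable secondary.
function checkFailover(members, now) {
  const primary = members.find((m) => m.role === "primary");
  if (now - primary.lastSeen <= ELECTION_TIMEOUT_MS) return primary.name;
  const candidates = members.filter(
    (m) => m.role === "secondary" && now - m.lastSeen <= ELECTION_TIMEOUT_MS
  );
  if (candidates.length === 0) return null; // nobody can take over
  candidates.sort((a, b) => b.data.length - a.data.length);
  primary.role = "secondary"; // the old primary rejoins as a secondary
  candidates[0].role = "primary";
  return candidates[0].name;
}

const members = createReplicaSet(["a", "b", "c"]);
write(members, { title: "First Post" });
members[1].lastSeen = 15000; // b and c are still reachable,
members[2].lastSeen = 15000; // while a has been silent since time 0
const newPrimary = checkFailover(members, 15000);
console.log(newPrimary);
```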

Replication is a very robust feature of MongoDB that is derived directly from its platform origin and is one of the main features that makes MongoDB production-ready. However, it is not the only one.

Note

To learn more about MongoDB replica sets, visit http://docs.mongodb.org/manual/replication/.

MongoDB sharding

Scaling is a common problem with a growing web application. The various approaches to solve this issue can be divided into two groups: vertical scaling and horizontal scaling. The differences between the two are illustrated in the following diagram:

Vertical scaling with a single machine versus horizontal scaling with multiple machines

Vertical scaling is easier and consists of increasing single machine resources, such as RAM and CPU, in order to handle the load. However, it has two major drawbacks: first, at some level, increasing a single machine's resources becomes disproportionately more expensive compared to splitting the load between several smaller machines. Secondly, the popular cloud-hosting providers limit the size of the machine instances you can use. So, scaling your application vertically can only be done up to a certain level.

Horizontal scaling is more complicated and is done using several machines. Each machine will handle a part of the load, providing better overall performance. The problem with horizontal database scaling is how to properly divide the data between different machines and how to manage the read/write operations between them.

Luckily MongoDB supports horizontal scaling, which it refers to as sharding. Sharding is the process of splitting the data between different machines, or shards. Each shard holds a portion of the data and functions as a separate database. The collection of several shards together is what forms a single logical database. Operations are performed through services called query routers, which ask the configuration servers how to delegate each operation to the right shard.
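A minimal sketch of the routing idea follows, assuming a hashed shard key on the user field. The shards array, hashKey, shardFor, insert, and findByUser are all hypothetical, and the key-to-shard mapping is hard-coded in a function here instead of being looked up from configuration servers as MongoDB does.

```javascript
// Hypothetical shards, each acting as its own small database.
const shards = [[], [], []];

// A simple deterministic string hash, used only for this illustration.
function hashKey(key) {
  let hash = 0;
  for (const char of String(key)) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0;
  }
  return hash;
}

// The "query router": decide which shard owns a given shard-key value.
function shardFor(key) {
  return hashKey(key) % shards.length;
}

function insert(doc) {
  shards[shardFor(doc.user)].push(doc);
}

// Reads that include the shard key touch one shard instead of all of them.
function findByUser(user) {
  return shards[shardFor(user)].filter((doc) => doc.user === user);
}

insert({ title: "First Post", user: "bob" });
insert({ title: "Second Post", user: "alice" });
insert({ title: "Third Post", user: "alice" });
console.log(findByUser("alice").length);
```

All documents with the same shard-key value land on the same shard, which is what lets the router answer a keyed query without broadcasting it.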

Note

To learn more about MongoDB sharding, visit http://docs.mongodb.org/manual/sharding/.

These features and many others are what make MongoDB so popular. Though there are many good alternatives, MongoDB is becoming more and more ubiquitous among developers and is on its way to becoming the leading NoSQL solution. After this brief overview, it's time we dive in a little deeper.

MongoDB shell

If you followed Chapter 1, Introduction to MEAN, you should have a working instance of MongoDB in your local environment. To interact with MongoDB, you'll use the MongoDB shell that you encountered in that chapter. The MongoDB shell is a command-line tool that enables the execution of different operations using a JavaScript syntax query language.

In order to explore the different parts of MongoDB, let's start the MongoDB shell by running the mongo executable, as follows:

$ mongo

If MongoDB has been properly installed, you should see an output similar to what is shown in the following screenshot:

Notice how the shell is telling you the current shell version, and that it has connected to the default test database.

MongoDB databases

Each MongoDB server instance can store several databases. Unless specifically defined, the MongoDB shell will automatically connect to the default test database. Let's switch to another database called mean by executing the following command:

> use mean

You'll see a command-line output telling you that the shell switched to the mean database. Notice that you didn't need to create the database before using it because in MongoDB, databases and collections are lazily created when you insert your first document. This behavior is consistent with MongoDB's dynamic approach to data. Another way to use a specific database is to run the shell executable with the database name as an argument, as follows:

$ mongo mean

The shell will then automatically connect to the mean database. If you want to list all the other databases in the current MongoDB server, just execute the following command:

> show dbs

This will show you a list of currently available databases that have at least one document stored.

MongoDB collections

A MongoDB collection is a list of MongoDB documents and is the equivalent of a relational database table. A collection is created when the first document is inserted. Unlike a table, a collection doesn't enforce any type of schema and can host differently structured documents.

To perform operations on a MongoDB collection, you'll need to use the collection methods. Let's create a posts collection and insert the first post. In order to do this, execute the following command in the MongoDB shell:

> db.posts.insert({"title":"First Post", "user": "bob"})

Executing the preceding command automatically creates the posts collection and inserts the first document. To retrieve the collection documents, execute the following command in the MongoDB shell:

> db.posts.find()

You should see a command-line output similar to what is shown in the following screenshot:

This means that you have successfully created the posts collection and inserted your first document.

To show all available collections, issue the following command in the MongoDB shell:

> show collections

The MongoDB shell will output the list of available collections, which in your case are the posts collection and another collection called system.indexes, which holds the list of your database indexes.

If you'd like to delete the posts collection, you will need to execute the drop() command as follows:

> db.posts.drop()

The shell will inform you that the collection was dropped, by responding with a true output.

MongoDB CRUD operations

Create, read, update, and delete (CRUD) operations are the basic interactions you perform with a database. To execute CRUD operations over your database entities, MongoDB provides various collection methods.

Creating a new document

You're already familiar with the basic method of creating a new document using the insert() method, as you previously did in earlier examples. Besides the insert() method, there are two more methods called update() and save() to create new objects.

Creating a document using insert()

The most common way to create a new document is to use the insert() method. The insert method takes a single argument that represents the new document. To insert a new post, just issue the following command in the MongoDB shell:

> db.posts.insert({"title":"Second Post", "user": "alice"})

Creating a document using update()

The update() method is usually used to update an existing document. You can also use it to create a new document, if no document matches the query criteria, using the following upsert flag:

> db.posts.update({
 "user": "alice"
}, {
 "title": "Second Post",
 "user": "alice"
}, {
 upsert: true
})

In the preceding example, MongoDB will look for a post created by alice and try to update it. If the posts collection doesn't contain a post created by alice, MongoDB will not find an appropriate document to update and, because you have used the upsert flag, will create a new document instead.
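The decision an upsert makes can be sketched against an in-memory array. The upsertByUser helper and the nextId counter are hypothetical and only model the behavior described here: replace the first match, or insert when nothing matches.

```javascript
// Toy in-memory collection; upsertByUser is a hypothetical helper, not a driver API.
let nextId = 1;

function upsertByUser(collection, user, replacement) {
  const index = collection.findIndex((doc) => doc.user === user);
  if (index === -1) {
    // No match: create a new document.
    collection.push({ _id: nextId++, ...replacement });
    return "inserted";
  }
  // Match: replace the document's fields but keep its _id.
  collection[index] = { _id: collection[index]._id, ...replacement };
  return "updated";
}

const posts = [];
const first = upsertByUser(posts, "alice", { title: "Second Post", user: "alice" });
const second = upsertByUser(posts, "alice", { title: "Edited Post", user: "alice" });
console.log(first, second, posts.length);
```

Running the same upsert twice therefore inserts once and updates once, leaving a single document in the collection.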

Creating a document using save()

Another way of creating a new document is by calling the save() method, passing it a document that either doesn't have an _id field or has an _id field that doesn't exist in the collection:

> db.posts.save({"title":"Second Post", "user": "alice"})

This will have the same effect as the update() method and will create a new document instead of updating an existing one.

Reading documents

The find() method is used to retrieve a list of documents from a MongoDB collection. Using the find() method, you can either request all the documents in a collection or use a query to retrieve specific documents.

Finding all the collection documents

To retrieve all the documents in the posts collection, you should either pass an empty query to the find() method or not pass any arguments at all. The following query will retrieve all the documents in the posts collection:

> db.posts.find()

Furthermore, performing the same operation can also be done using the following query:

> db.posts.find({})

These two queries are basically the same and will return all the documents in the posts collection.

Using an equality statement

To retrieve a specific document, you can use an equality condition query that will grab all the documents that comply with that condition. For instance, to retrieve all the posts created by alice, you will need to issue the following command in the shell:

> db.posts.find({ "user": "alice" })

This will retrieve all the documents that have the user property equal to alice.

Using query operators

Using an equality statement may not be enough. To build more complex queries, MongoDB supports a variety of query operators. Using query operators, you can look for different sorts of conditions. For example, to retrieve all the posts that were created by either alice or bob, you can use the following $in operator:

> db.posts.find({ "user": { $in: ["alice", "bob"] } })

Note

There are plenty of other query operators you can learn about by visiting http://docs.mongodb.org/manual/reference/operator/query/#query-selectors.

Building AND/OR queries

When you build a query, you may need to use more than one condition. Like in SQL, you can use AND/OR operators to build multiple condition query statements. To perform an AND query, you simply add the properties you'd like to check to the query object. For instance, take a look at the following query:

> db.posts.find({ "user": "alice", "commentsCount": { $gt: 10 } })

It is similar to the find() query you've previously used but adds another condition that checks the document's commentsCount property, so it will only grab documents that were created by alice and have more than 10 comments. An OR query is a bit more complex because it involves the $or operator. To understand it better, take a look at another version of the previous example:

> db.posts.find( { $or: [{ "user": "alice" }, { "user": "bob" }] })

Like the query operators example, this query will also grab all the posts created by either bob or alice.
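To see how these query objects are interpreted, here is a small matcher written in plain JavaScript. It is only a sketch that covers equality, $gt, $in, and $or; the matches and matchesCondition functions and the sample posts are hypothetical, and MongoDB's real matching rules (for arrays, nested fields, and types) are considerably richer.

```javascript
// Check one field condition: either a plain value or an operator object.
function matchesCondition(value, condition) {
  const isOperatorObject =
    condition !== null && typeof condition === "object" && !Array.isArray(condition);
  if (!isOperatorObject) return value === condition;
  return Object.entries(condition).every(([operator, operand]) => {
    if (operator === "$gt") return value > operand;
    if (operator === "$in") return operand.includes(value);
    throw new Error("unsupported operator: " + operator);
  });
}

// Top-level keys are ANDed together; $or needs at least one branch to match.
function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    if (key === "$or") return condition.some((branch) => matches(doc, branch));
    return matchesCondition(doc[key], condition);
  });
}

const posts = [
  { title: "First Post", user: "bob", commentsCount: 3 },
  { title: "Second Post", user: "alice", commentsCount: 12 },
  { title: "Third Post", user: "carol", commentsCount: 20 },
];

const find = (query) => posts.filter((doc) => matches(doc, query));

console.log(find({ user: "alice", commentsCount: { $gt: 10 } }).length);
console.log(find({ $or: [{ user: "alice" }, { user: "bob" }] }).length);
console.log(find({ user: { $in: ["alice", "bob"] } }).length);
```

The last two calls select the same documents, which mirrors the equivalence between the $in and $or examples in the text.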

Updating existing documents

Using MongoDB, you have the option of updating documents using either the update() or save() methods.

Updating documents using update()

The update() method takes three arguments to update existing documents. The first argument is the selection criteria that indicate which documents to update, the second argument is the update statement, and the last argument is the options object. For instance, in the following example, the first argument is telling MongoDB to look for all the documents created by alice, the second argument tells it to update the title field, and the third is forcing it to execute the update operation on all the documents it finds:

> db.posts.update({
 "user": "alice"
}, {
 $set: {
 "title": "Second Post"
 }
}, {
 multi: true
})

Notice how the multi property has been added to the options object. The update() method's default behavior is to update a single document, so by setting the multi property, you tell the update() method to update all the documents that comply with the selection criteria.
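The effect of $set and multi can be modeled in a few lines. The update function below is a hypothetical in-memory stand-in, not the shell method. It assumes two behaviors of the real update(): a plain replacement document overwrites everything except _id, and multi is only accepted together with update operators such as $set.

```javascript
// Apply a change to the first matching document, or to all of them when multi is set.
function update(collection, matchesQuery, change, options = {}) {
  if (options.multi && !change.$set) {
    throw new Error("multi updates need update operators");
  }
  let updated = 0;
  for (let i = 0; i < collection.length; i++) {
    if (!matchesQuery(collection[i])) continue;
    if (change.$set) {
      // $set touches only the listed fields and keeps the rest.
      Object.assign(collection[i], change.$set);
    } else {
      // A plain document replaces everything except _id.
      collection[i] = { _id: collection[i]._id, ...change };
    }
    updated++;
    if (!options.multi) break;
  }
  return updated;
}

const posts = [
  { _id: 1, title: "Draft", user: "alice", commentsCount: 2 },
  { _id: 2, title: "Notes", user: "alice", commentsCount: 0 },
  { _id: 3, title: "First Post", user: "bob", commentsCount: 5 },
];
const byAlice = (doc) => doc.user === "alice";

const single = update(posts, byAlice, { $set: { title: "Second Post" } });
const all = update(posts, byAlice, { $set: { title: "Second Post" } }, { multi: true });
console.log(single, all);
```

Note that $set leaves commentsCount untouched on each post, whereas passing a plain document would have dropped it.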

Updating documents using save()

Another way of updating an existing document is by calling the save() method, passing it a document that contains an _id field. For instance, the following command will update an existing document with an _id field that is equal to ObjectId("50691737d386d8fadbd6b01d"):

> db.posts.save({
 "_id": ObjectId("50691737d386d8fadbd6b01d"),
 "title": "Second Post",
 "user": "alice"
});

It's important to remember that if the save() method is unable to find an appropriate object, it will create a new one instead.

Deleting documents

To remove documents, MongoDB utilizes the remove() method. The remove() method can accept up to two arguments. The first one is the deletion criteria, and the second is a Boolean argument that indicates whether to remove only a single matching document instead of all of them.

Deleting all documents

To remove all the documents from a collection, you will need to call the remove() method with an empty deletion criteria object, since MongoDB 2.6 and later reject a remove() call that has no query argument. For example, to remove all the posts documents, you'll need to execute the following command:

> db.posts.remove({})

Notice that the remove() method is different from the drop() method as it will not delete the collection or its indexes. To rebuild your collection with different indexes, it is preferred that you use the drop() method.

Deleting multiple documents

To remove multiple documents that match given criteria from a collection, you will need to call the remove() method with those deletion criteria. For example, to remove all the posts made by alice, you'll need to execute the following command:

> db.posts.remove({ "user": "alice" })

Note that this will remove all the documents created by alice, so be careful when using the remove() method.

Deleting a single document

To remove a single document that matches given criteria from a collection, you will need to call the remove() method with the deletion criteria and a Boolean stating that you only want to delete a single document. For example, to remove the first post made by alice, you'll need to execute the following command:

> db.posts.remove({ "user": "alice" }, true)

This will remove the first document that was created by alice and leave other documents even if they match the deletion criteria.
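The difference the Boolean makes can be sketched with an in-memory array. The remove function and its justOne parameter name are hypothetical stand-ins for the shell method's second argument.

```javascript
// Remove matching documents in place; justOne stops after the first match.
function remove(collection, matchesQuery, justOne = false) {
  let removed = 0;
  let i = 0;
  while (i < collection.length) {
    if (!matchesQuery(collection[i])) {
      i++;
      continue;
    }
    collection.splice(i, 1); // the next document shifts into position i
    removed++;
    if (justOne) break;
  }
  return removed;
}

const posts = [
  { title: "First Post", user: "bob" },
  { title: "Second Post", user: "alice" },
  { title: "Third Post", user: "alice" },
];
const byAlice = (doc) => doc.user === "alice";

const removedOne = remove(posts, byAlice, true);
const removedRest = remove(posts, byAlice);
console.log(removedOne, removedRest, posts.length);
```

The first call deletes only one of alice's posts; the second call, without the flag, deletes whatever still matches.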

Summary

In this chapter, you learned about NoSQL databases and how they can be useful for modern web development. You also learned about the emerging leader of the NoSQL movement, MongoDB. You took a deeper dive into the various features that make MongoDB such a powerful solution and learned about its basic terminology. Finally, you caught a glimpse of MongoDB's powerful query language and how to perform all four CRUD operations. In the next chapter, we'll discuss how to connect Node.js and MongoDB together using the popular Mongoose module.