Jun 8, 2022

How to Use MongoDB in Python: Gearheart's Experience

What is MongoDB?

MongoDB is an open-source, document-oriented database management system (DBMS) with flexible schemas. It was founded by Dwight Merriman and Eliot Horowitz, who ran into development issues and struggled with relational database management systems while building web applications.

According to one of its founders, the name MongoDB comes from the word "humongous" and simply means that the database can handle lots of data. Merriman and Horowitz helped found 10gen Inc. in 2007 with the aim of commercialising the database and related software. The company was renamed MongoDB Inc. in 2013.

The open-source database was released in 2009. It is now available under the Server Side Public License (SSPL).

A document is a set of key/value pairs. Documents have a dynamic schema, which means that documents in the same collection do not need to have the same set of fields or the same structure. It also means that fields shared by the documents of a collection can hold different types of data. Below is an example of a document structure:

{
   _id: ObjectId('7df78ad8902c'),
   title: 'MongoDB Overview',
   description: 'MongoDB is no sql database',
   by: 'tutorials point',
   url: 'http://www.tutorialspoint.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100,
   comments: [
      {
         user: 'user1',
         message: 'My first comment',
         dateCreated: new Date(2011,1,20,2,15),
         like: 0
      },
      {
         user: 'user2',
         message: 'My second comments',
         dateCreated: new Date(2011,1,25,7,45),
         like: 5
      }
   ]
}

By default, _id is an ObjectId: a 12-byte value, usually displayed as a 24-character hexadecimal string, that guarantees the uniqueness of each document within a collection. You can set it yourself when inserting a document. If it is not specified, MongoDB generates a unique ID for each document.
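To make the 12-byte layout concrete, here is a minimal sketch that builds a value shaped like an ObjectId using only the standard library: a 4-byte timestamp, 5 random bytes and a 3-byte counter. The function name `make_object_id_like` is hypothetical and this is not the driver's implementation; in real code you would let MongoDB or the driver generate the ID.

```python
import os
import struct
import time
from typing import Optional


def make_object_id_like(counter: int, now: Optional[float] = None) -> bytes:
    """Build a 12-byte value laid out like an ObjectId (illustration only)."""
    seconds = int(now if now is not None else time.time())
    timestamp = struct.pack(">I", seconds)           # 4 bytes, big-endian seconds
    random_part = os.urandom(5)                      # 5 random bytes
    counter_part = (counter % 0x1000000).to_bytes(3, "big")  # 3-byte counter
    return timestamp + random_part + counter_part


oid = make_object_id_like(counter=1)
print(len(oid), oid.hex())  # 12 bytes, shown as 24 hexadecimal characters
```

Because the timestamp comes first, values generated later sort after earlier ones at one-second granularity.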

MongoDB uses JSON-style documents that are stored in the BSON binary format. Thanks to the GridFS specification, MongoDB can store and retrieve large files. Like other document-oriented DBMSs (CouchDB, etc.), MongoDB is not a relational DBMS. Atomicity is guaranteed at the level of a single document, so a partial update of a document cannot happen. Multi-document ACID transactions have been available since MongoDB 4.0 (on replica sets, and on sharded clusters since 4.2). Outside of a transaction there is no isolation across documents: data read by one client can be changed by another client at the same time.

Why Should You Use MongoDB?

The following are some of MongoDB's advantages:

  1. Flexible schema that supports hierarchical data structures
  2. A large number of MongoDB drivers and client libraries. Drivers connect client applications to the database. For example, to connect to MongoDB from Python, we install the Python driver so that the program can work with the database
  3. Flexible deployment
  4. Document-oriented storage (in the form of JSON-style documents)
  5. JavaScript as a language for querying
  6. Dynamic queries
  7. Index support
  8. Query profiling
  9. Efficient storage of large amounts of binary data, such as images and videos
  10. Journaling of operations that modify data in the database
  11. Fault tolerance and scalability: asynchronous replication, replica sets, and distribution of the database across nodes
  12. Support for the MapReduce paradigm. MapReduce is a distributed computing model introduced by Google for parallel processing of very large (multi-petabyte) data sets on computer clusters
  13. Full-text search, including support for the Russian language and morphological analysis
  14. Horizontal scalability through sharding. Sharding is the process of storing data records across multiple machines. MongoDB uses it to meet the demands of data growth: as the data grows, a single machine may not be able to store it or to provide acceptable read and write throughput. Sharding solves this problem by scaling horizontally.
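
The sharding idea from the last list item can be sketched in a few lines. This is a simplified illustration, assuming a hashed shard key and a fixed number of shards. MongoDB itself assigns ranges of the hashed key space to shards rather than taking a plain modulo, and the names `shard_for` and `user_id` are hypothetical.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Distribute a handful of documents across three in-memory "shards".
shards = {number: [] for number in range(3)}
for user_id in ["alice", "bob", "carol", "dave", "erin"]:
    shards[shard_for(user_id, 3)].append({"_id": user_id})

for number, documents in shards.items():
    print(number, [doc["_id"] for doc in documents])
```

The same key always lands on the same shard, so a lookup by shard key only needs to contact one machine.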

Feature                MongoDB   Relational DBMS   Key-Value
Rich data model        Yes       No                No
Dynamic schema         Yes       No                Yes
Data validation        Yes       Yes               No
Typed data             Yes       Yes               No
Data locality          Yes       No                Yes
Field updates          Yes       Yes               No
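
The MapReduce paradigm mentioned in the list of advantages can be illustrated without a database. The sketch below counts tags across a few documents in plain Python. The helper names (`map_tags`, `reduce_counts`, `map_reduce`) and the sample documents are made up for the illustration, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict

docs = [
    {"title": "A", "tags": ["mongodb", "python"]},
    {"title": "B", "tags": ["mongodb", "database"]},
    {"title": "C", "tags": ["python"]},
]


def map_tags(doc):
    """Map step: emit a (key, value) pair for every tag in a document."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(documents, mapper, reducer):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


tag_counts = map_reduce(docs, map_tags, reduce_counts)
print(tag_counts)
```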

So, how do you use MongoDB in Python? Take a look at our short Python MongoDB tutorial.

PyMongo

PyMongo is the official MongoDB driver for Python and the recommended way to work with MongoDB from Python.

In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:

import datetime


post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    # timezone-aware replacement for the deprecated datetime.utcnow()
    "date": datetime.datetime.now(tz=datetime.timezone.utc),
}

Documents can contain native Python types (such as datetime.datetime) which will be automatically converted to and from the appropriate BSON types.

To add a document into the collection, we can use the insert_one() method:

# `db` is a pymongo Database object obtained from a MongoClient
posts = db.posts
post_id = posts.insert_one(post).inserted_id

The most basic type of query is find_one(). This method returns a single document matching a query (or None if there are no matches). Here we use find_one() to get the first document from the posts collection:

>>> posts.find_one()

The result is a document that we inserted previously. find_one() also supports querying on specific elements that the document must match.
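
As a minimal sketch of querying on specific elements, the helper below passes a filter dictionary to find_one(). It assumes `posts` is a PyMongo collection like the one used above. The function name `first_post_by` is hypothetical.

```python
def first_post_by(posts, author):
    """Return the first document whose "author" field equals `author`, or None."""
    # `posts` is assumed to be a pymongo Collection; the dict is the query filter.
    return posts.find_one({"author": author})
```

Calling `first_post_by(posts, "Mike")` would return the blog post inserted earlier, while an author with no posts yields None.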

To get more documents, we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents.

We can also narrow the results returned by find() by passing a query. Here we only get documents whose author is “Mike”:

for post in posts.find({"author": "Mike"}):
    print(post)
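
Because find() returns a cursor, sorting and limiting can be chained before iterating. The sketch below assumes `posts` is a PyMongo collection whose documents have a "date" field, as in the blog post example. The function name `latest_posts_by` and the `max_results` parameter are hypothetical.

```python
DESCENDING = -1  # same value as pymongo.DESCENDING


def latest_posts_by(posts, author, max_results=5):
    """Return up to `max_results` posts by `author`, newest first."""
    cursor = posts.find({"author": author}).sort("date", DESCENDING).limit(max_results)
    return list(cursor)  # iterating the cursor fetches the matching documents
```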

At the core of PyMongo is the MongoClient object, which is used to connect to and query a MongoDB deployment. It can connect to a standalone mongod instance, a replica set, or mongos instances.

Repository: https://github.com/mongodb/mongo-python-driver
Documentation: https://api.mongodb.com/python/current/
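
MongoClient accepts a connection string, so connecting to a standalone instance or to a replica set mostly comes down to the URI you pass. The sketch below builds such a URI with the standard library only. The helper name `build_mongo_uri`, the host names and the credentials are made up for the example.

```python
from urllib.parse import quote_plus


def build_mongo_uri(hosts, username=None, password=None, replica_set=None):
    """Build a mongodb:// connection string; credentials are percent-escaped."""
    credentials = ""
    if username is not None and password is not None:
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    uri = f"mongodb://{credentials}{','.join(hosts)}/"
    if replica_set:
        uri += f"?replicaSet={quote_plus(replica_set)}"
    return uri


standalone_uri = build_mongo_uri(["localhost:27017"])
replica_uri = build_mongo_uri(
    ["db1.example.com:27017", "db2.example.com:27017"],  # hypothetical hosts
    username="app",
    password="p@ss/word",
    replica_set="rs0",
)
print(standalone_uri)
print(replica_uri)
```

The resulting string would then be passed to the client, for example `MongoClient(replica_uri)`.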

MongoEngine

MongoEngine is an ODM (Object-Document Mapper: like an ORM, but for document-oriented databases) that lets you work with MongoDB from Python. It uses a simple declarative API similar to the Django ORM.

To define a document schema, we create a class that inherits from the Document base class. Fields are specified by adding class attributes to the document class.

from mongoengine import *


class Metadata(EmbeddedDocument):
    tags = ListField(StringField())
    revisions = ListField(IntField())


class WikiPage(Document):
    title = StringField(required=True)
    text = StringField()
    metadata = EmbeddedDocumentField(Metadata)

Now that we have defined how our documents will be structured, we can start adding documents to the database. First, we need to create an object:

page1 = WikiPage(title='Example 1', text='Wiki Page 1').save()

We could also define our object using the attribute syntax:

page2 = WikiPage(title='Example 2')
page2.text = 'Wiki Page 2'
# embedded documents are saved together with their parent, so no .save() here
page2.metadata = Metadata(tags=['tag 1', 'tag 2'])
page2.save()

Each document class (i.e. any class that inherits directly or indirectly from Document) has an objects attribute, which is used to access the documents in the database collection associated with that class. So, let's see how to get the titles of our pages:

for page in WikiPage.objects:
    print(page.title)
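
The objects attribute can also be called with Django-style keyword filters. The sketch below wraps that in a small helper. It assumes `document_cls` is a MongoEngine document class with a title field, such as WikiPage above. The function name `titles_matching` is hypothetical.

```python
def titles_matching(document_cls, **filters):
    """Return the titles of all documents of `document_cls` matching the filters."""
    # e.g. filters = {"title__contains": "Example"} uses the `contains` operator
    return [page.title for page in document_cls.objects(**filters)]
```

With the pages saved earlier, `titles_matching(WikiPage, title__contains="Example")` would list both titles.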

Here is a brief list of some of the main features of MongoEngine:

  1. Document schema declaration and validation
  2. Elegant querying syntax, similar to Django ORM
  3. Document inheritance, with support for “polymorphic querying”
  4. Aggregation methods, such as sum and average
  5. Advanced query conditions using Q objects
  6. Session and authentication backends for Django (since MongoEngine 0.10, provided by the separate django-mongoengine package)

Repository: https://github.com/MongoEngine/mongoengine
Documentation: http://docs.mongoengine.org/

Motor

Motor is an asynchronous driver for MongoDB. It can be used in Tornado or asyncio applications. Motor never blocks the event loop while connecting to MongoDB or performing input/output operations. The driver is essentially a wrapper around the entire PyMongo API that provides non-blocking access to MongoDB.

The tornado.gen module allows using coroutines to simplify asynchronous code. Motor supports Tornado applications with multiple IOLoops. It can stream data from GridFS to a Tornado RequestHandler using stream_to_handler() or the GridFSHandler class.

Motor provides a single client class, MotorClient. Unlike PyMongo's MongoClient, MotorClient does not actually connect in the background on startup. Instead, it connects on demand, when the first operation is requested.

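Motor supports almost every PyMongo method. Early versions of Motor took an additional callback function. Since Motor 2.0, methods that perform I/O return awaitables and are used with await instead.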
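
With current Motor releases, operations are awaited inside coroutines. The sketch below assumes `posts` is a Motor collection. The function names `first_post_by_async` and `posts_by_async` are hypothetical.

```python
import asyncio


async def first_post_by_async(posts, author):
    """Await a single matching document (or None) without blocking the event loop."""
    return await posts.find_one({"author": author})


async def posts_by_async(posts, author, max_results=10):
    """find() returns a cursor immediately; only fetching the results is awaited."""
    cursor = posts.find({"author": author})
    return await cursor.to_list(length=max_results)
```

In an asyncio application these coroutines would be awaited from other coroutines, or started with `asyncio.run(...)` at the top level.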

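Early versions of Motor used a gevent-like technique to wrap PyMongo and run it asynchronously, presenting a classic callback interface to Tornado applications. Current versions delegate PyMongo's blocking calls to a thread pool. Because Motor wraps PyMongo, it can easily keep up with PyMongo development in the future.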
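
One common way to give a blocking driver a non-blocking interface is to run its calls in a thread pool. The sketch below illustrates that general idea with asyncio only. It is not Motor's actual code, and `blocking_find_one` is a hypothetical stand-in for a blocking driver call.

```python
import asyncio
import time


def blocking_find_one(query):
    """Stand-in for a blocking driver call (hypothetical)."""
    time.sleep(0.05)  # simulate network latency
    return {"matched": query}


async def find_one_async(query):
    """Run the blocking call in the default thread pool so the loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_find_one, query)


async def main():
    # Both queries run concurrently; gather returns results in call order.
    return await asyncio.gather(
        find_one_async({"author": "Mike"}),
        find_one_async({"author": "Eliot"}),
    )


results = asyncio.run(main())
print(results)
```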

Repository: https://github.com/mongodb/motor
Documentation: https://motor.readthedocs.io/en/

MotorEngine is built on top of Motor and is a port of MongoEngine.

MongoKit

MongoKit is a Python module, a PyMongo wrapper, that adds a structured schema and a validation layer.

MongoKit uses simple Python data types to describe the structure of a document. It is fairly fast and gives access to the plain PyMongo layer, without API changes, when more speed is needed. It has many additional features, such as automatic references to other documents, custom types and i18n support. Documents are enhanced Python dictionaries with a validate() method.

Document declaration is as follows:

>>> from mongokit import Connection, Document
>>> import datetime
>>> connection = Connection()
>>> @connection.register
... class BlogPost(Document):
...     structure = {
...         'title':unicode,
...         'body':unicode,
...         'author':unicode,
...         'date_creation':datetime.datetime,
...         'rank':int
...     }
...     required_fields = ['title','author', 'date_creation']
...     default_values = {'rank':0, 'date_creation':datetime.datetime.utcnow}

Creating a document through the registered connection and filling in its fields:

>>> blogpost = connection.test.example.BlogPost() # this uses the database "test" and the collection "example"
>>> blogpost['title'] = u'my title'
>>> blogpost['body'] = u'a body'
>>> blogpost['author'] = u'me'
>>> blogpost
{'body': u'a body', 'title': u'my title', 'date_creation': datetime.datetime(...), 'rank': 0, 'author': u'me'}
>>> blogpost.save()

Saving an object will call the validate() method. A more complex structure can be used as follows:

>>> @connection.register
... class ComplexDoc(Document):
...     database = 'test'
...     collection = 'example'
...     structure = {
...         "foo" : {"content":int},
...         "bar" : {
...             'bla':{'spam':int}
...         }
...     }
...     required_fields = ['foo.content', 'bar.bla.spam']
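
To show what validating a document against such a structure involves, here is a minimal plain-Python sketch. It is not MongoKit's implementation. The helper names `get_path` and `validate` and the error messages are made up, and only type checks and dotted required fields are covered.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as 'bar.bla.spam'; return None if it is missing."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def validate(doc, structure, required_fields=(), prefix=""):
    """Return a list of error strings; an empty list means the document is valid."""
    errors = []
    if not prefix:  # required fields are checked once, at the top level
        errors += [f"{path} is required" for path in required_fields
                   if get_path(doc, path) is None]
    for key, expected in structure.items():
        if key not in doc:
            continue  # absent optional fields are fine
        path = f"{prefix}{key}"
        value = doc[key]
        if isinstance(expected, dict):
            if isinstance(value, dict):
                errors += validate(value, expected, prefix=f"{path}.")
            else:
                errors.append(f"{path} must be a dict")
        elif not isinstance(value, expected):
            errors.append(f"{path} must be {expected.__name__}")
    return errors


structure = {"foo": {"content": int}, "bar": {"bla": {"spam": int}}}
required = ["foo.content", "bar.bla.spam"]

ok_errors = validate({"foo": {"content": 1}, "bar": {"bla": {"spam": 2}}}, structure, required)
bad_errors = validate({"foo": {"content": "one"}}, structure, required)
print(ok_errors)
print(bad_errors)
```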

Repository: https://github.com/namlook/mongokit/
Documentation: http://namlook.github.io/mongokit/

Conclusion

We hope this article has clarified how to use MongoDB with Python. When you select a Python driver for MongoDB, you should answer two questions:

  1. Do I need a synchronous or an asynchronous driver?
  2. Do I need to fix the structure of the documents in the code?

For asynchronous applications, you need Motor or MotorEngine.

All of the synchronous libraries are wrappers around PyMongo. If you do not need to fix the structure of the documents, the easiest way is to work directly with PyMongo.

If you want to fix the structure of the documents in the code, you can use MongoEngine or MongoKit. We mainly work with Django, so it is more natural for us to use MongoEngine.
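
The two questions can be written down as a small decision helper. This is only a restatement of the advice above. The function name `choose_library` is hypothetical, and picking MotorEngine for asynchronous code with a fixed schema is our reading of it being a port of MongoEngine.

```python
def choose_library(needs_async: bool, fixed_schema: bool) -> str:
    """Map the two selection questions to the libraries discussed in this article."""
    if needs_async:
        return "MotorEngine" if fixed_schema else "Motor"
    return "MongoEngine or MongoKit" if fixed_schema else "PyMongo"


print(choose_library(needs_async=False, fixed_schema=True))
```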

You can also contact us if you want to learn more about Python MongoDB best practices.

Read more about MVP development services and web app development.
