I have a client looking for the skinny on MongoDB: pros, cons, best-fit use cases, best practices. If anyone has hands-on, objective experience, I'd like to discuss. Also interested in other Big Data experience around document and key-value stores.
Software Developer at Empresa de Telecomunicaciones de Cuba
Aug 24, 2021
MongoDB is a very reliable and stable NoSQL engine. It's a document-based database, and it uses JavaScript as its query language. Depending on the developer's expertise and skills, this can be an advantage or a disadvantage: SQL, as a query language, is ingrained deep in most developers' flesh and bones, so the transition can be hard. Nevertheless, there are IDEs that let developers write SQL for querying MongoDB. My advice: learn how to do it in JavaScript. Study, test, put in the hours to get comfortable with it, and success is achievable.
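To give a feel for the transition, here's a minimal sketch of the same query in both languages, run from the mongosh shell; the users collection and its fields are hypothetical:

```javascript
// SQL: SELECT name, age FROM users WHERE age > 30 ORDER BY age DESC LIMIT 10;
db.users
  .find(
    { age: { $gt: 30 } },          // WHERE age > 30
    { name: 1, age: 1, _id: 0 }    // SELECT name, age
  )
  .sort({ age: -1 })               // ORDER BY age DESC
  .limit(10);                      // LIMIT 10
```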
MongoDB is schema-free, and there's no referential integrity whatsoever, so it fits best in scenarios that deal with facts rather than transactions, although recent releases do support multi-document transactions.
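For completeness, here's a minimal sketch of one of those multi-document transactions using the Node.js driver; the URI, database, collection, and account IDs are hypothetical, and transactions require a replica set:

```javascript
const { MongoClient } = require("mongodb");

async function transfer() {
  const client = new MongoClient("mongodb://localhost:27017"); // illustrative URI
  await client.connect();
  const session = client.startSession();
  try {
    // withTransaction commits for you and retries on transient errors
    await session.withTransaction(async () => {
      const accounts = client.db("bank").collection("accounts");
      await accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } }, { session });
      await accounts.updateOne({ _id: "B" }, { $inc: { balance: 100 } }, { session });
    });
  } finally {
    await session.endSession();
    await client.close();
  }
}

transfer().catch(console.error);
```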
If your data consists mainly of facts and its volume is very high (terabytes or more), then MongoDB could be your engine. You can replicate your data, adding redundancy to your persistence and improving high availability, and you can shard your data, choosing criteria to partition it so queries can be directed to just the dataset you need. Picking a meaningful shard key is maybe one of the most important decisions when you're dealing with a cluster, and it's crucial to get it right so that your queries most probably go straight to the shard that contains the data to be fetched; otherwise, the engine will have to look for your data on all of the shards. A sketch of the setup follows below.
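Here's what that looks like from mongosh, run against a cluster's mongos router; the database, collection, and key fields are hypothetical:

```javascript
sh.enableSharding("telemetry");                 // allow sharding on the database
db.events.createIndex({ deviceId: 1, ts: 1 });  // the shard key needs a supporting index
sh.shardCollection("telemetry.events",          // namespace: <db>.<collection>
                   { deviceId: 1, ts: 1 });     // ranged shard key

// Queries filtering on deviceId are routed to the owning shard;
// queries without it are scattered to every shard in the cluster.
```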
MongoDB has a lot of drivers to deal with it programmatically, covering the most common programming languages: C, C++, C#, Java, Scala, Kotlin, Ruby, Python, Go, Node.js, etc. In any of those drivers, when you ask for a connection you actually get a client backed by a pool of connections, so the driver handles the connect/disconnect/reconnect policy for you (see the sketch below).
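For example, with the Node.js driver; the URI, names, and pool sizes here are illustrative assumptions, not recommendations:

```javascript
const { MongoClient } = require("mongodb");

// One client per process; it maintains the connection pool internally.
const client = new MongoClient("mongodb://localhost:27017", {
  maxPoolSize: 20, // upper bound on pooled connections
  minPoolSize: 2,  // keep a couple of connections warm
});

async function main() {
  await client.connect();                               // opens the pool
  const events = client.db("app").collection("events"); // hypothetical names
  console.log(await events.countDocuments());
  await client.close();                                 // drains the pool on shutdown
}

main().catch(console.error);
```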
MongoDB can be configured to handle a variety of scenarios, such as:
- If write performance is critical, you can just set up a standalone MongoDB instance with the oplog feature turned off, avoiding replication overhead.
- If read consistency matters, even in a cluster, you can configure the write concern so that API calls return only once the write has been acknowledged by the primary node, by the primary plus at least one secondary, or by all secondaries (see the sketch after this list).
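A minimal write-concern sketch in mongosh; the collection is hypothetical:

```javascript
db.orders.insertOne(
  { item: "abc", qty: 1 },
  { writeConcern: { w: "majority", wtimeout: 5000 } } // wait for a majority of nodes
);
// w: 1          -> acknowledged by the primary only
// w: 2          -> primary plus one secondary
// w: "majority" -> a majority of the replica-set members
```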
Other interesting features:
- Lookup (reference) tables can persist in collections, and as long as those collections remain unsharded, you can simulate a join operation using the $lookup stage from the aggregation framework, which is very, very strong (see the first sketch after this list).
- You can set up a Map/Reduce process from its internal query language (wow, I mean, wow!).
- When you configure a cluster, any operation triggered from its internal query language will be executed in parallel on all of the nodes (again... wow!).
- There are several types of indexes available. One curious thing about compound (multi-field) indexes: the order in which you declare the fields matters at query time. In some engines, the mere presence of an indexed field in the filter is enough to use the index; in MongoDB, if the field used as a filter is not a leading prefix of the index, the engine falls back to what I'd call a full-scan search (see the second sketch after this list).
- Date/time fields are always persisted in UTC.
And so on.
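For the $lookup point above, a minimal sketch in mongosh; the orders and products collections are hypothetical:

```javascript
// Join orders against a small, unsharded "products" reference collection.
db.orders.aggregate([
  { $lookup: {
      from: "products",         // the reference collection
      localField: "productId",  // field on orders
      foreignField: "_id",      // field on products
      as: "product"             // output array of matched documents
  } },
  { $unwind: "$product" }       // flatten the single-match array
]);
```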
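And for the compound-index prefix rule, a sketch with hypothetical field names:

```javascript
db.events.createIndex({ deviceId: 1, ts: 1 }); // compound index

// Uses the index: the filter includes the leading field.
db.events.find({ deviceId: 42, ts: { $gte: ISODate("2021-01-01") } });
db.events.find({ deviceId: 42 });

// Does NOT use it: ts alone is not a prefix of { deviceId, ts },
// so this triggers a full collection scan (verify with .explain()).
db.events.find({ ts: { $gte: ISODate("2021-01-01") } });
```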
MongoDB fits best in non-transactional scenarios where it's important to cross-query facts over time. It supports an extremely high volume of data while staying steady and solid. It's not a key-value engine: for those scenarios, Redis or Cassandra will fit better when querying over the network, and for embedded apps, Tokyo Cabinet could be a nice choice for a key-value store.
I hope this will be helpful.
Thanks for your time.