What is our primary use case?
My company developed an anti-money laundering compliance platform specifically for the gaming industry. This multitenant platform utilizes Microsoft Azure Cosmos DB as its core operational database.
We have a high-throughput, large-volume data ingestion process, with substantial data flowing into our environment. We needed to solve two problems: ultra-high volume, requiring speedy reads and writes of hundreds of gigabytes of data per day, and the ability to distribute our platform geographically. Azure Cosmos DB's geo-replication features and the ability to host and scale our database across multiple regions, keeping data close to our customers, were primary deciding factors.
How has it helped my organization?
Azure Cosmos DB is quick to adopt with a shallow learning curve. The average user can be operational within hours or days, handling small to medium data volumes. However, optimizing for ultra-high throughput scenarios involves a steeper learning curve, requiring substantial knowledge to master Azure Cosmos DB. Nonetheless, most users can leverage it as their operational data store with minimal effort.
Our platform boasts several extensive language model features, particularly around summarization capabilities. We use vector searching in Azure Cosmos DB to facilitate the retrieval of an augmented generation model with our LLM implementation. It's a standard RAG implementation using Azure Cosmos DB. Compared to other options, a key advantage of vector indexing in Azure Cosmos DB is the ability to query documents alongside vectors. This pinpoints the precise information required for RAG in our LLM solution, granting us greater flexibility than vector searching in other Azure services.
We integrated the vector database with the Azure OpenAI service for our LLM solution.
The Azure AI services were simple to integrate with the vector database. There was a slight learning curve, especially as we were on the private preview of vector searching. This led to some hiccups with our existing database configurations, specifically regarding continuous backup. We couldn't enable continuous backup and vector searching simultaneously. However, this was solely due to our participation in the preview, and I'm confident this issue won't persist in the general availability release.
Azure Cosmos DB is fantastic for searching large amounts of data when the data is within a single partition. Over the last two weekends, we ingested over 400 gigabytes of data into our Azure Cosmos DB database and saw no change in querying performance compared to when our database was only 20 gigabytes in size. This is impressive and powerful, but the scope is limited to those partition queries.
The first benefit we've seen is increased developer productivity. Azure Cosmos DB is an easy database to work with. Its schema-less nature allows us to iterate quickly on our platform, develop new features, and store the associated data in Azure. Developers find it easy to use, eliminating the need for object-relational mapping tools and other overhead. Geographic replication and the ability to scale geographically is another advantage. This is challenging with other databases, even other NoSQL databases, but Azure Cosmos DB makes it easy. Cost optimization is a major benefit as well. We've been able to run our platform at a fraction of the infrastructure cost our customers incur when integrating with us. This allows us to focus resources on feature development and platform building rather than infrastructure maintenance.
Azure Cosmos DB helped reduce the total cost of ownership. We don't need DBAs, system administrators, or typical IT staff to run the infrastructure because we can use Azure Cosmos DB as a platform or a software-as-a-service data storage solution. This makes the total cost of ownership significantly lower than any comparable solution using relational databases or other NoSQL solutions like MongoDB.
We enable auto-scaling on all of our Azure Cosmos DB resources, which helps us achieve cost optimizations.
What is most valuable?
The querying language and the SDKs they've provided over the years have been phenomenal, giving us a significant advantage. Being a NoSQL database with a schema-less design allows us to optimize costs and reduce the infrastructure we need to manage. While we don't utilize every feature, auto-scaling has been invaluable for optimizing both cost and performance on our platform daily.
What needs improvement?
Azure Cosmos DB could be better for business intelligence and analytical queries. While it excels at high-throughput data ingestion and point reads with low latency, querying within partitions is smooth. Complex cross-partition querying, and BI/analytical tasks often necessitate moving data to other solutions like Fabric and Azure AI Search.
For how long have I used the solution?
I have been using Microsoft Azure Cosmos DB for nine years.
What do I think about the stability of the solution?
Over the four years we have been building, Azure Cosmos DB has not caused us one second of downtime.
The latency and availability of the Azure Cosmos DB is excellent. We have single millisecond latency point reads, which is the majority of the queries that we run on our platform. So, it's an ultra-high performance, ultra-high throughput platform for managing data.
What do I think about the scalability of the solution?
We are confident in Azure Cosmos DB's ability to scale and meet our needs even with massive data volumes. We ingest hundreds of gigabytes of data into Azure Cosmos DB daily, reinforcing our confidence in its scaling capabilities.
How are customer service and support?
Our experience with technical support has always been great. We've been lucky enough to connect with the product team resources at Microsoft around Azure Cosmos DB, and we've received fantastic support from the Azure Cosmos DB team.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
I've used MongoDB extensively, both self-hosted and through their SaaS solution, Mongo Atlas. I've also worked with relational databases like PostgreSQL, SQL Server, and MySQL. Additionally, I've experimented with non-document data stores like Cassandra, but not in a production environment. All the other databases have been used in deployed production applications.
Azure Cosmos DB offers a minimal total cost of ownership. No database administration or dedicated teams are needed to manage it, which increases developer productivity compared to relational database solutions. Azure Cosmos DB also provides higher levels of data consistency without the traditional data migration problems of relational databases, and its schema-less nature makes versioning easier. However, there is a steep learning curve for building large-scale applications and optimizing throughput with Azure Cosmos DB. Additionally, its analytical capabilities are not as strong as other solutions.
How was the initial setup?
The deployment was straightforward to initiate. While numerous configuration items are needed for production workloads, particularly key management, getting started remains simple and accessible.
We deployed our infrastructure as code, initially using Azure CLI scripts to deploy AzureCosmos DB alongside other infrastructure components. We have since transitioned to Terraform for this purpose. From an implementation strategy perspective, we are a multi-tenant platform using logical tenancy, meaning a single Azure Cosmos DB database accommodates multiple tenants. We have a relatively large number of collections, approximately 40, within our Azure Cosmos DB database. To optimize cost, we share throughput where feasible and provide dedicated throughput for containers with high read or write volumes.
One person is enough for the deployment.
What's my experience with pricing, setup cost, and licensing?
For the first three years of our company, we were able to run a production environment while spending less than $10,000 a month on our database. In contrast, our customers pay tens of thousands of dollars for the systems we integrate. Therefore, Azure Cosmos DB is a highly cost-optimized solution when used correctly.
What other advice do I have?
I rate Microsoft Azure Cosmos DB ten out of ten.
We use Azure Cosmos DB extensively for searching alongside Azure AI Search, which offers full-text Lucene syntax-compatible querying. While a significant portion of our searches leverage these dedicated search indexes, we still conduct a fair amount directly in Azure Cosmos DB. Although it might not be entirely fair to say that searching isn't Azure Cosmos DB's strong suit, it's worth noting that its capabilities are constrained by partitioning requirements. This limitation places a ceiling on its overall effectiveness for specific scenarios. While Azure Cosmos DB can be extremely valuable for querying within partitions, alternative solutions are often better suited for queries spanning multiple partitions.
I've built tools around the Azure Cosmos DB SDKs to make them incredibly easy to use. My team had no learning curve and could leverage our shared libraries. It took me less than a week to achieve a production-quality implementation for accessing and saving data within a platform.
We have 20 people in the organization who interact with Azure Cosmos DB, consisting of 15 engineers and five others.
Azure Cosmos DB typically requires minimal maintenance, but if data partitioning is not done correctly, some overhead may be incurred due to the need to replicate containers and move data. Thus, while generally low maintenance, some maintenance can be required in certain situations.
For anyone thinking about implementing Azure Cosmos DB, first, understand your data and invest time in understanding the partitioning in Azure Cosmos DB. If you get your head wrapped around the partitioning, everything else will be straightforward.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.