What is our primary use case?
I used it in my last organization. We were creating a full-stack web application and used Microsoft Azure Cosmos DB to store user credentials and most of the transactional data, as well as user chats. We did many PoCs for the vector embedding of files for critical things.
We used the built-in vector database capabilities in Microsoft Azure Cosmos DB; we conducted different PoCs around that and tested many beta features. We tried them, and there were obviously hiccups because they were in the beta phase. The additional support provided was sufficient to help us with our PoCs.
RAG was something we wanted to deep dive into. We were trying to get a few machine learning models to run from the Kubernetes side. We wanted to take the data from our own database and then vectorize it and RAG over it so that we could have Q&A directly for what we wanted to do.
How has it helped my organization?
We built an application internally for taking official documentation present on any publicly accessible website, chunking it, and vectorizing the data into vector embeddings. We used it to have Q&A so that we didn't need to go over much official documentation. That was the internal use of it, which helped significantly. We followed the guides present in the Azure official documentation and their YouTube channels. Operationally, it helped with efficiency. We doubled our productivity with this small application. When building something, if we didn't know about the technology, we typically searched the internet or ChatGPT, but with the application, we didn't have to follow the older practices of going to the official documentation, reading, understanding, and getting snippets there. With vector embeddings and RAG built over it, we could also optimize feedback from customers that guided our future enhancement, whether to build new features, enhance existing ones, or remove features that weren't beneficial.
Using Microsoft Azure Cosmos DB improved our organization's search result quality significantly. While running queries during the test phase, we were able to configure which particular dataset required fewer RUs and which required higher RUs. This way, when handing off the end product to customers, we ensured that only databases needing higher throughput would get more RUs. It positively impacted the costs. It helped us lower the overall cost of the database, dropping from 33% to 22%, reflecting an 11% decrease in the latest quarter.
What is most valuable?
The best part of Microsoft Azure Cosmos DB is that with the default configuration and the Azure functional pipeline, if your go-to cloud provider is Microsoft Azure, the whole integration is seamless. Doing it by SDK or any other way, through a POST request or HTTP request, is easy, and that is documented, so that is a plus point.
Apart from that, the NoSQL database with SQL query support is a significant advantage. You can have both semi-structured and structured data stored in JSON and then have SQL queries run over it, which can be more advantageous compared to other providers.
What needs improvement?
The topic of RU consumption needs better documentation.
Now that Microsoft has partnered with different LLM organizations, such as OpenAI, a bot could guide us through different metrics present in Microsoft Azure Cosmos DB. For enhanced productivity, it would be better to add information about the new features to the Microsoft Azure Cosmos DB admin dashboard itself. We usually have to rely on YouTube tutorials or the official documentation.
Furthermore, while it is supported regionally, I did experience a rare case during our working time where it went down on their end and showed faulty previous data. Better error handling would be beneficial. We had to go to forums to check if it was failing for everyone else. It was surprising that a large organization like Microsoft doesn't provide an official statement about the maintenance or issues that could impact our overall usage.
How are customer service and support?
I would rate the customer support of Microsoft Azure Cosmos DB a seven out of ten. The reason for deducting three points is that when you raise a support request, you don't know who will respond. Sometimes, the assistance is very helpful and effective, while other times, it might not meet expectations.
How would you rate customer service and support?
How was the initial setup?
It didn't take much time. We had a meeting for deploying certain elements, along with two environments for development and production, and completed cost estimations in one to two days. It took us about one to two weeks to spin up everything. We didn't only create Microsoft Azure Cosmos DB; we also migrated our data from the existing dataset to the new one. It took about a week. We were a small company starting up, so we didn't have that much data. If this involved a larger company, it would have taken one to two months of effort.
Initially, using Microsoft Azure Cosmos DB was uphill because we were just beginners, but it then got easy, and I was enjoying my ride. It was seamless; there was support for different language stacks. From that perspective, it was easy. We didn't need many tutorials or helper guides for it. We just read the official documentation, which made it easy to get hold of it.
The learning curve for Microsoft Azure Cosmos DB is straight; it's not steep. I didn't have extensive prior knowledge, but I followed the official documentation and a Kubernetes course recommended by a senior. After a few days of completing that course and reviewing a few documents, I was up and running.
What about the implementation team?
Initially, our environment size had about three developers, which scaled up to four or five. Eventually, it included non-developers and an ML team. We were a small organization, so it never scaled over 10 developers, and including clients, it never went over 30.
What was our ROI?
Microsoft Azure Cosmos DB helped decrease the total cost of ownership. When I joined the organization, we were shifting from AWS to Azure. We were part of the Microsoft for Startup Founders Hub and had credits from their end. While trying to establish multiple PoCs based on our investors' suggestions and our client's recommendations, we aimed to have a data warehouse for clients' data for better future project developments and for enhancing current offerings or eradicating features from the current stack.
That helped with cost estimation for the overall project and different features we gave, such as the image generation feature, which was one of the main client demands. We spun up an image generation model in Azure Machine Learning Studio, connected its data to Microsoft Azure Cosmos DB via a pipeline. The costs spiked for us, so we added a register cache on the frontend, and in the backend, we created a workaround to directly store the most searched or most recently created images into BLOB storage linked to Microsoft Azure Cosmos DB. This allowed faster access compared to re-generating through the entire pipeline, which also contributed to reducing our costs.
What's my experience with pricing, setup cost, and licensing?
If you are a small organization or startup building from scratch without the Microsoft Startup Founder Club support, it could be expensive. However, if you have the budget and your use case leans more towards AI, Microsoft Azure is leading in AI integration compared to other cloud service providers, giving you an edge. If it's about the latest AI, especially LLM RAG, which often involves vector embeddings, Microsoft Azure Cosmos DB can handle that.
For mid-tier organizations that have thoroughly analyzed the data migration costs and potential new charges, Microsoft Azure Cosmos DB could be a viable option. For top-tier organizations, it's a better route to go through Azure itself.
What other advice do I have?
It handles semi-structured data and unstructured data efficiently, which worked for us because we dealt with images, videos, and other multimedia formats that couldn't be structured properly. However, there was some uncertainty with increasing the RUs and other elements, which complicated things because when you increase the RU and limit it to say 800 or 1,000, even though you are not reaching that limit, you're still paying for it, which is a disadvantage for a startup. You're burning money for that.
We didn't have huge amounts of data to assess in Microsoft Azure Cosmos DB, but it was efficient. Its efficiency also depends on how you've configured it.
Overall, I would rate Microsoft Azure Cosmos DB an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.