What is our primary use case?
We have essentially four broad categories of use cases, all of which involve publishing data on Kafka topics and streaming it across microservices.
How has it helped my organization?
The retention policies of Kafka topics have been the most beneficial, specifically for data management.
Based on our experience, there are different use cases where data needs to be handled in different ways. Sometimes we want to get rid of it once it has been consumed, or we have to store it for a longer period.
Kafka provides handy topic-level properties that let us configure directly whether data is kept or discarded after it has been consumed.
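As a rough illustration of those properties (not our actual configuration), here is a minimal sketch using the Java AdminClient; the broker address, the topic name "orders", and the retention values are placeholders:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateTopicWithRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Keep records for 7 days, then let the broker discard them.
            NewTopic topic = new NewTopic("orders", 6, (short) 3)
                    .configs(Map.of(
                            "retention.ms", "604800000",  // 7 days in milliseconds
                            "cleanup.policy", "delete")); // discard, don't compact
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```

For data that has to be stored longer, the same property can simply be set to a larger retention.ms, or to -1 to keep records indefinitely.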
What is most valuable?
I feel the streaming speed, the way messages are processed, and some of the topic features like partitions and offset management are quite handy.
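To make the partition and key behaviour concrete, here is a minimal producer sketch; the broker address, the topic "orders", and the key "customer-42" are hypothetical:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key hash to the same partition,
            // which preserves per-key ordering for downstream microservices.
            producer.send(
                new ProducerRecord<>("orders", "customer-42", "{\"status\":\"created\"}"),
                (metadata, exception) -> {
                    if (exception == null) {
                        System.out.printf("written to partition=%d at offset=%d%n",
                                metadata.partition(), metadata.offset());
                    }
                });
        }
    }
}
```

The callback prints which partition the record landed on, which is handy when reasoning about per-key ordering.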
What needs improvement?
There's one thing that's a common use case, but I don't know why it's not covered in Kafka. When a message comes in, and another message with the same key arrives, the first version should be deleted automatically.
We want to keep only one instance of a message at any given time, the latest one. However, Kafka doesn't have this functionality built-in. It keeps all the data, and we have to manually delete the older versions.
So, I would like to have only one instance of messages, based on the keys. If the key is the same, there should always be the latest message present instead of all versions of that message.
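For completeness, the closest built-in feature I can point to is log compaction (cleanup.policy=compact), which eventually keeps only the latest record per key; the catch is that older versions stay readable until the log cleaner runs, so it still doesn't guarantee a single instance at any given time. A minimal sketch of enabling it on an existing topic, with a placeholder broker and a hypothetical topic name:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class EnableCompaction {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "customer-state"); // hypothetical topic

            // Compaction keeps only the newest record per key, but only after the
            // log cleaner has run; until then, older versions are still returned.
            Collection<AlterConfigOp> ops = List.of(
                new AlterConfigOp(new ConfigEntry("cleanup.policy", "compact"),
                        AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```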
For how long have I used the solution?
I have been using it for three years. I work with the latest version, v3.6.
What do I think about the stability of the solution?
I would rate the stability a six out of ten. When it's good, it works fine. But as soon as traffic increases and the number of topics on the cluster exceeds a certain limit, it becomes unresponsive.
We then have to get rid of Kafka topics, but even that's not easy because the whole site becomes unresponsive. We don't have easy dashboard access to remove unnecessary topics. That's one issue.
What do I think about the scalability of the solution?
I would give it an eight out of ten for scalability. It's scalable, but there's room for improvement in reliability. When we scale up at the pod level, offsets are mismanaged and reliability goes down, leading to data loss. The same thing happens when traffic drops and we scale back down: we often see data loss during scale-down as well. They can work on improving this.
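A pattern that can mitigate this kind of loss during rebalances is to disable auto-commit and commit offsets explicitly, including from the rebalance callback. The sketch below is only illustrative, with placeholder broker, group, and topic names, not our production setup:

```java
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class SafeScalingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "orders-processor");        // hypothetical group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");          // commit manually instead

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        Map<TopicPartition, OffsetAndMetadata> pending = new HashMap<>();

        consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Flush offsets for records already processed before the partition
                // moves to another pod, so nothing is re-read or skipped.
                consumer.commitSync(pending);
                pending.clear();
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                // ... process the record here ...
                pending.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
            }
            consumer.commitSync(pending); // commit only what was actually processed
            pending.clear();
        }
    }
}
```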
Our customers are currently large enterprise businesses.
How was the initial setup?
I would rate my experience with the initial setup of this product a seven out of ten, where one is difficult and ten is easy.
I have not faced any difficulties or challenges while setting this product up. They have proper documentation, so it's easy to go through it and set things up.
It's a cloud solution, deployed in the cloud within our customers' organizations, and they use Confluent Cloud.
What about the implementation team?
It's taken care of by different teams.
What other advice do I have?
Overall, I would rate it an eight out of ten.
I would recommend it because everything is well-documented and straightforward.
We can install Kafka directly, but we don't get direct access to the data on it, so it's good to have tools like Kafka Magic or Kafka Tool to access and visualize the data.
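When none of those tools is available, a throwaway consumer can at least dump a topic from the beginning for a quick look. This is just a sketch, assuming a local broker and a hypothetical topic named "orders":

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class TopicPeek {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("orders", 0); // hypothetical topic
            consumer.assign(List.of(partition));
            consumer.seekToBeginning(List.of(partition));

            // Read whatever arrives within five seconds and print it.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.printf("offset=%d key=%s value=%s%n",
                    r.offset(), r.key(), r.value()));
        }
    }
}
```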
Which deployment model are you using for this solution?
Public Cloud
*Disclosure: My company has a business relationship with this vendor other than being a customer: consultant