Kafka has a guaranteed delivery mechanism that is very easy to set up. When starting out with minimal hardware, it can handle very large data volumes. When prototyping and creating a proof of concept, Kafka has helped to speed up the timeline from the prototype all the way to production volumes.
Solutions Architect at a consultancy with 1,001-5,000 employees
Has the ability to write data at one velocity and have subscribing consumers read at different velocities.
Pros and Cons
- "Apache Kafka is actually a distributed commit log. That is different than most messaging and queuing systems before it."
- "The GUI tools for monitoring and support are still very basic and not very rich. There is no help in determining a shard key for performance."
How has it helped my organization?
What is most valuable?
Apache Kafka is actually a distributed commit log. That is different than most messaging and queuing systems before it. I find the ability to write data at one velocity and have subscribing consumers read at different velocities to be the best feature.
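To illustrate the commit-log idea, here is a minimal Python sketch of my own; it is not Kafka's API. The `CommitLog` class, the consumer names, and the batch sizes are all hypothetical. The point is only that reading never removes a record, so each consumer keeps its own offset and moves at its own pace.

```python
class CommitLog:
    """Toy append-only log; every consumer tracks its own read offset."""

    def __init__(self):
        self._records = []   # records are appended and never removed by a read
        self._offsets = {}   # consumer name -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def poll(self, consumer, max_records):
        # Return up to max_records starting at this consumer's own offset.
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def lag(self, consumer):
        # Number of records this consumer has not read yet.
        return len(self._records) - self._offsets.get(consumer, 0)


event_log = CommitLog()
for i in range(10):
    event_log.append(f"event-{i}")

# One consumer drains everything; the other takes a small batch and falls behind.
fast = event_log.poll("fast-consumer", max_records=10)
slow = event_log.poll("slow-consumer", max_records=3)
```

The slow consumer's lag does not affect the fast one, and the producer never waits for either of them.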
What needs improvement?
The GUI tools for monitoring and support are still very basic and not very rich. There is no help in determining a shard key for performance.
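Since no tool helps with this, one rough way to compare candidate shard keys is to hash a sample of them and look at how evenly they spread across partitions. The Python sketch below is an assumption-laden illustration: it uses `zlib.crc32` as a stand-in hash (not the hash Kafka's own partitioner uses), and the event fields, partition count, and helper names are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    # Stand-in for a key-hash partitioner; CRC32 is only an illustration.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_counts(keys, num_partitions):
    # How many records each partition would receive for the given keys.
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew(counts):
    """Busiest partition divided by the mean load (1.0 is perfectly even)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Hypothetical sample: 90% of events come from one country, but users are distinct.
events = [
    {"country": "US" if i % 10 else "DE", "user_id": f"user-{i}"}
    for i in range(1000)
]

by_country = partition_counts([e["country"] for e in events], 4)
by_user = partition_counts([e["user_id"] for e in events], 4)
```

A low-cardinality key such as `country` piles most of the traffic onto one partition, while a high-cardinality key such as `user_id` spreads it out. Comparing `skew(by_country)` with `skew(by_user)` on a real sample gives a quick sanity check before settling on a key.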
What do I think about the stability of the solution?
We did not have any issues with stability.
Buyer's Guide
Apache Kafka
January 2025
Learn what your peers think about Apache Kafka. Get advice and tips from experienced pros sharing their opinions. Updated: January 2025.
829,634 professionals have used our research since 2012.
What do I think about the scalability of the solution?
We did not have any issues with scalability.
How are customer service and support?
- Kafka is open source from LinkedIn and support comes from the community of users.
- You can go with Confluent, the company that was founded by the original engineers from LinkedIn.
- You can go with a cloud hosting service, like AWS EMR or Azure HDInsight.
Which solution did I use previously and why did I switch?
We used traditional message queues and file semaphores. There was a lot of overhead with asynchronous messages being put into an order and making sure nothing got dropped. It required a lot of code and maintenance.
How was the initial setup?
Since it is open source, you are on your own for setup. However, the tutorials from the Apache foundation and online sources have been an immense help.
Getting started is very easy. The complexity of very large volumes of data and appropriate sharding, however, is difficult. There are fewer resources for tuning and best practices.
What's my experience with pricing, setup cost, and licensing?
When starting to look at a distributed message system, look for a cloud solution first. It is an easier entry point than an on-premises hardware solution. A lot of the complexity has already been taken care of. Both AWS and Azure have supported Kafka clusters that can be provisioned very easily.
Which other solutions did I evaluate?
We looked at RabbitMQ and Spark Streaming.
What other advice do I have?
Be sure to define the use cases as best as possible at first.
Kafka is very good, but it is complex to support. It can handle any message size, whereas native cloud options have size limitations.
Be sure to understand what messages will be sent and how many discrete topics will be needed.
Be aware that you must code both producers and consumers.
The bulk of the work is with the consumer.
The Apache stack for Kafka is fully open source. There are essentially no tools other than command-line options to monitor broker and topic health. Third-party tools help with that, some free and some paid, but they require you to install agents on the servers hosting Kafka and to open up JMX ports for MBeans in the scripts that start the Kafka services. You also have to monitor ZooKeeper, which is very memory intensive. Cloud offerings that provide the whole modern data architecture stack, like AWS EMR and Azure HDInsight, as well as Hortonworks and Cloudera, provide a console GUI as part of their offerings. Confluent, a company founded by the LinkedIn engineers who designed Kafka, also has a paid enterprise offering with much better tools for maintaining the Kafka cluster. But with Apache Kafka and the community, you are on your own.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Software Engineer at a tech services company with 201-500 employees
It can manage a high volume of data from many sources
Pros and Cons
- "Kafka is scalable. It can manage a high volume of data from many sources."
- "The interface has room for improvement, and there is a steep learning curve for Hadoop integration. It was a struggle learning to send from Hadoop to Kafka. In future releases, I'd like to see improvements in ETL functionality and Hadoop integration."
What is our primary use case?
I use Kafka to send network packets from different sources to my cluster. We have around 10 users at my company.
What is most valuable?
Kafka is scalable. It can manage a high volume of data from many sources.
What needs improvement?
The interface has room for improvement, and there is a steep learning curve for Hadoop integration. It was a struggle learning to send from Hadoop to Kafka. In future releases, I'd like to see improvements in ETL functionality and Hadoop integration.
For how long have I used the solution?
I have used Kafka for around six months.
What do I think about the stability of the solution?
I rate Apache Kafka seven out of 10 for stability.
What do I think about the scalability of the solution?
I rate Kafka eight out of 10 for scalability.
How are customer service and support?
I rate Apache support six out of 10. It was hard to find the information I needed.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
Before Kafka, I sent feeds directly to Hadoop.
How was the initial setup?
I initially found Kafka difficult to set up, so I would rate it about five out of 10 for ease of setup. After I learned more about the platform, I would rate it eight out of 10. It is deployed on-premises over a cluster of three or four PCs. You can deploy Kafka in a few hours with one person.
What's my experience with pricing, setup cost, and licensing?
Kafka is open source.
What other advice do I have?
I rate Apache Kafka eight out of 10. I would recommend it to others.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Architecte Technique Senior at a computer software company with 10,001+ employees
Good, clear documentation but growth needs to improve
Pros and Cons
- "The most valuable feature is the documentation, which is good and clear."
- "An area for improvement would be growth."
What is most valuable?
The most valuable feature is the documentation, which is good and clear.
What needs improvement?
An area for improvement would be growth.
For how long have I used the solution?
I've been using this solution for just over a year.
What do I think about the stability of the solution?
Kafka works very well.
How was the initial setup?
The initial setup was simple.
What other advice do I have?
I would rate this solution six out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Reseller
Project Engineer at Wipro Limited
Free to use, mature, and offers good scalability
Pros and Cons
- "It's an open-source product, which means it doesn't cost us anything to use it."
- "The UI is based on command line. It would be helpful if they could come up with a simpler user interface."
What is our primary use case?
We primarily use the solution for big data. We often get a million messages per second, and with such a high output we use Kafka to help us handle it.
What is most valuable?
When we're working with big data, we need a throughput computing panel, which is something that Kafka provides, and something we find extremely valuable. It helps us support computing and ensures there's no loss of data. It can even do replication with some data.
The delivery of data is its most valuable aspect.
It's an easy to use product overall.
The solution is quite mature.
It's an open-source product, which means it doesn't cost us anything to use it.
What needs improvement?
We're still going through the solution. Right now, I can't suggest any features that might be missing. I don't see where there can be an improvement in that regard.
The speed isn't as fast as RabbitMQ, even though the solution touts itself as very quick. It could be faster. They should work to make it at least as fast as RabbitMQ.
The UI is based on command line. It would be helpful if they could come up with a simpler user interface.
They should make it easier to configure items on the solution.
The solution would benefit from the addition of better monitoring tools.
For how long have I used the solution?
I've been using the solution for six months.
What do I think about the stability of the solution?
The solution is a bit slow in comparison to RabbitMQ. It's supposed to be a very fast solution, and it has okay performance, but speed-wise, it's quite slow.
What do I think about the scalability of the solution?
The scaling of the solution is quite good.
How are customer service and technical support?
In terms of technical support, we don't get that directly from Apache Kafka. We have certain cloud data distribution so we get assistance from our cloud data support.
How was the initial setup?
We're continuously deploying the product. We're still in the process of deployment.
What's my experience with pricing, setup cost, and licensing?
It's an open-source product, so the pricing isn't an issue. It's free to use. We don't have costs associated with it.
Which other solutions did I evaluate?
I'm not the product owner, so I didn't have a say in what should be chosen. We were seeing a high throughput with Kafka which is why we ultimately chose it.
What other advice do I have?
I'd rate the solution eight out of ten. It's good at scaling, and, performance-wise, it's excellent. If they could improve the UI and allow for easier configuration, I'd rate it higher.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Technical Consultant at KPMG
It eases our current data flow and framework
Pros and Cons
- "It eases our current data flow and framework."
- "Kafka 2.0 has been released for over a month, and I wanted to try out the new features. However, the configuration is a little bit complicated: Kafka Broker, Kafka Manager, ZooKeeper Servers, etc."
What is our primary use case?
It's convenient and flexible for almost all kinds of data producers. We integrated it with Kafka Streams, which can perform some simple data processing, such as summarizing, counting, and grouping.
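As a rough picture of what grouping, counting, and summarizing over a stream means, here is a small Python analogy. It is not the Kafka Streams API (which is a Java library); the function names and the sample page-view records are hypothetical.

```python
from collections import defaultdict


def count_by_key(records):
    """Yield a running (key, count) update for every incoming record."""
    counts = defaultdict(int)
    for key, _value in records:
        counts[key] += 1
        yield key, counts[key]


def sum_by_key(records):
    """Group records by key and total their values."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


# Hypothetical stream of (page, response time in ms) records.
page_views = [("home", 120), ("cart", 40), ("home", 200), ("home", 80), ("cart", 160)]

updates = list(count_by_key(page_views))
totals = sum_by_key(page_views)
```

The running count emits an updated result per record rather than waiting for the input to end, which is the main difference from a batch-style group-by.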
How has it helped my organization?
It eases our current data flow and framework, which digests all types of sources, whether structured or not.
What is most valuable?
- High availability
- High throughput
With such a large digest, I was genuinely impressed at the process being almost real-time.
What needs improvement?
Kafka 2.0 has been released for over a month, and I wanted to try out the new features. However, the configuration is a little bit complicated: Kafka Broker, Kafka Manager, ZooKeeper Servers, etc.
For how long have I used the solution?
Less than one year.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Java Consultant at a tech services company with 501-1,000 employees
The product is a distributed system for persistent messaging
What is most valuable?
The most valuable features are performance, persistent messaging, and reliability. It allows us to persist the message for a configurable number of days, even after it has been delivered to the consumer. The message delivery is also fast.
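The retention behavior described here can be pictured with a short Python sketch. This is an illustration under my own assumptions, not Kafka code: the `RetainedLog` class, the seven-day window, and the sample messages are hypothetical. In Kafka itself the window is a configuration setting (for example the topic-level `retention.ms`).

```python
DAY = 86400  # seconds


class RetainedLog:
    """Toy log where messages expire by age, not by being consumed."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []  # (timestamp, message) pairs, oldest first

    def append(self, message, now):
        self._records.append((now, message))

    def read_all(self):
        # Reading does not delete anything, so the result is repeatable.
        return [message for _ts, message in self._records]

    def expire(self, now):
        # Drop only the records older than the retention window.
        cutoff = now - self.retention_seconds
        self._records = [(ts, m) for ts, m in self._records if ts >= cutoff]


activity_log = RetainedLog(retention_seconds=7 * DAY)
activity_log.append("search:shoes", now=0)
activity_log.append("search:boots", now=5 * DAY)

first_read = activity_log.read_all()
second_read = activity_log.read_all()   # same messages, still available

activity_log.expire(now=8 * DAY)        # the day-0 message is now past retention
after_expiry = activity_log.read_all()
```

Because delivery and deletion are decoupled, a consumer that fails or is added later can still re-read anything inside the retention window.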
How has it helped my organization?
We wanted to track customer activities on our application and store those details on another system (RDBMS/Apache Hadoop). We do extensive analysis with that data. This helps the company analyze customer activities, such as search terms, and improve.
What needs improvement?
It’s perfect for our requirements.
For how long have I used the solution?
I have been using Apache Kafka for two years.
What do I think about the stability of the solution?
We have had no issues with stability.
What do I think about the scalability of the solution?
We have had no issues with scalability.
How are customer service and technical support?
We use the open source one, so we did not opt for any technical support.
Which solution did I use previously and why did I switch?
We started to use Apache Kafka with our application from scratch.
How was the initial setup?
The initial setup was straightforward. We faced some issues during development in areas such as the message producer and consumer. We fixed those by tweaking the producer and consumer configurations. The documentation is very good.
What's my experience with pricing, setup cost, and licensing?
I don’t have any idea, as we use the open source version.
What other advice do I have?
It's a high-performance distributed system. If you want to track the user activities or any stream processing, then this is perfect. We have used Docker Kafka for our implementation. It's very easy for setup and testing. You could also try the same.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Java Developer at a media company with 10,001+ employees
It provides safety for data in case of node failure or data center outage. Partitioning is useful for parallelizing processing.
What is most valuable?
The most valuable features to me are replication, partitioning and easy integration with Apache Spark, which we use quite a bit for distributed processing.
Replication is good for high availability. It provides additional safety for data in case of node failure or data center outage. Partitioning is a really useful feature for parallelizing processing. We use Apache Spark to process data from a Kafka queue, and Spark is able to assign one executor to each Kafka partition. The more partitions we have, the more threads we can use to process data in parallel. This helps us achieve really good throughput.
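The one-worker-per-partition idea can be sketched in plain Python. This is only an analogy: threads stand in for Spark executors, and the partition contents and the `process_partition` work function are hypothetical placeholders, not Spark or Kafka APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(partition_id, records):
    # Placeholder work; records within one partition are handled in order.
    return partition_id, [record.upper() for record in records]


def process_all(partitions):
    # One worker per partition, so parallelism is bounded by the partition count.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [
            pool.submit(process_partition, pid, records)
            for pid, records in partitions.items()
        ]
        return dict(future.result() for future in futures)


# Hypothetical topic with three partitions.
partitions = {0: ["a1", "a2"], 1: ["b1"], 2: ["c1", "c2", "c3"]}
results = process_all(partitions)
```

Adding a fourth worker here would not help, because there are only three partitions to hand out. That is why more partitions allow more parallel threads.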
How has it helped my organization?
It will help us build a scalable platform. This will allow the company to provide better customer service.
What needs improvement?
It’s pretty easy to use for now. I haven’t had any difficulty or problems that I can complain about. Maybe they can add a UI to configure queues and to display statistics about data stores.
For how long have I used the solution?
I have used Kafka for about a year.
What do I think about the stability of the solution?
So far, we have not encountered any stability issues.
What do I think about the scalability of the solution?
We have not had any scalability issues. The product is horizontally scalable, so adding extra hardware is all that is needed.
How are customer service and technical support?
We haven’t needed technical support with the product yet.
Which solution did I use previously and why did I switch?
I think performance-wise, the product is very good and fits our use case. We used other distributed message queues, but all products have their own use case.
How was the initial setup?
Initial setup wasn’t really complex. We use Kafka through Hortonworks Suite, which comes with many other big data tools. Ambari makes it easy to set up.
What's my experience with pricing, setup cost, and licensing?
Licensing and pricing was handled by my management, so I don’t have much knowledge there.
What other advice do I have?
Give it a try. It’s a valuable, high-performance, distributed processing tool.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Developer Infrastructure at Outbrain
Very easy to install, stable, and has good scaling options
Pros and Cons
- "It's very easy to install and it's pretty stable."
- "The third party is not very stable and sometimes you have problems with this component. There are some developments in newer versions and we're about to try them out, but I'm not sure if it closes the gap."
How has it helped my organization?
In my previous company, we had a proprietary implementation and we replaced it with Kafka. We switched because there were many different connectors available and it also allowed us to create a window into our products for the client. It was an on-premises product and it allowed the client to take the data out without us developing anything.
You can connect in any language and there are a lot of connectors available, which helps a lot. It also creates visibility into the data and stability. There are several alternatives but this is one of the best options for this.
What is most valuable?
It's very easy to install and it's pretty stable.
The possibility to have connectors is very helpful. Another valuable aspect is that it's mature and open-source.
From a scalability point of view, you just add servers and it's scalable. The whole architecture is very scalable.
What needs improvement?
There is a feature that we're currently using called MirrorMaker. We use it to combine the information from different Kafka servers into another server. It's very wide and it gives a very generic scenario. I think it would be great if the possibility would exist out of the box and not as a third party. The third party is not very stable and sometimes you have problems with this component. There are some developments in newer versions and we're about to try them out, but I'm not sure if it closes the gap.
For how long have I used the solution?
I have been using this solution for six months. I also worked with it additionally in my previous company but not so intensively.
How are customer service and technical support?
I have never needed to use technical support. I know it's available but we haven't needed it because there's a lot of information on the internet that has helped us to solve our issues.
What other advice do I have?
I would definitely recommend Kafka. In our current position, we use it to move a lot of data and I think it's definitely working well. I would definitely recommend it.
I would rate it an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.