Software Engineer at a tech services company with 201-500 employees
It can manage a high volume of data from many sources
Pros and Cons
- "Kafka is scalable. It can manage a high volume of data from many sources."
- "The interface has room for improvement, and there is a steep learning curve for Hadoop integration. It was a struggle learning to send data from Hadoop to Kafka. In future releases, I'd like to see improvements in ETL functionality and Hadoop integration."
What is our primary use case?
I use Kafka to send network packets from different sources to my cluster. We have around 10 users at my company.
What is most valuable?
Kafka is scalable. It can manage a high volume of data from many sources.
What needs improvement?
The interface has room for improvement, and there is a steep learning curve for Hadoop integration. It was a struggle learning to send data from Hadoop to Kafka. In future releases, I'd like to see improvements in ETL functionality and Hadoop integration.
For how long have I used the solution?
I have used Kafka for around six months.
Buyer's Guide
Apache Kafka
November 2024
Learn what your peers think about Apache Kafka. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
817,354 professionals have used our research since 2012.
What do I think about the stability of the solution?
I rate Apache Kafka seven out of 10 for stability.
What do I think about the scalability of the solution?
I rate Kafka eight out of 10 for scalability.
How are customer service and support?
I rate Apache support six out of 10. It was hard to find the information I needed.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
Before Kafka, I sent feeds directly to Hadoop.
How was the initial setup?
I initially found Kafka difficult to set up, so I would rate it about five out of 10 for ease of setup. After I learned more about the platform, I would rate it eight out of 10. It is deployed on-premises over a cluster of three or four PCs. You can deploy Kafka in a few hours with one person.
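On a cluster of only three or four machines, replication settings decide how many broker failures the deployment can ride out. The helper below is a hypothetical sketch, not a Kafka tool: it assumes a common rule of thumb (replication factor up to 3, `min.insync.replicas` one below it) and reports the resulting tolerance, counting failures among the brokers that hold one partition's replicas.

```python
def replication_plan(num_brokers: int, max_replication_factor: int = 3) -> dict:
    """Suggest replication settings for a small cluster (hypothetical helper).

    Rule of thumb assumed here: the replication factor is capped by the
    broker count, and min.insync.replicas is one less than it (minimum 1).
    """
    if num_brokers < 1:
        raise ValueError("need at least one broker")
    rf = min(max_replication_factor, num_brokers)
    min_isr = max(1, rf - 1)
    return {
        "replication_factor": rf,        # cf. default.replication.factor
        "min_insync_replicas": min_isr,  # cf. min.insync.replicas
        # Producers using acks=all keep writing to a partition while at
        # least min_isr of its replicas remain in sync.
        "broker_failures_tolerated_for_writes": rf - min_isr,
        # Assuming all replicas were in sync, data survives losing rf - 1 of them.
        "broker_failures_tolerated_for_data": rf - 1,
    }

for brokers in (3, 4):
    print(brokers, replication_plan(brokers))
```

With three or four brokers, the sketch settles on a replication factor of 3, so writes survive one broker failure and stored data survives two.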
What's my experience with pricing, setup cost, and licensing?
Kafka is open source.
What other advice do I have?
I rate Apache Kafka eight out of 10. I would recommend it to others.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Architecte Technique Senior at a computer software company with 10,001+ employees
Good, clear documentation but growth needs to improve
Pros and Cons
- "The most valuable feature is the documentation, which is good and clear."
- "An area for improvement would be growth."
What is most valuable?
The most valuable feature is the documentation, which is good and clear.
What needs improvement?
An area for improvement would be growth.
For how long have I used the solution?
I've been using this solution for just over a year.
What do I think about the stability of the solution?
Kafka works very well.
How was the initial setup?
The initial setup was simple.
What other advice do I have?
I would rate this solution six out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Reseller
Project Engineer at Wipro Limited
Free to use, mature, and offers good scalability
Pros and Cons
- "It's an open-source product, which means it doesn't cost us anything to use it."
- "The UI is based on the command line. It would be helpful if they could come up with a simpler user interface."
What is our primary use case?
We primarily use the solution for big data. We often get a million messages per second, and we use Kafka to handle that volume.
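At a million messages per second, the partition count is the main lever for parallelism. A commonly cited sizing heuristic divides the target throughput by the measured per-partition throughput on both the producer and the consumer side and takes the larger result. The per-partition figures below are placeholder assumptions, not measurements from this deployment, and would need benchmarking on real hardware.

```python
import math

def partitions_needed(target_msgs_per_sec: float,
                      producer_msgs_per_sec_per_partition: float,
                      consumer_msgs_per_sec_per_partition: float) -> int:
    """Heuristic: enough partitions that neither producers nor consumers
    become the bottleneck at the target rate."""
    by_producer = target_msgs_per_sec / producer_msgs_per_sec_per_partition
    by_consumer = target_msgs_per_sec / consumer_msgs_per_sec_per_partition
    return math.ceil(max(by_producer, by_consumer))

# Hypothetical benchmark figures: 50k msg/s per partition when producing,
# 20k msg/s per partition when consuming.
print(partitions_needed(1_000_000, 50_000, 20_000))  # 50
```

Here the consumer side is the bottleneck, so it dictates the count.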
What is most valuable?
When we're working with big data, we need a high-throughput platform, which is something that Kafka provides and something we find extremely valuable. It supports our processing and ensures there's no loss of data. It can also replicate data.
Data delivery is its most valuable aspect.
It's an easy-to-use product overall.
The solution is quite mature.
It's an open-source product, which means it doesn't cost us anything to use it.
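The no-data-loss property comes largely from replication: each partition has a leader and follower replicas, and a record counts as committed only once every in-sync replica has copied it. Consumers are served records only up to that point, the high watermark. The toy model below is a simplification (no leader election, no ISR shrinking) meant only to show that rule; the class and method names are hypothetical.

```python
class ToyPartition:
    """Toy model of one replicated partition; not Kafka's implementation.

    Leader election and ISR shrinking are deliberately left out.
    """

    def __init__(self, replica_ids):
        self.leader = replica_ids[0]
        self.log = []  # records appended on the leader
        # Number of records each in-sync replica has copied (leader included).
        self.replicated = {r: 0 for r in replica_ids}

    def produce(self, record):
        self.log.append(record)
        self.replicated[self.leader] = len(self.log)

    def follower_fetch(self, follower):
        # The follower copies everything the leader currently has.
        self.replicated[follower] = len(self.log)

    @property
    def high_watermark(self):
        # Records below this offset exist on every in-sync replica.
        return min(self.replicated.values())

    def visible_to_consumers(self):
        return self.log[: self.high_watermark]


p = ToyPartition(["broker-1", "broker-2", "broker-3"])
p.produce("m1")
p.produce("m2")
p.follower_fetch("broker-2")
print(p.visible_to_consumers())  # [] because broker-3 has not caught up
p.follower_fetch("broker-3")
print(p.visible_to_consumers())  # ['m1', 'm2']
```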
What needs improvement?
We're still going through the solution. Right now, I can't suggest any features that might be missing. I don't see where there can be an improvement in that regard.
It isn't as fast as RabbitMQ, even though it touts itself as very quick. It could be faster; they should work to make it at least as fast as RabbitMQ.
The UI is based on the command line. It would be helpful if they could come up with a simpler user interface.
They should make configuration easier.
The solution would benefit from the addition of better monitoring tools.
For how long have I used the solution?
I've been using the solution for six months.
What do I think about the stability of the solution?
The solution is a bit slow in comparison to RabbitMQ. It's supposed to be a very fast solution, and it has okay performance, but speed-wise, it's quite slow.
What do I think about the scalability of the solution?
The scaling of the solution is quite good.
How are customer service and technical support?
In terms of technical support, we don't get that directly from Apache Kafka. We use a Cloudera distribution, so we get assistance from Cloudera support.
How was the initial setup?
We're continuously deploying the product. We're still in the process of deployment.
What's my experience with pricing, setup cost, and licensing?
It's an open-source product, so the pricing isn't an issue. It's free to use. We don't have costs associated with it.
Which other solutions did I evaluate?
I'm not the product owner, so I didn't have a say in what should be chosen. We saw high throughput with Kafka, which is why we ultimately chose it.
What other advice do I have?
I'd rate the solution eight out of ten. It's good at scaling, and, performance-wise, it's excellent. If they improved the UI and made configuration easier, I'd rate them higher.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Technical Consultant at KPMG
It eases our current data flow and framework
Pros and Cons
- "It eases our current data flow and framework."
- "Kafka 2.0 has been released for over a month, and I wanted to try out the new features. However, the configuration is a little bit complicated: Kafka Broker, Kafka Manager, ZooKeeper Servers, etc."
What is our primary use case?
It's convenient and flexible for almost all kinds of data producers. We integrated it with Kafka Streams, which can perform simple data processing such as summarizing, counting, and grouping.
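Kafka Streams expresses this kind of processing with operations such as `groupByKey()` followed by `count()`, which maintain a running count per key and emit changes downstream. The Python sketch below only imitates those semantics in memory, assuming records arrive as (key, value) pairs; it is not the Kafka Streams API, which is a Java library, and it ignores record caching, which in Kafka Streams can coalesce intermediate updates.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

def grouped_count(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Imitate a groupByKey().count() topology: for every incoming record,
    update the running count for its key and emit the new (key, count)."""
    counts = defaultdict(int)   # stands in for the state store
    for key, _value in records:
        counts[key] += 1
        yield key, counts[key]  # like an update to the resulting table

events = [("page-a", "view"), ("page-b", "view"), ("page-a", "click")]
print(list(grouped_count(events)))
# [('page-a', 1), ('page-b', 1), ('page-a', 2)]
```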
How has it helped my organization?
It eases our current data flow and framework, which ingests data from all types of sources, whether structured or not.
What is most valuable?
- High availability
- High throughput
Given how much data we ingest, I was genuinely impressed that processing is almost real-time.
What needs improvement?
Kafka 2.0 has been released for over a month, and I wanted to try out the new features. However, the configuration is a little bit complicated: Kafka Broker, Kafka Manager, ZooKeeper Servers, etc.
For how long have I used the solution?
Less than one year.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Java Consultant at a tech services company with 501-1,000 employees
The product is a distributed system for persistent messaging
What is most valuable?
The most valuable features are performance, persistent messaging, and reliability. It allows us to persist the message for a configurable number of days, even after it has been delivered to the consumer. The message delivery is also fast.
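Unlike a traditional queue, Kafka does not delete a message when a consumer reads it: each consumer group tracks its own offset, and records are removed only when the topic's retention limit (for example `retention.ms`) expires. The in-memory sketch below models that separation. The class is hypothetical, and it deletes individual records, whereas Kafka deletes whole log segments.

```python
import time

class RetainedLog:
    """Append-only log where reading never deletes; only age-based retention does."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.records = []       # (offset, timestamp, value)
        self.next_offset = 0
        self.committed = {}     # consumer group -> next offset to read

    def append(self, value, ts=None):
        ts = time.time() if ts is None else ts
        self.records.append((self.next_offset, ts, value))
        self.next_offset += 1

    def poll(self, group):
        # New groups start at the earliest retained record in this sketch
        # (in Kafka, auto.offset.reset decides this).
        start = self.committed.get(group, 0)
        batch = [v for off, _, v in self.records if off >= start]
        self.committed[group] = self.next_offset  # commit, but keep the data
        return batch

    def enforce_retention(self, now=None):
        now = time.time() if now is None else now
        self.records = [r for r in self.records
                        if now - r[1] < self.retention_seconds]

DAY = 86_400
log = RetainedLog(retention_seconds=7 * DAY)  # e.g. a 7-day retention
log.append("order-1", ts=0)
print(log.poll("billing"))    # ['order-1']
print(log.poll("analytics"))  # ['order-1'] (still there after delivery)
log.enforce_retention(now=8 * DAY)
print(log.records)            # [] (removed only once older than 7 days)
```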
How has it helped my organization?
We wanted to track customer activities in our application and store those details in another system (an RDBMS or Apache Hadoop), where we do extensive analysis. This helps the company analyze customer activities, such as search terms, and improve accordingly.
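For activity tracking, a consumer typically reads the events from a topic and loads them into the analytical store. The sketch below stands in for that sink step only: Python's built-in `sqlite3` plays the RDBMS, a plain list plays the consumed Kafka records, and the table layout and event fields are assumptions for illustration.

```python
import sqlite3

def load_activity(events, conn):
    """Write consumed activity events into a relational table, then
    report the most frequent search terms."""
    conn.execute("""CREATE TABLE IF NOT EXISTS activity (
                        user_id TEXT, action TEXT, detail TEXT)""")
    conn.executemany(
        "INSERT INTO activity (user_id, action, detail) VALUES (?, ?, ?)",
        [(e["user_id"], e["action"], e["detail"]) for e in events])
    conn.commit()
    return conn.execute(
        """SELECT detail, COUNT(*) AS n FROM activity
           WHERE action = 'search'
           GROUP BY detail ORDER BY n DESC, detail""").fetchall()

# Hypothetical events as they might arrive from an activity topic.
events = [
    {"user_id": "u1", "action": "search", "detail": "laptop"},
    {"user_id": "u2", "action": "search", "detail": "laptop"},
    {"user_id": "u2", "action": "search", "detail": "phone"},
    {"user_id": "u1", "action": "click", "detail": "item-42"},
]
conn = sqlite3.connect(":memory:")
print(load_activity(events, conn))  # [('laptop', 2), ('phone', 1)]
```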
What needs improvement?
It’s perfect for our requirements.
For how long have I used the solution?
I have been using Apache Kafka for two years.
What do I think about the stability of the solution?
We have had no issues with stability.
What do I think about the scalability of the solution?
We have had no issues with scalability.
How are customer service and technical support?
We use the open source one, so we did not opt for any technical support.
Which solution did I use previously and why did I switch?
We started to use Apache Kafka with our application from scratch.
How was the initial setup?
The initial setup was straightforward. We faced some issues during development in areas such as the message producer and consumer. We rectified those by tweaking the producer and consumer configurations. The documentation is very good.
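The review doesn't say which settings were changed. As a hedged illustration, these are real Kafka client configuration keys that are commonly adjusted when tuning producers and consumers; the values, address, and group name are placeholders rather than recommendations. The key names follow the Java client documentation, and some other client libraries name or support them differently.

```python
# Producer settings that trade off latency, throughput, and durability.
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "acks": "all",                 # wait for all in-sync replicas
    "enable.idempotence": True,    # avoid duplicates caused by retries
    "linger.ms": 10,               # wait briefly so batches fill up
    "batch.size": 64 * 1024,       # max bytes per partition batch
}

# Consumer settings that control offset handling and poll behaviour.
consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "activity-tracker",   # hypothetical group name
    "enable.auto.commit": False,      # commit only after processing
    "auto.offset.reset": "earliest",  # start point when no offset is committed
    "max.poll.records": 500,          # cap on records returned per poll
}
```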
What's my experience with pricing, setup cost, and licensing?
I don’t have any idea, as we use the open source version.
What other advice do I have?
It's a high-performance distributed system. If you want to track user activities or do any stream processing, then this is perfect. We used Kafka in Docker for our implementation, which makes setup and testing very easy. You could try the same.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Java Developer at a media company with 10,001+ employees
It provides safety for data in case of node failure or data center outage. Partitioning is useful for parallelizing processing.
What is most valuable?
The most valuable features to me are replication, partitioning and easy integration with Apache Spark, which we use quite a bit for distributed processing.
Replication is good for high availability. It provides additional safety for data in case of node failure or data center outage. Partitioning is a really useful feature for parallelizing processing. We use Apache Spark to process data from a Kafka topic, and Spark assigns one task to each Kafka partition. The more partitions we have, the more tasks can process data in parallel. This helps us achieve really good throughput.
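The key-to-partition mapping and the one-worker-per-partition pattern can be sketched in plain Python. Kafka's default partitioner hashes record keys with murmur2; this sketch substitutes `zlib.crc32` for simplicity, so the partition numbers will differ from Kafka's, and a thread pool stands in for Spark tasks.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Same key -> same partition, so per-key ordering is preserved.
    # (Kafka uses murmur2; crc32 is a stand-in here.)
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def process_partition(records):
    # Placeholder work: total the values within one partition.
    return sum(value for _key, value in records)

records = [("user-1", 5), ("user-2", 3), ("user-1", 2), ("user-3", 7)]
partitions = defaultdict(list)
for key, value in records:
    partitions[partition_for(key)].append((key, value))

# One worker per non-empty partition, mirroring one task per Kafka partition.
with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
    partials = list(pool.map(process_partition, partitions.values()))

print(sum(partials))  # 17
```

Adding partitions raises the ceiling on parallel workers, which is why the partition count tracks achievable throughput.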
How has it helped my organization?
It will help us build a scalable platform. This will allow the company to provide better customer service.
What needs improvement?
It’s pretty easy to use for now. I haven’t had any difficulty or problems that I can complain about. Maybe they could add a UI to configure topics and to display statistics about data stores.
For how long have I used the solution?
I have used Kafka for about a year.
What do I think about the stability of the solution?
So far, we have not encountered any stability issues.
What do I think about the scalability of the solution?
We have not had any scalability issues. The product is horizontally scalable, so adding extra hardware is all that is needed.
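How added brokers take on work can be pictured with a simplified model of replica placement for a newly created topic. This round-robin scheme simplifies Kafka's actual assignment, which also randomizes the starting broker and staggers followers. Partitions of existing topics are not moved onto new brokers automatically; they are typically rebalanced with the partition reassignment tool.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Round-robin replica placement (simplified; not Kafka's exact algorithm).

    Partition p's leader goes on broker p mod N and its followers on the
    next brokers in the list, so one partition's replicas never share a broker.
    """
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }

def leaders_per_broker(assignment):
    counts = {}
    for replicas in assignment.values():
        counts[replicas[0]] = counts.get(replicas[0], 0) + 1
    return counts

# A new 6-partition topic on 3 brokers, then on 4 after adding hardware.
print(leaders_per_broker(assign_replicas(6, [1, 2, 3], 2)))     # {1: 2, 2: 2, 3: 2}
print(leaders_per_broker(assign_replicas(6, [1, 2, 3, 4], 2)))  # {1: 2, 2: 2, 3: 1, 4: 1}
```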
How are customer service and technical support?
We haven’t needed technical support with the product yet.
Which solution did I use previously and why did I switch?
I think performance-wise, the product is very good and fits our use case. We have used other distributed message queues, but each product has its own use cases.
How was the initial setup?
Initial setup wasn’t really complex. We use Kafka through the Hortonworks suite, which comes with many other big data tools. Ambari makes it easy to set up.
What's my experience with pricing, setup cost, and licensing?
Licensing and pricing was handled by my management, so I don’t have much knowledge there.
What other advice do I have?
Give it a try. It’s a valuable, high-performance, distributed processing tool.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Developer Infrastructure at Outbrain
Very easy to install, stable, and has good scaling options
Pros and Cons
- "It's very easy to install and it's pretty stable."
- "The third party is not very stable and sometimes you have problems with this component. There are some developments in newer versions and we're about to try them out, but I'm not sure if it closes the gap."
How has it helped my organization?
In my previous company, we had a proprietary implementation and replaced it with Kafka. We switched because many different connectors were available, and it also allowed us to open a window into our products for the client. It was an on-premises product, and Kafka allowed the client to take the data out without us developing anything.
You can connect from any language, and there are a lot of connectors available, which helps a lot. It also provides visibility into the data, as well as stability. There are several alternatives, but this is one of the best options for this.
What is most valuable?
It's very easy to install and it's pretty stable.
The possibility to have connectors is very helpful. Another valuable aspect is that it's mature and open-source.
From a scalability point of view, you just add servers and it's scalable. The whole architecture is very scalable.
What needs improvement?
There is a feature that we're currently using called MirrorMaker. We use it to combine the information from different Kafka clusters into another cluster. It addresses a very broad, generic scenario, and I think it would be great if this capability existed out of the box rather than as a third-party component. The third-party component is not very stable, and sometimes we have problems with it. There are some developments in newer versions and we're about to try them out, but I'm not sure if they close the gap.
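The aggregation pattern described here, copying topics from several source clusters into one central cluster, can be modelled in a few lines. MirrorMaker 2's default replication policy names each copied topic `<source-alias>.<topic>`, so same-named topics from different clusters don't collide. The sketch imitates that naming with in-memory dicts standing in for clusters; the cluster aliases and the `mirror_into` function are hypothetical.

```python
from collections import defaultdict

def mirror_into(target, source_alias, source_topics, separator="."):
    """Copy every record from a source cluster's topics into the target,
    prefixing topic names with the source alias (as MirrorMaker 2's
    default replication policy does)."""
    for topic, records in source_topics.items():
        target[f"{source_alias}{separator}{topic}"].extend(records)

# Hypothetical clusters, modelled as {topic: [records]}.
us_east = {"clicks": ["c1", "c2"]}
eu_west = {"clicks": ["c3"]}

aggregate = defaultdict(list)
mirror_into(aggregate, "us-east", us_east)
mirror_into(aggregate, "eu-west", eu_west)
print(dict(aggregate))
# {'us-east.clicks': ['c1', 'c2'], 'eu-west.clicks': ['c3']}
```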
For how long have I used the solution?
I have been using this solution for six months. I also worked with it at my previous company, though less intensively.
How are customer service and technical support?
I have never needed to use technical support. I know it's available but we haven't needed it because there's a lot of information on the internet that has helped us to solve our issues.
What other advice do I have?
I would definitely recommend Kafka. In our current position, we use it to move a lot of data and I think it's definitely working well. I would definitely recommend it.
I would rate it an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Technical Architect at a computer software company with 51-200 employees
Its publisher-subscriber pattern has allowed our applications to access and consume data in real time.
Pros and Cons
- "I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at-least-once processing for an analytics pipeline. Kafka fits this requirement very well."
- "As an open-source project, Kafka is still fairly young and has not yet built out the stability and features that other open-source projects have acquired over the many years. If done correctly, Kafka can also take over the stream-processing space that technologies such as Apache Storm cover."
How has it helped my organization?
Through its publisher-subscriber pattern, Kafka has allowed our applications to access and consume data in real time.
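In Kafka's publish-subscribe model, every consumer group receives every record, while the partitions of a topic are divided among the consumers inside one group. The sketch below shows that split with a simple round-robin assignment. Kafka's real assignors (range, round-robin, sticky, cooperative-sticky) are configurable, and the group and consumer names here are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Round-robin split of a topic's partitions among one group's consumers."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
groups = {
    "dashboard": ["dash-1"],        # one consumer reads every partition
    "analytics": ["an-1", "an-2"],  # two consumers share the load
}
for group, members in groups.items():
    print(group, assign_partitions(partitions, members))
# dashboard {'dash-1': [0, 1, 2, 3]}
# analytics {'an-1': [0, 2], 'an-2': [1, 3]}
```

Both groups see all four partitions' data; only the "analytics" group splits it between members.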
What is most valuable?
I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at-least-once processing for an analytics pipeline. Kafka fits this requirement very well, as it is a fast, distributed message broker. It definitely does exactly what it is designed to do.
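At-least-once processing in a Kafka consumer usually means committing an offset only after the record has been processed: if the consumer crashes in between, the record is delivered again rather than lost. The sketch below simulates this with an in-memory partition and a simulated crash. The `consume` function and its parameters are hypothetical, and no Kafka client is involved.

```python
def consume(partition, committed_offset, process, crash_after=None):
    """Process records from committed_offset onward, committing each offset
    only after process() succeeds. Returns the new committed offset.
    crash_after simulates the consumer dying after processing that offset
    but before committing it."""
    offset = committed_offset
    while offset < len(partition):
        process(partition[offset])
        if crash_after is not None and offset == crash_after:
            return offset  # crashed: this offset was never committed
        offset += 1        # "commit" only after successful processing
    return offset

partition = ["e0", "e1", "e2"]
seen = []
committed = consume(partition, 0, seen.append, crash_after=1)
committed = consume(partition, committed, seen.append)  # restart from the commit
print(seen)  # ['e0', 'e1', 'e1', 'e2']: e1 processed twice, nothing lost
```

The duplicate `e1` is the price of the guarantee, so downstream steps in such a pipeline usually need to tolerate or deduplicate repeats.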
What needs improvement?
As an open-source project, Kafka is still fairly young and has not yet built out the stability and features that other open-source projects have acquired over many years. If done correctly, Kafka can also take over the stream-processing space that technologies such as Apache Storm cover.
Currently, in the big/fast data integration world, you need to piece together many different open-source technologies. For example, to create a reliable, fault-tolerant stream processing system that ingests data, you need:
- a producer service
- an event/message buffer such as Kafka or a message queue
- a stream processing consumer such as Spark, Flink, Storm, etc.
- something to facilitate ingestion into target data sources, such as Flume or some customized concoction.
This only covers ingesting the data and does not necessarily account for the analytical pieces, which may consist of Spark ML, SystemML, Elasticsearch, Mahout, etc.
What I'm getting at is basically the need for a Spring framework of big data.
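The four pieces listed above can be sketched as one chained pipeline, with Python generators standing in for the producer service, the Kafka buffer, the stream processor, and the sink. Every function name and the sensor-reading input format are hypothetical. In a real deployment each stage would be a separate system, for example a Kafka topic for the buffer and Spark, Flink, or Storm for processing.

```python
def producer_service(raw_lines):
    """Stage 1: turn raw input lines into structured events."""
    for line in raw_lines:
        sensor, reading = line.split(",")
        yield {"sensor": sensor, "reading": float(reading)}

def event_buffer(events):
    """Stage 2: stand-in for Kafka; stores events, then replays them in order."""
    log = list(events)  # Kafka would persist these durably
    yield from log

def stream_processor(events, threshold):
    """Stage 3: stand-in for Spark/Flink/Storm; keeps readings above a threshold."""
    for e in events:
        if e["reading"] > threshold:
            yield e

def sink(events, target):
    """Stage 4: stand-in for ingestion into a target data store."""
    for e in events:
        target.append(e)
    return target

store = []
raw = ["s1,20.5", "s2,35.0", "s1,40.2"]
sink(stream_processor(event_buffer(producer_service(raw)), threshold=30), store)
print([e["sensor"] for e in store])  # ['s2', 's1']
```

Even in this toy form, four separate components are needed just to move data from input to store, which is the integration burden the reviewer describes.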
What do I think about the stability of the solution?
The only stability issues we had were mostly a result of the evolving APIs and existing bugs.
What do I think about the scalability of the solution?
Kafka is designed to be very easily scalable so I did not have any trouble here.
How are customer service and technical support?
We used the open-source version and did not buy support from Confluent.
Which solution did I use previously and why did I switch?
We did not have a previous solution. Our project was greenfield development.
How was the initial setup?
Initial setup was straightforward. We simply hosted multiple Kafka brokers and ZooKeeper servers on AWS EC2 instances.
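Running several brokers against a ZooKeeper ensemble mainly comes down to giving each broker a unique `broker.id` and its own listener address, plus a shared `zookeeper.connect` string. The helper below renders a minimal per-broker `server.properties`. The hostnames, data directory, and replication values are assumptions. These settings apply to ZooKeeper-based clusters; newer Kafka releases run in KRaft mode without ZooKeeper and use different settings.

```python
def server_properties(broker_id, host, zk_hosts, log_dir="/var/lib/kafka/data"):
    """Render a minimal server.properties for one broker in a ZooKeeper-based cluster."""
    zookeeper_connect = ",".join(f"{h}:2181" for h in zk_hosts)
    lines = [
        f"broker.id={broker_id}",                # must be unique per broker
        f"listeners=PLAINTEXT://{host}:9092",
        f"log.dirs={log_dir}",                   # assumed data directory
        f"zookeeper.connect={zookeeper_connect}",
        "default.replication.factor=3",          # assumed value
        "min.insync.replicas=2",                 # assumed value
    ]
    return "\n".join(lines) + "\n"

# Hypothetical EC2 private hostnames for three brokers and three ZooKeeper nodes.
brokers = ["kafka-1.internal", "kafka-2.internal", "kafka-3.internal"]
zookeepers = ["zk-1.internal", "zk-2.internal", "zk-3.internal"]
for i, host in enumerate(brokers, start=1):
    print(f"# {host}")
    print(server_properties(i, host, zookeepers))
```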
What about the implementation team?
We implemented it in-house and then went with the Hortonworks Data Platform distribution.
Which other solutions did I evaluate?
We evaluated AWS Kinesis as well.
What other advice do I have?
Kafka is open source and requires an administrator to maintain the servers.
Disclosure: I am a real user, and this review is based on my own experience and opinions.