Enterprise Architect at Smals vzw
Real User
Effective event sequencing, seamless system interactions, and beneficial data management
Pros and Cons
  • "There are numerous possibilities that can be explored. While it may be challenging to fully comprehend the potential advantages, one key aspect is the ability to establish a proper sequence of events rather than simply dealing with a jumbled group of occurrences. These events possess their own timestamps, even if they were not initially provided with one, and are arranged in a chronological order that allows for a clear understanding of the progression of the events."
  • "There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, which are all under enterprise license and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools, such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as enterprise solutions."

What is our primary use case?

We use Apache Kafka as more than a messaging bus: it also serves as a database for storing information. It functions as a streamer, similar to ETL, that manipulates and transforms events before they are moved to other systems for use, and the database can also act as a cache. Because it can retain events for at least 10 days, Apache Kafka acts as a database, broker, streamer, and source of truth for multiple systems. It provides both synchronous and asynchronous communication, which makes it a complex system that is easier to understand through diagrams or sketches.
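
To make the 10-day window concrete, here is a minimal sketch of how such a window could be expressed as a topic setting. It assumes the window is enforced through the topic-level `retention.ms` config, which takes milliseconds; the review does not say how retention was actually configured, and the `retention_ms` helper is made up for illustration.

```python
from datetime import timedelta

# Assumption: the "at least 10 days" window is set via the topic-level
# `retention.ms` config, whose value is expressed in milliseconds.
RETENTION = timedelta(days=10)


def retention_ms(window: timedelta) -> int:
    """Convert a retention window into the millisecond value the config expects."""
    return int(window.total_seconds() * 1000)


topic_config = {"retention.ms": str(retention_ms(RETENTION))}
print(topic_config)  # {'retention.ms': '864000000'}
```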

We use reactive frameworks.

How has it helped my organization?

From my experience with Apache Kafka, one of the most notable advantages is its ability to maintain a comprehensive record of historical data that includes every update, alteration, and version of information, unlike a conventional relational database. This feature allows for seamless tracking and analysis of the progression and transformation of the data over time, enabling users to easily review and analyze the history of the information.
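
The contrast with a relational table can be sketched in a few lines. This is a toy in-memory model, not Kafka's API: the `EventLog` class and the account data are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EventLog:
    """Toy append-only log: every update is kept, nothing is overwritten."""

    events: list[tuple[str, Any]] = field(default_factory=list)

    def append(self, key: str, value: Any) -> int:
        self.events.append((key, value))
        return len(self.events) - 1  # offset of the new event

    def history(self, key: str) -> list[Any]:
        """All values ever recorded for a key, oldest first."""
        return [v for k, v in self.events if k == key]


# A relational-style table keeps only the latest value per key,
# while the log keeps every version.
table: dict[str, Any] = {}
log = EventLog()
for key, value in [("acct-1", 100), ("acct-1", 80), ("acct-1", 95)]:
    table[key] = value
    log.append(key, value)

print(table["acct-1"])        # 95
print(log.history("acct-1"))  # [100, 80, 95]
```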

The solution has the capability for various systems to effortlessly interact with one another without prior knowledge of their existence, current operational status, or specific configurations. By utilizing service buses and dynamic integration, data can be distributed across networks and retrieved in a way that is most suitable for each system's requirements. In addition, Apache Kafka allows for the modification of data to provide diverse clients, consumers, or observers with unique and varying data. The replication of data can produce multiple versions, and this data can be adjusted to fit various needs. With the use of probes, one can alter the behavior of the transformation process, thereby changing the way in which data is transformed and the output produced. Overall, working with Apache Kafka has brought about an array of benefits, enabling seamless system interactions and allowing for the customization and modification of data to meet individual requirements.

What is most valuable?

There are numerous possibilities that can be explored. While it may be challenging to fully comprehend the potential advantages, one key aspect is the ability to establish a proper sequence of events rather than simply dealing with a jumbled group of occurrences. These events possess their own timestamps, even if they were not initially provided with one, and are arranged in a chronological order that allows for a clear understanding of the progression of the events.
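
As a rough illustration of events that receive a timestamp even when none was supplied, the sketch below models a single ordered log. It is a simplification under stated assumptions: `TimestampedLog` is a hypothetical class, ordering here is by append position (offset), and the fake clock exists only to make the example reproducible.

```python
import itertools
import time
from typing import Any, Callable, Optional


class TimestampedLog:
    """Toy single-partition log: records are ordered by offset (append position)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._records: list[tuple[int, float, Any]] = []

    def append(self, value: Any, timestamp: Optional[float] = None) -> int:
        # Stamp the record on arrival if the sender did not supply a timestamp.
        if timestamp is None:
            timestamp = self._clock()
        offset = len(self._records)
        self._records.append((offset, timestamp, value))
        return offset

    def read(self, from_offset: int = 0) -> list[tuple[int, float, Any]]:
        return self._records[from_offset:]


# Deterministic fake clock so the example is reproducible.
ticks = itertools.count(start=1000)
log = TimestampedLog(clock=lambda: float(next(ticks)))
log.append("order-created", timestamp=990.0)  # sender supplied a timestamp
log.append("order-paid")                      # stamped by the log: 1000.0
log.append("order-shipped")                   # stamped by the log: 1001.0

for record in log.read():
    print(record)
```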

What needs improvement?

There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, which are all under enterprise license and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools, such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as enterprise solutions.

One additional area that I think could benefit from improvement is the deployment process on OpenShift. This particular deployment is quite challenging and requires the activation of certain security measures as well as integration with other systems. It's not a straightforward process and typically requires engineers who are highly skilled and have extensive experience with Apache Kafka to carry out these tasks. Therefore, I believe that there is a need for progress in this area, and some tools that can provide information, assistance, and help make the whole process easier would be greatly appreciated.

Buyer's Guide
Apache Kafka
February 2025
Learn what your peers think about Apache Kafka. Get advice and tips from experienced pros sharing their opinions. Updated: February 2025.
838,713 professionals have used our research since 2012.

For how long have I used the solution?

I have been using Apache Kafka for approximately four years.

What do I think about the stability of the solution?

The solution is stable if you have set it up correctly.

What do I think about the scalability of the solution?

Apache Kafka is a scalable solution.

How are customer service and support?

I have not escalated any questions to technical support because Apache Kafka is an open-source system. However, Confluent and other companies sell support and enterprise solutions that make the work more convenient and streamlined. They offer tools, such as a monitoring tool with a visual interface, which provides a lot of information and buttons to press for correction or change without touching the code. Each of those buttons could, hypothetically, have helped in a given situation, but it is unclear exactly what they do, so it is best to call the data center and ask. If you buy their service, you have access to all the enterprise comforts.

How was the initial setup?

Setting up Apache Kafka is not an easy task, especially when trying to containerize it and make it controllable. This is because Apache Kafka has its own distributed mechanism for staying alive, checking readiness, replicating, and scaling. Making it comply with the Kubernetes or OpenShift orchestrator requires careful attention, as there is a risk of two masters attempting to perform the same task and ultimately undoing each other's work.

Compared to plain Kubernetes, OpenShift is a more advanced platform that automatically manages and orchestrates the steps required to set up an application. It operates at a higher level of abstraction and removes manual operations that Kubernetes requires. While Kubernetes can run an application given some pipeline and configuration, OpenShift takes care of everything from finding the required images to creating ports and connecting databases. Manual changes can still be made, but they are not necessary, because OpenShift offers a much more coarse-grained management approach.

What about the implementation team?

One skillful DevOps engineer can implement the solution.

What's my experience with pricing, setup cost, and licensing?

Apache Kafka is an open-source solution.

What other advice do I have?

The maintenance of Apache Kafka is crucial due to the complexity of the system with numerous microservices and systems communicating through Apache Kafka, requiring proper integration and configuration to prevent overloading and ensure a healthy cluster. The task is not easy and requires knowledge of the various adjustable parameters, as misadjusting even one of them can greatly slow down the cluster. For example, if the consumer group changes frequently, the messages must be regrouped and reassigned, causing significant delays. Therefore, configuring Apache Kafka correctly is essential to avoid high latency issues.
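
To see why frequent consumer group changes are costly, consider this toy model of partition assignment. It uses a plain round-robin scheme, which is an assumption made for illustration and not a reproduction of any of Kafka's actual assignors; the `assign` and `moved` helpers are hypothetical.

```python
def assign(partitions: list[int], members: list[str]) -> dict[int, str]:
    """Round-robin each partition to a member (members sorted for determinism)."""
    ordered = sorted(members)
    return {p: ordered[i % len(ordered)] for i, p in enumerate(partitions)}


def moved(before: dict[int, str], after: dict[int, str]) -> list[int]:
    """Partitions whose owner changed and must be handed over."""
    return [p for p in before if before[p] != after[p]]


partitions = list(range(6))
before = assign(partitions, ["c1", "c2"])
after = assign(partitions, ["c1", "c2", "c3"])  # one consumer joins the group

print(moved(before, after))  # [2, 3, 4, 5]
```

In this simplified scheme, a single new member causes four of six partitions to change owner, and consumption of those partitions pauses while they are handed over. That is the kind of delay the paragraph above describes.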

I would strongly suggest others give Apache Kafka a chance and explore the various advantages that it can offer, especially since it should not be perceived as a message bus or broker but rather an enterprise bus designed for data manipulation. It has the ability to transform data, store and reject it, and even maintain different versions of the same data simultaneously. Moreover, it operates on a pull mechanism rather than a push mechanism, which takes away the risk of losing data and places the responsibility for data loss on the consumer. On the other hand, it also ensures that the data is always available within the specified window and allows for easy replication of the past, which is extremely helpful in situations such as those involving a hacked bank database. With Apache Kafka, you can efficiently go back in time, obtain the required status and events, and make changes accordingly, without the need to go through each transaction separately. Thus, using this solution can make data management much more efficient and convenient.
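
The "go back in time" idea can be sketched as replaying a retained log up to a chosen moment. This is a minimal sketch with invented data: the `state_at` function, the account names, and the integer timestamps are all hypothetical, and the events are assumed to be stored in non-decreasing timestamp order.

```python
def state_at(events: list[tuple[int, str, int]], cutoff: int) -> dict[str, int]:
    """Rebuild balances by replaying events with timestamp <= cutoff."""
    balances: dict[str, int] = {}
    for ts, account, delta in events:
        if ts > cutoff:
            break  # assumes events are in non-decreasing timestamp order
        balances[account] = balances.get(account, 0) + delta
    return balances


events = [
    (1, "alice", 100),
    (2, "bob", 50),
    (3, "alice", -30),
    (4, "alice", -70),  # suppose this is the suspicious change
]

print(state_at(events, cutoff=3))  # {'alice': 70, 'bob': 50}
print(state_at(events, cutoff=4))  # {'alice': 0, 'bob': 50}
```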

I rate Apache Kafka an eight out of ten.

In order to improve its user-friendliness, engineer-friendliness, and DevOps-friendliness, the system must undertake various tasks, such as enhancing the overall operation and configuration, ensuring seamless integration with other systems, and adapting to security layers in a more comprehensive and generic manner. This will require significant efforts to make the system more functional, secure, and efficient.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Moussa Chikhi - PeerSpot reviewer
Architecte Technique Senior at a computer software company with 10,001+ employees
Vendor
Good, clear documentation but growth needs to improve
Pros and Cons
  • "The most valuable feature is the documentation, which is good and clear."
  • "An area for improvement would be growth."

What is most valuable?

The most valuable feature is the documentation, which is good and clear.

What needs improvement?

An area for improvement would be growth.

For how long have I used the solution?

I've been using this solution for just over a year.

What do I think about the stability of the solution?

Kafka works very well.

How was the initial setup?

The initial setup was simple.

What other advice do I have?

I would rate this solution six out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Reseller
PeerSpot user
Abdul-Samad - PeerSpot reviewer
Software Engineer at a tech services company with 201-500 employees
Real User
It can manage a high volume of data from many sources
Pros and Cons
  • "Kafka is scalable. It can manage a high volume of data from many sources."
  • "The interface has room for improvement, and there is a steep learning curve for Hadoop integration. It was a struggle learning to send from Hadoop to Kafka. In future releases, I'd like to see improvements in ETL functionality and Hadoop integration."

What is our primary use case?

I use Kafka to send network packets from different sources to my cluster. We have around 10 users at my company.

What is most valuable?

Kafka is scalable. It can manage a high volume of data from many sources.

What needs improvement?

The interface has room for improvement, and there is a steep learning curve for Hadoop integration. It was a struggle learning to send from Hadoop to Kafka. In future releases, I'd like to see improvements in ETL functionality and Hadoop integration. 

For how long have I used the solution?

I have used Kafka for around six months.

What do I think about the stability of the solution?

I rate Apache Kafka seven out of 10 for stability. 

What do I think about the scalability of the solution?

I rate Kafka eight out of 10 for scalability. 

How are customer service and support?

I rate Apache support six out of 10. It was hard to find the information I needed. 

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

Before Kafka, I sent feeds directly to Hadoop.

How was the initial setup?

I initially found Kafka difficult to set up, so I would rate it about five out of 10 for ease of setup. After I learned more about the platform, I would rate it eight out of 10. It is deployed on-premises over a cluster of three or four PCs. You can deploy Kafka in a few hours with one person. 

What's my experience with pricing, setup cost, and licensing?

Kafka is open source. 

What other advice do I have?

I rate Apache Kafka eight out of 10. I would recommend it to others. 

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Sr Technical Consultant at a tech services company with 1,001-5,000 employees
Real User
Effective stream API, useful consumer groups, and highly scalable
Pros and Cons
  • "The most valuable features are the stream API, consumer groups, and the way that the scaling takes place."
  • "I would like to see real-time event-based consumption of messages rather than the traditional way through a loop. The traditional messaging system works by listening and looping with a small wait to check to see what the messages are. A push system is where you have something that is ready to receive a message and when the message comes in and hits the partition, it goes straight to the consumer versus the consumer having to pull. I believe this consumer approach is something they are working on and may come in an upcoming release. However, that is message consumption versus message listening."

What is our primary use case?

One of our clients needed to take events out of SAP to stream them through Apache Kafka while applying data enrichment before reaching the consumers.

How has it helped my organization?

The solution handles higher speeds and scales horizontally, both for messaging and, more specifically, for stream processing and data enrichment. Using it can reduce the number of components required in the tech stack. For example, we were taking data events out of SAP and sending them to consumers without having to go through multiple processors outside of the Kafka space. Additionally, we are using Kafka from GoldenGate to propagate database updates in real time.

What is most valuable?

The most valuable features are the stream API, consumer groups, and the way that the scaling takes place. 

What needs improvement?

I would like to see real-time event-based consumption of messages rather than the traditional way through a loop. The traditional messaging system works by listening and looping with a small wait to check to see what the messages are. A push system is where you have something that is ready to receive a message and when the message comes in and hits the partition, it goes straight to the consumer versus the consumer having to pull. I believe this consumer approach is something they are working on and may come in an upcoming release. However, that is message consumption versus message listening.
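
The difference between the two consumption styles can be sketched with standard-library pieces. This is a conceptual model only and does not use a Kafka client: `poll_loop` and `PushTopic` are hypothetical names, and `queue.Queue` stands in for a partition.

```python
import queue
from typing import Any, Callable


def poll_loop(q: "queue.Queue[Any]", handle: Callable[[Any], None],
              timeout_s: float = 0.01, max_idle_polls: int = 3) -> None:
    """Pull style: loop, wait briefly for a message, and check again."""
    idle = 0
    while idle < max_idle_polls:
        try:
            msg = q.get(timeout=timeout_s)
        except queue.Empty:
            idle += 1  # nothing arrived during this short wait
            continue
        idle = 0
        handle(msg)


class PushTopic:
    """Push style: subscribers are invoked as soon as a message is published."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Any], None]] = []

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, msg: Any) -> None:
        for callback in self._subscribers:
            callback(msg)


pulled: list[str] = []
q: "queue.Queue[str]" = queue.Queue()
for m in ["a", "b"]:
    q.put(m)
poll_loop(q, pulled.append)

pushed: list[str] = []
topic = PushTopic()
topic.subscribe(pushed.append)
topic.publish("a")
topic.publish("b")

print(pulled, pushed)  # ['a', 'b'] ['a', 'b']
```

Both styles deliver the same messages. In the pull version the consumer decides when to ask, while in the push version delivery happens inside `publish`.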

Confluent created the KSQL language, but they gave it to the open-source community. I would like to see KSQL be able to be used on raw data versus structured and semi-structured data.

For how long have I used the solution?

I have been using this solution for approximately one year.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

I have found Apache Kafka to be highly scalable.

How are customer service and technical support?

The project we were working on was open-source. We were using Confluent for support, and they were great.

How was the initial setup?

Apache Kafka on AWS is a bit complex. A third-party company called Confluent offers support that makes installation much easier, especially for on-premises deployment. If you install Apache Kafka on its own, it can be a little complex compared to other queuing and messaging solutions.

The on-premises deployment takes a few days. Cloud or hybrid deployments, including all the permissions, topologies, firewalls, and networking configuration, can take weeks for all the accessibility issues to be resolved. However, the delay could have been client-related and not necessarily caused by the solution.

What about the implementation team?

We provide the implementation service.

What's my experience with pricing, setup cost, and licensing?

Apache Kafka is free. My clients were using Confluent which provides high-quality support and services, and it was relatively expensive for our client. There was a lot of back and forth on negotiating the price.

Confluent has an offering with cloud-based pricing. There are different packages, prices, and capabilities, with the highest level being the most expensive. AWS provides services to its market, for example, to have Kafka running. I do not know what the pricing is, but I am fairly confident that Azure and GCP provide similar services.

What other advice do I have?

My advice to others wanting to implement this solution is to start with data streaming projects, not simple messaging projects because while it is very good at general-purpose messaging, it is more suited and geared for when you are using it as a streaming solution.

I rate Apache Kafka an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Project Engineer at Wipro Limited
Real User
Top 20
Free to use, mature, and offers good scalability
Pros and Cons
  • "It's an open-source product, which means it doesn't cost us anything to use it."
  • "The UI is based on command line. It would be helpful if they could come up with a simpler user interface."

What is our primary use case?

We primarily use the solution for big data. We often get a million messages per second, and with such a high output we use Kafka to help us handle it. 

What is most valuable?

When we're working with big data, we need a throughput computing panel, which is something that Kafka provides, and something we find extremely valuable. It helps us support computing and ensures there's no loss of data. It can even do replication with some data.

The delivery of data is its most valuable aspect.

It's an easy-to-use product overall.

The solution is quite mature.

It's an open-source product, which means it doesn't cost us anything to use it.

What needs improvement?

We're still going through the solution. Right now, I can't suggest any features that might be missing. I don't see where there can be an improvement in that regard.

The speed isn't as fast as RabbitMQ, even though the solution touts itself as very quick. It could be faster. They should work to make it at least as fast as RabbitMQ.

The UI is based on command line. It would be helpful if they could come up with a simpler user interface.

They should make it easier to configure items on the solution.

The solution would benefit from the addition of better monitoring tools.

For how long have I used the solution?

I've been using the solution for six months.

What do I think about the stability of the solution?

The solution is a bit slow in comparison to RabbitMQ. It's supposed to be a very fast solution, and it has okay performance, but speed-wise, it's quite slow.

What do I think about the scalability of the solution?

The scaling of the solution is quite good.

How are customer service and technical support?

In terms of technical support, we don't get that directly from Apache Kafka. We have certain cloud data distribution so we get assistance from our cloud data support.

How was the initial setup?

We're continuously deploying the product. We're still in the process of deployment.

What's my experience with pricing, setup cost, and licensing?

It's an open-source product, so the pricing isn't an issue. It's free to use. We don't have costs associated with it.

Which other solutions did I evaluate?

I'm not the product owner, so I didn't have a say in what should be chosen. We were seeing a high throughput with Kafka which is why we ultimately chose it.

What other advice do I have?

I'd rate the solution eight out of ten. It's good at scaling, and, performance-wise, it's excellent. If they could improve the UI and allow for easier configuration, I'd rate them higher.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Paul Adams - PeerSpot reviewer
Consultant Solution Architect at a tech services company with 51-200 employees
Consultant
Straightforward implementation, highly resilient, and good support
Pros and Cons
  • "The most valuable feature of Apache Kafka is its versatility. It can solve many use cases or can be a part of many use cases. Its fundamental value is in the real-time processing capability."
  • "Managing Apache Kafka can be a challenge, but there are solutions. I used the newest release, as it seems they have removed Zookeeper, which should make it easier. Confluent provides a fully managed Kafka platform, in which the cluster does not need to be managed."

What is our primary use case?

We had an application stack consisting of Salesforce frontend and a Commander VPN position management system and used Apache Kafka to decouple the microservices. Additionally, we planned to use Kafka for stream processing and to use event sourcing to pull data from legacy systems and reference data to form a compacted topic that the microservices could consume.
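
For readers unfamiliar with compacted topics, the idea is that only the most recent value per key is retained. The sketch below is a simplified model, not Kafka's implementation: `compact` is a hypothetical function, the reference data is invented, and records with a `None` value are dropped immediately, whereas a real broker keeps such delete markers for a configurable period.

```python
from typing import Any, Optional


def compact(log: list[tuple[str, Optional[Any]]]) -> list[tuple[str, Any]]:
    """Keep only the newest record per key, preserving log order of survivors."""
    latest: dict[str, tuple[int, Optional[Any]]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # a later record replaces an earlier one
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None  # None marks the key as deleted
    )
    return [(key, value) for _, key, value in survivors]


reference_data = [
    ("EUR", 1.08),
    ("GBP", 1.27),
    ("EUR", 1.09),    # supersedes the first EUR record
    ("JPY", 0.0067),
    ("JPY", None),    # delete marker for JPY
]

print(compact(reference_data))  # [('GBP', 1.27), ('EUR', 1.09)]
```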

The usage of Kafka is a combination of deploying on a personal Kubernetes cluster or using a managed service such as MSK. However, most people who use Kafka are using a managed service provided by Confluent. It can be deployed on the cloud or on-premise.

What is most valuable?

The most valuable feature of Apache Kafka is its versatility. It can solve many use cases or can be a part of many use cases. Its fundamental value is in the real-time processing capability.

You need time-sensitive technology now, particularly in the analytics space. We have looked at using change data capture and Apache Kafka to modernize our analytics capabilities. Additionally, microservices can be used to capture events from legacy systems.

What needs improvement?

Managing Apache Kafka can be a challenge, but there are solutions. I used the newest release, as it seems they have removed Zookeeper, which should make it easier. Confluent provides a fully managed Kafka platform, in which the cluster does not need to be managed.

It would help if native Apache Kafka had schema registry capabilities; at present, this type of functionality is often provided by third-party tools. Additionally, there may be a need for improved manageability and additional tools to manage the cluster, including standard operational metrics and built-in management capabilities.

For how long have I used the solution?

I have been using Apache Kafka for approximately three years.

What do I think about the stability of the solution?

The solution is highly resilient.

I rate the stability of Apache Kafka a nine out of ten.

What do I think about the scalability of the solution?

Apache Kafka is scalable.

I rate the scalability of Apache Kafka a nine out of ten.

How are customer service and support?

The support from Apache Kafka is good.

How was the initial setup?

The initial setup of an Apache Kafka cluster is easy. I did the initial setup on my laptop, and it was straightforward. I used the Confluent version, but even if you want to run native capabilities, the implementation is straightforward.

What about the implementation team?

The recent proof of concept was done on behalf of a client by a system integrator. Similarly, the previous one was mainly done in-house and it utilized Confluent, Apache Kafka, and MSK. The process involved setting up pre-built capabilities.

What's my experience with pricing, setup cost, and licensing?

The price of the solution is low.

I rate the price of Apache Kafka a nine out of ten.

What other advice do I have?

I rate Apache Kafka a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Integrator
PeerSpot user
Assistant Professor at CHAROTAR UNIVERSITY OF SCIENCE AND TECHNOLOGY
Real User
Difficult to configure, lacking automation, but has good community support
Pros and Cons
  • "The valuable features are the group community and support."
  • "The solution can improve by having automation for developers. We have done many manual calculations and it has been difficult but if it was automated it would be much better."

What is our primary use case?

We are in the early stages of testing this solution in our lab as a demo. It is in development and we are not in production at this point.

We are using this solution to relay events when they happen to multiple receivers at once to allow better functionality.

How has it helped my organization?

Apache Kafka has helped our client's online restaurant company by allowing them to take any orders and send the notifications with some other details, such as logic commands, to the different microservices.

What is most valuable?

The valuable features are the group community and support.

What needs improvement?

The solution can improve by having automation for developers. We have done many manual calculations and it has been difficult but if it was automated it would be much better.

For how long have I used the solution?

I have been using this solution for approximately three months.

What do I think about the scalability of the solution?

The solution's scalability is important for our ability to have more throughput from multiple receivers. If we need more throughput it can deliver.

Which solution did I use previously and why did I switch?

We did use other solutions previously but this solution makes things a lot easier.

How was the initial setup?

The installation is fairly easy. Additionally, there is a cloud-based version available if a use case requires it.

What about the implementation team?

We did the implementation ourselves.

What's my experience with pricing, setup cost, and licensing?

The solution is free, it is open-source.

What other advice do I have?

There is a lot of configuration involved in this solution. We have found many configurations that have helped us but it would be beneficial if there was automation. 

I rate Apache Kafka a five out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Technical Architect at a computer software company with 51-200 employees
Real User
Its publisher-subscriber pattern has allowed our applications to access and consume data in real time.
Pros and Cons
  • "I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at least one processing point for an analytics pipeline. Kafka fits this requirement very well."
  • "As an open-source project, Kafka is still fairly young and has not yet built out the stability and features that other open-source projects have acquired over the many years. If done correctly, Kafka can also take over the stream-processing space that technologies such as Apache Storm cover."

How has it helped my organization?

Through its publisher-subscriber pattern, Kafka has allowed our applications to access and consume data in real time.

What is most valuable?

I like the performance and reliability of Kafka. I needed a data streaming buffer that could handle thousands of messages per second with at least one processing point for an analytics pipeline. Kafka fits this requirement very well, as it is a fast, distributed message broker. It definitely does exactly what it is designed to do.

What needs improvement?

As an open-source project, Kafka is still fairly young and has not yet built out the stability and features that other open-source projects have acquired over many years. If done correctly, Kafka can also take over the stream-processing space that technologies such as Apache Storm cover.

Currently, as it is in the big/fast data integration world, you need to piece together many different open-source technologies. For example, to create a reliable, fault-tolerant streaming processing system that ingests data, you need:

  • a producer service
  • an event/message buffer such as Kafka or a message queue
  • a stream processing consumer such as Spark, Flink, Storm, etc.
  • something to help facilitate the ingestion into target datasources such as Flume or some customized concoction.

This is simply to ingest the data and does not necessarily account for the analytical pieces, which may consist of Spark ML, SystemML, ElasticSearch, Mahout, etc.
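
The four pieces listed above can be sketched end to end with plain Python stand-ins. This is an illustration of how the stages fit together, not a use of any of the named technologies: a `deque` plays the role of the buffer, and `produce`, `process`, `drain`, and `ingest` are hypothetical functions with invented data.

```python
from collections import deque
from typing import Iterable, Iterator


def produce(readings: Iterable[float]) -> Iterator[dict]:
    """Producer service: wrap raw readings as events."""
    for i, value in enumerate(readings):
        yield {"id": i, "value": value}


def drain(buffer: deque) -> Iterator[dict]:
    """Read events out of the buffer in arrival order."""
    while buffer:
        yield buffer.popleft()


def process(events: Iterable[dict]) -> Iterator[dict]:
    """Stream processing consumer: drop invalid events and enrich the rest."""
    for event in events:
        if event["value"] < 0:
            continue
        yield {**event, "value_squared": event["value"] ** 2}


def ingest(records: Iterable[dict], target: list) -> None:
    """Ingestion step: write processed records to the target data source."""
    target.extend(records)


buffer: deque = deque()  # stands in for the event/message buffer
buffer.extend(produce([2.0, -1.0, 3.0]))

datastore: list = []
ingest(process(drain(buffer)), datastore)
print(datastore)
```

Even in this toy form, each stage is a separate component with its own interface, which is the integration burden the review points to.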

What I'm getting at is basically the need for a Spring framework of big data.

What do I think about the stability of the solution?

The only stability issues we had were mostly a result of the evolving APIs and existing bugs.

What do I think about the scalability of the solution?

Kafka is designed to be very easily scalable so I did not have any trouble here.

How are customer service and technical support?

We used the open-source version and did not buy support from Confluent.

Which solution did I use previously and why did I switch?

We did not have any other previous solutions. Our project was green field and a new type of project development.

How was the initial setup?

Initial setup was straightforward. We simply hosted multiple Kafka brokers and ZooKeeper servers on AWS EC2 instances.

What about the implementation team?

We implemented it in-house and then went with the Hortonworks Data Platform distribution.

Which other solutions did I evaluate?

We evaluated AWS Kinesis as well.

What other advice do I have?

Kafka is open source and requires an administrator to maintain the servers.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user