Enterprise Architect at Smals vzw
Real User
Top 20
Effective event sequencing, seamless system interactions, and beneficial data management
Pros and Cons
  • "There are numerous possibilities that can be explored. While it may be challenging to fully comprehend the potential advantages, one key aspect is the ability to establish a proper sequence of events rather than simply dealing with a jumbled group of occurrences. These events possess their own timestamps, even if they were not initially provided with one, and are arranged in a chronological order that allows for a clear understanding of the progression of the events."
  • "There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, which are all under enterprise license and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools, such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as enterprise solutions."

What is our primary use case?

Apache Kafka is used as more than a messaging bus; it also serves as a database to store information. It functions as a streamer, similar to ETL, to manipulate and transform events before they are migrated to other systems for use. The database can also act as a cache. Apache Kafka is used as a database broker, streamer, and source of truth for multiple systems because it can retain events for at least 10 days. It provides both synchronous and asynchronous communication, which makes it a complex system that is easier to understand through diagrams or sketches.

We use reactive frameworks.

How has it helped my organization?

From my experience with Apache Kafka, one of the most notable advantages is its ability to maintain a comprehensive record of historical data that includes every update, alteration, and version of information, unlike a conventional relational database. This feature allows for seamless tracking and analysis of the progression and transformation of the data over time, enabling users to easily review and analyze the history of the information.

The solution has the capability for various systems to effortlessly interact with one another without prior knowledge of their existence, current operational status, or specific configurations. By utilizing service buses and dynamic integration, data can be distributed across networks and retrieved in a way that is most suitable for each system's requirements. In addition, Apache Kafka allows for the modification of data to provide diverse clients, consumers, or observers with unique and varying data. The replication of data can produce multiple versions, and this data can be adjusted to fit various needs. With the use of probes, one can alter the behavior of the transformation process, thereby changing the way in which data is transformed and the output produced. Overall, working with Apache Kafka has brought about an array of benefits, enabling seamless system interactions and allowing for the customization and modification of data to meet individual requirements.

What is most valuable?

There are numerous possibilities that can be explored. While it may be challenging to fully comprehend the potential advantages, one key aspect is the ability to establish a proper sequence of events rather than simply dealing with a jumbled group of occurrences. These events possess their own timestamps, even if they were not initially provided with one, and are arranged in a chronological order that allows for a clear understanding of the progression of the events.
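
The ordering described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `PartitionLog` and `Record` are hypothetical names, not part of any Kafka client library, and the model covers a single partition, which is the scope within which Kafka guarantees order. A record that arrives without a timestamp is stamped when it is appended.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    offset: int        # position in the log, assigned on append
    timestamp_ms: int  # supplied by the producer, or filled in on append
    value: str


class PartitionLog:
    """Toy stand-in for one partition: an append-only list of records."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, value: str, timestamp_ms: Optional[int] = None) -> Record:
        # If the producer gave no timestamp, stamp the record at append time.
        if timestamp_ms is None:
            timestamp_ms = int(time.time() * 1000)
        record = Record(len(self._records), timestamp_ms, value)
        self._records.append(record)
        return record

    def read_from(self, offset: int) -> List[Record]:
        # Readers always see records in offset order.
        return self._records[offset:]


log = PartitionLog()
log.append("order-created", timestamp_ms=1_000)
log.append("order-paid")  # no timestamp supplied by the producer
log.append("order-shipped", timestamp_ms=3_000)

for record in log.read_from(0):
    print(record.offset, record.timestamp_ms, record.value)
```

Every reader that starts at offset 0 sees the same three events in the same order, which is what turns a "jumbled group of occurrences" into a sequence.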

What needs improvement?

There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, which are all under enterprise license and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools, such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as enterprise solutions.

One additional area that I think could benefit from improvement is the deployment process on OpenShift. This particular deployment is quite challenging and requires the activation of certain security measures as well as integration with other systems. It's not a straightforward process and typically requires engineers who are highly skilled and have extensive experience with Apache Kafka to carry out these tasks. Therefore, I believe that there is a need for progress in this area, and some tools that can provide information, assistance, and help make the whole process easier would be greatly appreciated.

Buyer's Guide
Apache Kafka
January 2025
Learn what your peers think about Apache Kafka. Get advice and tips from experienced pros sharing their opinions. Updated: January 2025.
829,634 professionals have used our research since 2012.

For how long have I used the solution?

I have been using Apache Kafka for approximately four years.

What do I think about the stability of the solution?

The solution is stable if you have set it up correctly.

What do I think about the scalability of the solution?

Apache Kafka is a scalable solution.

How are customer service and support?

I have not escalated any questions to technical support because Apache Kafka is an open-source system. However, Confluent and other companies sell support and enterprise solutions to make the work more convenient and streamlined. They offer tools, such as a monitoring tool with a visual interface, which provides a lot of information and buttons to press for correction or change without touching the code. Each of those buttons could hypothetically help in a given situation, but it is unclear exactly what they do, so it is best to call the data center and ask. If you buy their service, you have access to all the enterprise comforts.

How was the initial setup?

Setting up Apache Kafka is not an easy task, especially when trying to containerize it and make it controllable. This is because Apache Kafka has its own distributed mechanism for staying alive, checking readiness, replicating, and scaling. Ensuring that it complies with the Kubernetes or OpenShift orchestrator requires careful attention, as there is a risk of two masters attempting to perform the same task and ultimately undoing each other's work.

In comparison to Kubernetes, OpenShift is a more advanced implementation infrastructure that automatically manages and orchestrates all the steps required for an application setup. It operates at a higher level of abstraction and eliminates the manual operations that are required with Kubernetes. While Kubernetes can run an application with some pipeline and configuration, OpenShift takes care of everything from finding the required images to creating ports and connecting databases. Although manual changes can be made, they are not necessary, as OpenShift offers a much more coarse-grained management approach.

What about the implementation team?

One skillful DevOps engineer can implement the solution.

What's my experience with pricing, setup cost, and licensing?

Apache Kafka is an open-source solution.

What other advice do I have?

The maintenance of Apache Kafka is crucial due to the complexity of the system with numerous microservices and systems communicating through Apache Kafka, requiring proper integration and configuration to prevent overloading and ensure a healthy cluster. The task is not easy and requires knowledge of the various adjustable parameters, as misadjusting even one of them can greatly slow down the cluster. For example, if the consumer group changes frequently, the messages must be regrouped and reassigned, causing significant delays. Therefore, configuring Apache Kafka correctly is essential to avoid high latency issues.
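
The cost of a frequently changing consumer group can be sketched with a toy assignment function. This is an illustration under assumptions, not Kafka's actual assignor: `assign` and `moved` are hypothetical helpers, and the round-robin rule is chosen only because it is easy to follow. The point is how many partitions change owner when a single member joins.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin partitions over the sorted member list (toy assignor)."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


def moved(before: Dict[str, List[int]], after: Dict[str, List[int]]) -> int:
    """Count partitions whose owner differs between two assignments."""
    owner_before = {p: m for m, ps in before.items() for p in ps}
    owner_after = {p: m for m, ps in after.items() for p in ps}
    return sum(1 for p in owner_after if owner_before.get(p) != owner_after[p])


partitions = list(range(6))
before = assign(partitions, ["c1", "c2"])
after = assign(partitions, ["c1", "c2", "c3"])  # one consumer joins

print(before)
print(after)
print(moved(before, after), "of", len(partitions), "partitions changed owner")
```

In this toy model, most of the partitions move when one consumer joins, and each move means a consumer has to stop, hand over, and resume. Repeating that every time membership changes is the kind of delay the reviewer describes.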

I would strongly suggest others give Apache Kafka a chance and explore the various advantages that it can offer, especially since it should not be perceived as a message bus or broker but rather an enterprise bus designed for data manipulation. It has the ability to transform data, store and reject it, and even maintain different versions of the same data simultaneously. Moreover, it operates on a pull mechanism rather than a push mechanism, which takes away the risk of losing data and places the responsibility for data loss on the consumer. On the other hand, it also ensures that the data is always available within the specified window and allows for easy replication of the past, which is extremely helpful in situations such as those involving a hacked bank database. With Apache Kafka, you can efficiently go back in time, obtain the required status and events, and make changes accordingly, without the need to go through each transaction separately. Thus, using this solution can make data management much more efficient and convenient.
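
The pull mechanism and the ability to go back in time can be sketched as follows. This is a minimal in-memory model, not a Kafka client: the `Consumer` class, its `poll` and `seek_to_time` methods, and the sample events are all assumptions made for illustration. The consumer owns its read position, so it can rewind to a point in time and read the same events again, as long as they are still within the retention window.

```python
from typing import List, Tuple


class Consumer:
    """Toy pull-based consumer that tracks its own position in a shared log."""

    def __init__(self, log: List[Tuple[int, str]]) -> None:
        self._log = log    # list of (timestamp_ms, value); never mutated here
        self.position = 0  # next offset to read

    def poll(self, max_records: int = 10) -> List[str]:
        # The consumer asks for data; nothing is pushed to it.
        batch = self._log[self.position:self.position + max_records]
        self.position += len(batch)
        return [value for _, value in batch]

    def seek_to_time(self, timestamp_ms: int) -> None:
        # Move to the first record at or after the given timestamp.
        for offset, (ts, _) in enumerate(self._log):
            if ts >= timestamp_ms:
                self.position = offset
                return
        self.position = len(self._log)


log = [(100, "deposit 50"), (200, "withdraw 20"), (300, "deposit 5")]
consumer = Consumer(log)

first_pass = consumer.poll()
consumer.seek_to_time(200)  # rewind to a known-good point in time
replayed = consumer.poll()

print(first_pass)
print(replayed)
```

Because reading does not remove anything from the log, the replay returns the same events a second time; keeping up with the log before retention expires is the consumer's responsibility.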

I rate Apache Kafka an eight out of ten.

In order to improve its user-friendliness, engineer-friendliness, and DevOps-friendliness, the system must undertake various tasks, such as enhancing the overall operation and configuration, ensuring seamless integration with other systems, and adapting to security layers in a more comprehensive and generic manner. This will require significant efforts to make the system more functional, secure, and efficient.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Joaquin Marques - PeerSpot reviewer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Real User
Excellent for heavy-duty data classification; should do away with configuration problems
Pros and Cons
  • "Kafka allows you to handle huge amounts of data and classify it into different categories. If you have huge amounts of data, Kafka is a very good solution for data classification."
  • "Kafka is a nightmare to administer."

What is our primary use case?

My primary use case for Apache Kafka is replacing ETL and doing data transformations.

How has it helped my organization?

Kafka allows you to handle huge amounts of data and classify it into different categories. If you have huge amounts of data, Kafka is a very good solution for data classification. When you need to route it in different directions, you have to take a look at the messages that you get, interfile them, and then send them to the correct place. Kafka is a good product to use in the backend.
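
The classify-then-route step described above might look like the following sketch. Everything here is hypothetical: the `route` helper, the topic names, and the `classify` rule are invented for illustration, and plain Python lists stand in for Kafka topics.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def route(messages: List[dict], classify: Callable[[dict], str]) -> Dict[str, List[dict]]:
    """Group messages by the destination name returned by classify()."""
    destinations: Dict[str, List[dict]] = defaultdict(list)
    for message in messages:
        destinations[classify(message)].append(message)
    return dict(destinations)


def classify(message: dict) -> str:
    # Hypothetical rule: large amounts go to a review topic.
    return "payments.review" if message["amount"] >= 1000 else "payments.standard"


messages = [
    {"id": 1, "amount": 25},
    {"id": 2, "amount": 5000},
    {"id": 3, "amount": 300},
]
routed = route(messages, classify)
print(routed)
```

In a real deployment the destinations would be topics, and the classification rule would live in a stream-processing job or a consumer that republishes each message.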

What is most valuable?

The feature I find most valuable is the classification feature. Kafka enables you to tag content with a category.

What needs improvement?

Kafka contains two components. The component that handles synchronization between the rest of the components is an older piece of software, and it causes all kinds of configuration problems. Confluent, the company that sells a commercial version of Kafka, is moving away from that component precisely because of that. Kafka is a nightmare to administer.

In the next release, I would like to see that one troublesome component that causes configuration issues removed.

For how long have I used the solution?

I have been using Apache Kafka for a couple of years.

What do I think about the stability of the solution?

The stability of this solution depends on whether it is properly configured. Having said that, Kafka is incredibly complex to configure, set up, administer, and maintain.

What do I think about the scalability of the solution?

My opinion is that Apache Kafka is a scalable solution. In our organization, there are hundreds of thousands of users using Kafka.

How was the initial setup?

The initial setup was extremely complex. In our case, it took a team of 12 two months to deploy.

What about the implementation team?

These systems were installed by somebody else, not me.

What's my experience with pricing, setup cost, and licensing?

I would advise others to schedule a month or two to just set it up and have it up and running.

Which other solutions did I evaluate?

There are other options. For example, Databricks is a Kafka alternative. We decided to go with Kafka because one of our clients already chose Kafka.

While evaluating, we found out Databricks is more expensive, for the level of activity that Kafka handles (in this case, millions of requests per day). Databricks could do it, but it would be overly expensive.

I would rate Apache Kafka's pricing a seven out of ten, with one being cheap and 10 being very expensive.

What other advice do I have?

Since it has become so popular, large enterprises especially want to do it. For smaller enterprises, Kafka would probably be too expensive because they would have to hire people to maintain it.

I would rate the Apache Kafka solution a seven out of ten.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1218324 - PeerSpot reviewer
Head of Technology - Money Movement Platform at a financial services firm with 10,001+ employees
Real User
Feature rich, highly scalable, and straightforward to implement
Pros and Cons
  • "All the features of Apache Kafka are valuable, I cannot single out one feature."
  • "Prioritization of messages in Apache Kafka could improve."

What is our primary use case?

We use Apache Kafka primarily to queue the transactions or total the transactions.

How has it helped my organization?

Apache Kafka has helped our organization handle larger volumes without affecting the infrastructure load.

What is most valuable?

All the features of Apache Kafka are valuable, I cannot single out one feature.

What needs improvement?

Prioritization of messages in Apache Kafka could improve.
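
Kafka has no built-in notion of message priority, so one common workaround is to use a separate topic per priority level and have the consumer drain the higher-priority topics first. The sketch below models that idea with in-memory queues; the topic names and the `poll_by_priority` helper are assumptions for illustration, not part of any Kafka API.

```python
from collections import deque
from typing import Deque, Dict, List

# Hypothetical topic names, listed from highest to lowest priority.
PRIORITY_ORDER = ["payments.high", "payments.normal", "payments.low"]


def poll_by_priority(topics: Dict[str, Deque[str]], max_records: int) -> List[str]:
    """Fill a batch from the highest-priority non-empty queue first."""
    batch: List[str] = []
    for name in PRIORITY_ORDER:
        queue = topics.get(name, deque())
        while queue and len(batch) < max_records:
            batch.append(queue.popleft())
    return batch


topics = {
    "payments.low": deque(["l1", "l2"]),
    "payments.high": deque(["h1"]),
    "payments.normal": deque(["n1", "n2"]),
}
first = poll_by_priority(topics, 3)
second = poll_by_priority(topics, 3)
print(first)
print(second)
```

A strict ordering like this can starve the low-priority queue under sustained load, so a real design would usually reserve some share of each batch for lower priorities.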

For how long have I used the solution?

I have been using Apache Kafka for approximately six years.

What do I think about the stability of the solution?

The stability of Apache Kafka is very good.

What do I think about the scalability of the solution?

Apache Kafka is the most scalable solution in the market.

How are customer service and support?

I have not used the support from Apache Kafka.

How was the initial setup?

Apache Kafka is straightforward to implement.

What about the implementation team?

We did the implementation of Apache Kafka in-house.

Which other solutions did I evaluate?

I did not evaluate other solutions.

What other advice do I have?

I rate Apache Kafka a nine out of ten.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Technical Director at Metrofibre Networx
Real User
Top 20
A reliable and stable stream-processing platform with a good customer support team
Pros and Cons
  • "As a software developer, I have found Apache Kafka's support to be the most valuable...The solution is easy to integrate with any of our systems."
  • "The solution should be easier to manage. It needs to improve its visualization feature in the next release."

What is our primary use case?

We have a camera monitoring security system, in which we post messages onto the queue, which involves various steps in processing the message, like checking for the number of clients, running it against the police data, etc. So Apache Kafka is a security application with many types of consumers. We set up a workflow system with different sites, which works well.

What is most valuable?

As a software developer, I have found Apache Kafka's support to be the most valuable. The support team sends available information regarding the library and how to use the plugins. The solution is easy to integrate with any of our systems. We have other alternatives, but this is the one that seems to be the most popular database support.

What needs improvement?

The solution should be easier to manage. It needs to improve its visualization feature in the next release.

For how long have I used the solution?

I have been using it for three years.

What do I think about the stability of the solution?

It is a stable solution. We never faced any issues. I rate it a ten out of ten.

What do I think about the scalability of the solution?

It is a scalable solution. We set up a category with different consumers balancing things, which works as I thought.

How are customer service and support?

I did not contact the technical support as it was not required.

Which solution did I use previously and why did I switch?

We used Linksys for visualization along with Confluence, but there needed to be more value. For us, Apache Kafka is the best solution based on the support and third-party systems as it builds our subsystems around because we have a lot of development teams.

How was the initial setup?

The initial setup was straightforward because I've got a lot of experience in this field. But even for a junior person, it would be fine. There are so many resources, and it's very well documented as they are a premium service provider. So it makes the setup just easier.

The deployment takes a few days.

We set up a free cluster for this service because we use a lot of data. We use ZooKeeper to secure different products for instruction with the cluster. But, it was easy as it is a popular product, and much information is available. It can download data, like fifty gigs per day. We can effectively handle it all as well. I never developed any issues.

What's my experience with pricing, setup cost, and licensing?

It's a premium product, so it is not price-effective for us.

What other advice do I have?

Apache Kafka is an out-of-the-box, reliable solution. In the fiber business, we need a reliable solution, and this solution is one hundred percent reliable. If it is set up correctly, it hardly has any issues, thanks to its extensive user base; even if there are issues, they are sorted out by the community. I rate it nine out of ten.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Felipe Lopes - PeerSpot reviewer
Engineering Manager at Alice
Real User
You can receive and distribute data in real-time
Pros and Cons
  • "I have seen a return on investment with this solution."
  • "I suggest using cloud services because the solution is expensive if you are using it on-premises."

What is our primary use case?

The primary use case of the solution is for asset communication through our microservices.

How has it helped my organization?

The solution has allowed us to take the use cases provided by another communication tool and resolve those issues.

What is most valuable?

The most valuable feature is how persistent it is. For example, we are able to reprocess messages when we need to, we're able to recover methods to consume them.

What needs improvement?

The solution can be improved by reducing the cost to run it on the premises.

For how long have I used the solution?

I have been using the solution for four years.

What do I think about the stability of the solution?

The stability of the solution is good.

What do I think about the scalability of the solution?

The solution is scalable.

How was the initial setup?

The initial setup was straightforward.

What about the implementation team?

The implementation was through a vendor.

What was our ROI?

I have seen a return on investment with this solution.

What other advice do I have?

I give the solution a nine out of ten.

We have 80 people using the solution and five people are required to maintain it.

I suggest using cloud services because the solution is expensive if you are using it on-premises.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1421481 - PeerSpot reviewer
Solution Architect at a manufacturing company with 10,001+ employees
Real User
Good performance when a high throughput is required, but they need to implement a portal
Pros and Cons
  • "The processing power of Apache Kafka is good when you have requirements for high throughput and a large number of consumers."
  • "They need to have a proper portal to do everything because, at this moment, Kafka is lagging in this regard."

What is our primary use case?

I am a solution architect and I used Apache Kafka in this role.

What is most valuable?

The processing power of Apache Kafka is good when you have requirements for high throughput and a large number of consumers. 

What needs improvement?

They need to have a proper portal to do everything because, at this moment, Kafka is lagging in this regard. It could be used to do the preprocessing or the configurations, instead of directly doing it on the queues or the topics. If you look at Solace, for example, they have come up with a portal where you don't need to touch these activities. You don't need to access the platform beyond the portal.

For how long have I used the solution?

I have used Apache Kafka for between one and one and a half years.

What do I think about the stability of the solution?

Apache Kafka is stable.

What do I think about the scalability of the solution?

This is certainly a scalable product. There are currently 30 or more people using it but we expect to scale beyond this. It is going to be an enterprise tool within the company.

How are customer service and technical support?

I am not directly interacting with the service people at this moment. It is limited for now because we are still exploring and effecting our architecture and design, and deciding how to align it with our existing strategy. There is not much progress in this regard and it will take more time.

Which solution did I use previously and why did I switch?

Prior to working with Apache Kafka, there was no messaging queue system. For many projects, they were using the Azure Event Hub, but it was not serving the purpose. So, we started moving towards Kafka, and that's why we have procured Confluent Kafka.

Several months ago, I stopped working on Apache Kafka. I am now working on Confluent Kafka. It was not my decision to switch solutions.

My current organization has chosen Confluent Kafka for various reasons. One is that we have a large number of streaming requirements, and Confluent Kafka has one more layer on top of Apache Kafka to do this transformation and connecting with other multiple lane systems.

There are out-of-the-box features along with the KSQL features. For example, things like fetching the events are kind of query-based. So, that seems to be a good feature for our requirements. That is why we ultimately procured Confluent Kafka.

For some time, I have also worked with Solace and it has an advantage. Given that my core strength is integration, I work with integration platforms such as MuleSoft, Azure Functions, and TIBCO. Based on our requirements, I found that the event-driven APA implementation with Solace was easier.

Solace also has a top-notch solution for portal management and you register your producers, consumers, and preprocessing logic. All of these things are pretty easy to do. This is an area where Kafka could use some enhancement.

How was the initial setup?

I don't think that the initial setup was a complex process.

Which other solutions did I evaluate?

MQ messaging systems are not my core strength but for any integration platform where we have a large number of APIs and events, to integrate with an IoT platform, for example, I found Kafka is better than ActiveMQ.

I'm not getting into MQTT or other things, but when you compare ActiveMQ and Kafka, Kafka has done better.

What other advice do I have?

I think that many people are using Apache Kafka just as a publishing and subscription model, but I feel that Kafka is better than that. Furthermore, Confluent Kafka is even more than that.

Confluent Kafka is offering features that are equal to those of a data lake. You can do lots with data, and huge data can be persisted. However, many people are not using that feature. Rather than make use of persistence logic, they are pushing the messages and consuming them. Maybe if people were using it for persistence, they would see the impact or real power of Kafka.

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Roger Sabourin - PeerSpot reviewer
Senior Manager, Analyst Relations at a tech vendor with 201-500 employees
Real User

You're in luck, Solace's PubSub+ Event Portal for Kafka does all the things you're looking for, specifically for your Kafka environments, be they open source Kafka, Confluent or Amazon MSK.  Check it out, or request a free trial at https://solace.com/products/po...

reviewer1247268 - PeerSpot reviewer
Technology Lead at a tech services company with 10,001+ employees
MSP
Top 20
A cost-effective solution for high volume, multi-source data collection
Pros and Cons
  • "The most valuable feature is that it can handle high volume."
  • "Kafka does not provide control over the message queue, so we do not know whether we are experiencing lost or duplicate messages."

What is our primary use case?

Our company provides services and we use Apache Kafka as part of the solution that we provide to clients.

One of the use cases is to collect all of the data from multiple endpoints and provide it to the users. Our application integrates with Kafka as a consumer using the API, and then sends information to the users who connect. 

What is most valuable?

The most valuable feature is that it can handle high volume.

Apache Kafka is open-source and some of our clients are interested in becoming more involved in that.

What needs improvement?

Kafka does not provide control over the message queue, so we do not know whether we are experiencing lost or duplicate messages. Better control over the message queue would be an improvement. Solutions such as ActiveMQ do afford better control. Because of this, there is sometimes a gap in the results where we have either lost messages, or there are duplicates.
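
One common way to cope with duplicate deliveries is to make the consumer idempotent: it remembers which messages it has already handled and skips repeats. The sketch below assumes every message carries a unique identifier, which is an assumption about the payload and not something Kafka adds for you; `process_once` is a hypothetical helper.

```python
from typing import Iterable, List, Set, Tuple

Delivery = Tuple[str, str]  # (message_id, payload); the id field is assumed


def process_once(deliveries: Iterable[Delivery], seen: Set[str]) -> List[str]:
    """Return payloads not handled before, recording their ids in `seen`."""
    handled: List[str] = []
    for message_id, payload in deliveries:
        if message_id in seen:
            continue  # duplicate delivery, already handled
        seen.add(message_id)
        handled.append(payload)
    return handled


seen_ids: Set[str] = set()
batch_1 = [("m1", "a"), ("m2", "b")]
batch_2 = [("m2", "b"), ("m3", "c")]  # m2 is redelivered after a retry

out_1 = process_once(batch_1, seen_ids)
out_2 = process_once(batch_2, seen_ids)
print(out_1)
print(out_2)
```

In practice the set of seen ids would need to be stored durably and bounded in size. This handles duplicates only; detecting lost messages needs a separate check, for example on gaps in a sequence number.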

We have had problems when there was an imbalance because all of the messages were being sent back.

For how long have I used the solution?

I'm a beginner with Apache Kafka.

What do I think about the stability of the solution?

I cannot judge stability without having better control over the message queue, although I feel that it is not 100% stable. 

How are customer service and technical support?

We have not been in contact with technical support. For our first implementation with it, Kafka was already set up and running. When we did our PoC, I was not part of the team who was facing issues and it was they who were in contact with support.

Which solution did I use previously and why did I switch?

I also have experience with IBM MQ.

How was the initial setup?

We had problems when we were setting up Kafka ourselves to conduct our PoC internally. Kafka would not start and it was related to parameters or property settings in Java. We were able to work around it, but we had problems like adding certificates.

What about the implementation team?

In one case, we were using Kafka after it had already been set up, externally. It worked fine and we just had to configure some of the connectors that we wanted to try out.

What's my experience with pricing, setup cost, and licensing?

Apache Kafka is open-source and can be used free of charge.

What other advice do I have?

In this type of solution, you need to be able to accept a high volume of messages, but not lose any, and not have any duplicates. Because we are unable to control the queue in Kafka, I cannot say that this works 100%.

The suitability of this solution depends on the use cases. There are two or three things that we are worried about, and we will be very careful in choosing solutions. In cases where the messages are well organized, or there is no worry that there will be duplicate or dropped messages, then I recommend using Kafka. Also, I recommend this solution for those looking to get involved with open-source applications.

Other than the problems with having no control over the queue, Apache Kafka is wonderful.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Paul Adams - PeerSpot reviewer
Consultant Solution Architect at a tech services company with 51-200 employees
Consultant
Top 20
Straightforward implementation, highly resilient, and good support
Pros and Cons
  • "The most valuable feature of Apache Kafka is its versatility. It can solve many use cases or can be a part of many use cases. Its fundamental value is in the real-time processing capability."
  • "Managing Apache Kafka can be a challenge, but there are solutions. I used the newest release, as it seems they have removed Zookeeper, which should make it easier. Confluent provides a fully managed Kafka platform, in which the cluster does not need to be managed."

What is our primary use case?

We had an application stack consisting of Salesforce frontend and a Commander VPN position management system and used Apache Kafka to decouple the microservices. Additionally, we planned to use Kafka for stream processing and to use event sourcing to pull data from legacy systems and reference data to form a compacted topic that the microservices could consume.
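
The idea behind a compacted topic can be sketched in a few lines: for each key only the most recent value needs to be kept, and a record with an empty value acts as a delete marker. This is a toy model over a Python list, not Kafka's compaction process; the `compact` helper and the sample reference data are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, Optional[str]]  # (key, value); None marks a delete


def compact(events: List[Event]) -> Dict[str, str]:
    """Replay events in order and keep only the latest value for each key."""
    latest: Dict[str, str] = {}
    for key, value in events:
        if value is None:
            latest.pop(key, None)  # delete marker removes the key
        else:
            latest[key] = value
    return latest


events = [
    ("cust-1", "tier=silver"),
    ("cust-2", "tier=gold"),
    ("cust-1", "tier=gold"),  # newer value for cust-1
    ("cust-2", None),         # cust-2 deleted
]
state = compact(events)
print(state)
```

A microservice that reads such a topic from the beginning ends up with the current reference data without having to process every historical change.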

The usage of Kafka is a combination of deploying on a personal Kubernetes cluster or using a managed service such as MSK. However, most people who use Kafka are using a managed service provided by Confluent. It can be deployed on the cloud or on-premise.

What is most valuable?

The most valuable feature of Apache Kafka is its versatility. It can solve many use cases or can be a part of many use cases. Its fundamental value is in the real-time processing capability.

You need time-sensitive technology now, particularly in the analytics space. We have looked at using change data capture and Apache Kafka to modernize our analytics capabilities. Additionally, microservices can be used to capture events from legacy systems.

What needs improvement?

Managing Apache Kafka can be a challenge, but there are solutions. I used the newest release, as it seems they have removed Zookeeper, which should make it easier. Confluent provides a fully managed Kafka platform, in which the cluster does not need to be managed.

If it is a native Apache Kafka, it would have schema registry capabilities. However, this type of functionality is often provided by third-party tools. Additionally, there may be a need for improved manageability and additional tools to manage the cluster, including standard operational metrics and inbuilt management capabilities.

For how long have I used the solution?

I have been using Apache Kafka for approximately three years.

What do I think about the stability of the solution?

The solution is highly resilient.

I rate the stability of Apache Kafka a nine out of ten.

What do I think about the scalability of the solution?

Apache Kafka is scalable.

I rate the scalability of Apache Kafka a nine out of ten.

How are customer service and support?

The support from Apache Kafka is good.

How was the initial setup?

The initial setup of an Apache Kafka cluster is easy. I did the initial setup on my laptop, and it was straightforward. I used the Confluent version, but even if you want to run the native capabilities, the implementation is straightforward.

What about the implementation team?

The recent proof of concept was done on behalf of a client by a system integrator. Similarly, the previous one was mainly done in-house and it utilized Confluent, Apache Kafka, and MSK. The process involved setting up pre-built capabilities.

What's my experience with pricing, setup cost, and licensing?

The price of the solution is low.

I rate the price of Apache Kafka a nine out of ten.

What other advice do I have?

I rate Apache Kafka a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Integrator
PeerSpot user