In the data sharing space, the performance of Apache Kafka could be improved. The performance angle is critical, and while it works in milliseconds, the goal is to move towards microseconds.
Kafka requires fine-tuning to find the best architecture, number of nodes, and partitions for your use case. It’s a trial-and-error process with no one-size-fits-all solution. Issues may arise until it’s appropriately tuned. While it can scale out efficiently, scaling down is more challenging, making deleting data or reducing activity harder.
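One way to shorten the trial-and-error loop described above is to start from a rough partition estimate and tune from there. The sketch below uses a common rule of thumb (enough partitions for the slower of the producer or consumer side to reach the target throughput). The function name and the per-partition throughput figures are assumptions for illustration; real numbers have to be measured on your own cluster.

```python
import math


def estimate_partitions(target_mb_s: float,
                        producer_mb_s_per_partition: float,
                        consumer_mb_s_per_partition: float) -> int:
    """Rough starting point for a topic's partition count (not a guarantee)."""
    if min(target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition) <= 0:
        raise ValueError("all throughput values must be positive")
    # Partitions needed so that producers alone can reach the target.
    for_producers = target_mb_s / producer_mb_s_per_partition
    # Partitions needed so that consumers alone can keep up with the target.
    for_consumers = target_mb_s / consumer_mb_s_per_partition
    # The slower side dictates the count; always keep at least one partition.
    return max(1, math.ceil(max(for_producers, for_consumers)))
```

Because reducing the partition count of an existing topic is not supported, it is usually safer to treat the result as a lower bound and leave some headroom for growth.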
Big Data Teaching Assistant at Center for Cloud Computing and Big Data, PES University
Real User
Top 5
2024-10-25T14:12:00Z
Oct 25, 2024
Confluent has improved aspects like documentation and cloud support, yet Kafka's reliance on older architectures like ZooKeeper in previous versions is a limitation. Its language and architecture could be further improved to solve issues in consensus algorithms, as Redpanda does.
Kafka has some limitations in terms of queue management. Specifically, it lacks the capability to handle larger queues for external system interactions. It would be beneficial if Kafka included more robust, high-capacity queue management features for integration with external systems.
Lead Data Scientist at a transportation company with 51-200 employees
Real User
Top 5
2024-05-02T10:25:11Z
May 2, 2024
One complexity I faced with the tool is that, because it is not really a stand-alone application, it does not integrate natively with clouds like AWS or Azure. Apache Kafka always has another layer on top of it. Grafana, by contrast, can be used directly as a stand-alone service with Grafana Cloud, or as a mix of AWS and Grafana, with little difference between the two. I expect Apache Kafka to work the same way Grafana does. The product's support and cloud integration capabilities are areas of concern where improvements are required.
The main challenge we faced while integrating Apache Kafka with other tools was setting up SSL and securing connections. Managing certificate changes and ensuring all clients connect smoothly, especially outside Kubernetes environments, posed ongoing challenges. Once initially set up, maintaining and sharing these security configurations became more manageable, but ensuring compatibility across different environments remained a continuous effort.
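A minimal sketch of one way to keep SSL settings shareable across clients: build the client configuration from a single certificate directory, so a certificate rotation only changes files in one place. The property names are the librdkafka-style keys used by clients such as confluent-kafka; the function names and the file names `ca.pem`, `client.pem`, and `client.key` are assumptions for illustration.

```python
from pathlib import Path


def ssl_client_config(bootstrap_servers: str, cert_dir: str) -> dict:
    """Build an SSL client config dict from one certificate directory."""
    base = Path(cert_dir)
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SSL",
        # Hypothetical file names; adjust to your own certificate layout.
        "ssl.ca.location": str(base / "ca.pem"),
        "ssl.certificate.location": str(base / "client.pem"),
        "ssl.key.location": str(base / "client.key"),
    }


def missing_files(config: dict) -> list:
    """Return the config keys whose certificate or key file is not on disk."""
    return [key for key, value in config.items()
            if key.endswith(".location") and not Path(value).is_file()]
```

Running `missing_files` at client start-up gives an early, readable error on hosts outside Kubernetes where the certificate files were not mounted or were rotated away.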
Vice President (Information and Product Management) at Tradebulls Securities (P) Limited
Real User
Top 10
2023-09-13T09:37:10Z
Sep 13, 2023
In Apache Kafka, it is currently difficult to create a consumer. Features like rebalancing only come into play once you have created a consumer, and creating one is overly complicated. To create a consumer in Apache Kafka, a person needs very strong knowledge of Apache Kafka's internal workings. I feel that Kafka needs to provide a ready-made consumer so that its users don't spend time creating consumers. In general, Apache Kafka must provide users with a more user-friendly UI.
Maintaining and configuring Apache Kafka can be challenging, especially when you want to fine-tune its behavior. It involves configuring traffic partitioning, understanding retention times, and dealing with various variables. Monitoring and optimizing its behavior can also be difficult. Perhaps a more straightforward approach could be using messaging queues instead of the publish-subscribe pattern. Some solutions may not require the complex features of Apache Kafka, and a messaging queue with Kafka's capabilities might provide a more complete messaging solution for events and messages.
Kafka is a new method we opted to apply to our need for data exchange. Also, we use the solution's integration capabilities. Improvement-wise, I would like the solution to have more integration capabilities. Also, the solution's setup, which is currently complex, should be made easier.
Group Manager at a media company with 201-500 employees
Real User
Top 20
2023-04-25T09:46:00Z
Apr 25, 2023
One of the major areas for improvement, which I still have to check out, is the polling mechanism. Sometimes, when the data volume is very large and I have a polling period of, let's say, one minute, there can be issues due to technical glitches, data anomalies, or platform-related issues such as cluster restarts. These polling periods tend to stop messaging use, and restartability needs to be improved, especially when data volumes are very high. If there are obstructions due to technical glitches or platform issues, sometimes we have to manually clean up or clear the queue before it eventually gets sealed. It does restart on its own, but it takes too much time to catch up. At that point, one year ago, I couldn't find a way to make it catch up more quickly and stay real-time after downtime, even when I contacted Cloudera and Apache. One of our messaging tools was sending a couple of million records, and we found it tough whenever there was cluster downtime or an issue with the subscribers consuming data. For future releases, one feature I would like to see is more robust restartability. It should be able to handle platform issues and data issues and restart seamlessly. It should not cause a cascading effect if there is any downtime. Another helpful feature would be monitoring like they have for their other services: a UI where I can monitor the capacity of the managed queue and see which resources I need to add to be ready for future data volumes. It would be great to have analytics on the overall performance of Kafka to plan for data volumes and messaging use. Currently, we plan the cluster resources manually based on data volumes for Kafka. A UI for resource planning based on data volume would be a great addition.
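The catch-up problem described above can at least be estimated by hand. The sketch below computes consumer lag as the gap between each partition's latest offset and the group's committed offset, then estimates how long catching up takes at given consume and produce rates. The function names and the sample rates are assumptions; in practice the offsets would come from your client library or monitoring tool.

```python
def total_lag(end_offsets: dict, committed_offsets: dict) -> int:
    """Sum of (latest offset - committed offset) over all partitions."""
    return sum(max(0, end - committed_offsets.get(partition, 0))
               for partition, end in end_offsets.items())


def catch_up_seconds(lag: int, consume_rate: float, produce_rate: float) -> float:
    """Seconds until lag reaches zero, given records/second rates.

    The consumer only gains ground by the amount it out-paces the producer.
    """
    headroom = consume_rate - produce_rate
    if headroom <= 0:
        return float("inf")  # the consumer never catches up at these rates
    return lag / headroom
```

For example, a backlog of two million records with consumers handling 15,000 records per second against 10,000 incoming records per second takes about 400 seconds to clear, which shows why spare consumer capacity matters more than raw speed after downtime.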
There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, which are all under enterprise license and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as enterprise solutions. One additional area that I think could benefit from improvement is the deployment process on OpenShift. This particular deployment is quite challenging and requires the activation of certain security measures as well as integration with other systems. It's not a straightforward process and typically requires engineers who are highly skilled and have extensive experience with Apache Kafka. Therefore, I believe that there is a need for progress in this area, and tools that provide information and assistance and make the whole process easier would be greatly appreciated.
Software Engineer at a tech services company with 201-500 employees
Real User
2023-01-12T14:36:00Z
Jan 12, 2023
The interface has room for improvement, and there is a steep learning curve for Hadoop integration. It was a struggle learning to send from Hadoop to Kafka. In future releases, I'd like to see improvements in ETL functionality and Hadoop integration.
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Real User
2022-12-06T15:50:00Z
Dec 6, 2022
Kafka contains two components. The component that handles synchronization between the rest of the components is an older piece of software, and it causes all kinds of configuration problems. Confluent, the company that sells a commercial version of Kafka, is moving away from that component precisely because of that. Kafka is a nightmare to administer. In the next release, I would like to see that one troublesome component that causes configuration issues removed.
I would like them to reduce the learning curve around the creation of brokers and topics. They also need to improve on the concept of the partitions. As for features, RabbitMQ has an instant response feature where you can send a queue and get an instant response, but Kafka only has one way to send queues. If that's something they could improve on, it would be great.
CEO /Consultant at Version Two Software Solutions Ltd
Real User
2022-10-06T14:58:58Z
Oct 6, 2022
We struggled a bit with the built-in data transformations because it was a challenge to get them up and running the way we wanted. There was a bit of a learning curve. It may be that we didn't fully grasp the information. Also, the documentation covering certain aspects was a bit poor. We had to trawl around different locations to try to find what we needed. When we were able to find documentation on transformation, for example, there wasn't a good set of documentation examples we could use, and the examples we had weren't quite meeting the need. Better examples would've helped us.
Apache Kafka can improve by providing a UI for monitoring. There are third-party tools that can do it, but it would be nice if it was already embedded within Apache Kafka.
When compared to commercial competitors, Kafka doesn't have the ability to scale down; the elasticity is lacking in the product. The other issue for us is the delayed queue, which was available to us in the commercial software but not in Kafka. It's something we use in most of our applications for deferred processing, and I know it's available in other solutions. I'd like to see some tooling support and language support in the open source version.
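Deferred processing of the kind mentioned above can be approximated on the consumer side. The sketch below is an in-memory buffer that releases each message only after a fixed delay has passed since its produce timestamp. The class name and method names are hypothetical, and this is a workaround pattern, not a Kafka feature. Because the buffer lives in memory, offsets should only be committed after a message has actually been released and processed, otherwise a crash loses the buffered messages.

```python
import heapq


class DelayBuffer:
    """Hold messages until `delay_s` seconds after they were produced."""

    def __init__(self, delay_s: float):
        self.delay_s = delay_s
        self._heap = []   # entries are (due_time, sequence, payload)
        self._seq = 0     # tie-breaker so payloads are never compared

    def add(self, payload, produced_at: float) -> None:
        """Schedule a payload to become due at produced_at + delay_s."""
        heapq.heappush(self._heap, (produced_at + self.delay_s, self._seq, payload))
        self._seq += 1

    def pop_due(self, now: float) -> list:
        """Remove and return every payload whose due time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

In a consumer loop, `add` would be called for each polled record and `pop_due` once per iteration with the current time.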
To store a large set of analytical data, we are using a SQL repository. This type of repository works very well, but it needs specific, intensive maintenance. The user experience is friendly. We are looking for alternative solutions. We tried NoSQL solutions and Confluent-specific features, but the results were not satisfactory in terms of either performance or usability. We are working on automated SQL repository management and maintenance tools in order to increase the democratization of our platform.
The management overhead is more compared to the messaging system. There are challenges here and there. Like for long usage, it requires restarts and nodes from time to time.
Kafka has a lot of monitors, but sometimes it's most important to just have a simple monitor. Improvements to Kafka's management would be nice, but it's not so necessary for me. There are a lot of consoles that offer a better view than Kafka. Some are free, and some are paid, but I'm thinking about streaming. For example, if you connect more streams to a component in the same queue, how will it integrate to recognize the flow and the message?
More Windows support, I believe, is one area where it can improve. We need to wrap it as a service, but there isn't one built into Windows. So that's something they could improve. I believe Windows Server is primarily aimed at the Windows shop or those who use Windows.
We are still on the production aspect, with our service provider or hyper-scalers providing the solutions. I would like to see some improvement on the HA and DR solutions, where everything is happening in real-time. Kafka's interface could also use some work. Some of our products are in C, and we don't have any libraries to use with C. From an interface perspective, we had a library from Redis, and we are streaming some of the products we built to Redis. That is one of the requirements. It would be good to have those libraries available in a future release for our C++ clients or public libraries, so we can include them in our product and build on that.
Kafka can allow for duplicates, which isn't as helpful in some of our scenarios. They need to work on their duplicate management capabilities but for now developers should ensure idempotent operations for such scenarios. While the solution scales well and easily, you need to understand your future needs and prep for the peaks.
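A minimal sketch of the idempotent-operation approach mentioned above: the handler remembers which records it has already applied and skips redeliveries. The class name is hypothetical, and the in-memory set is an assumption for brevity; a real service would keep this state in a durable store. Keying on (topic, partition, offset) only catches the same record being delivered twice. If the producer wrote the same event twice, the copies have different offsets, and a business identifier carried in the message would be the better key.

```python
class IdempotentHandler:
    """Apply each record at most once, keyed by (topic, partition, offset)."""

    def __init__(self, apply):
        self._apply = apply   # side-effecting function to run once per record
        self._seen = set()    # keys of records that were already applied

    def handle(self, topic: str, partition: int, offset: int, value) -> bool:
        """Return True if the record was applied, False if it was a duplicate."""
        key = (topic, partition, offset)
        if key in self._seen:
            return False
        self._apply(value)
        # Record the key only after the side effect succeeded, so a failure
        # leaves the record eligible for a retry.
        self._seen.add(key)
        return True
```

With this in place, redelivering a record after a rebalance or restart leaves the downstream state unchanged.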
Sr Technical Consultant at a tech services company with 1,001-5,000 employees
Real User
2021-06-26T01:12:49Z
Jun 26, 2021
I would like to see real-time event-based consumption of messages rather than the traditional way through a loop. The traditional messaging system works by listing and looping with a small wait to check to see what the messages are. A push system is where you have something that is ready to receive a message and when the message comes in and hits the partition, it goes straight to the consumer versus the consumer having to pull. I believe this consumer approach is something they are working on and may come in an upcoming release. However, that is message consumption versus message listening. Confluent created the KSQL language, but they gave it to the open-source community. I would like to see KSQL be able to be used on raw data versus structured and semi-structured data.
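The loop-based consumption described above can be hidden behind a callback so that application code looks event-driven even though the client still pulls. The sketch below is a small dispatcher under that assumption: `poll` stands for any function that returns a (possibly empty) list of messages, and `on_message` is the application's handler. Both names, and the idle-poll stop condition, are illustrative and not part of any Kafka client API.

```python
def dispatch(poll, on_message, max_idle_polls: int = 3) -> int:
    """Call on_message for each polled message; stop after repeated empty polls.

    Returns the number of messages delivered to the callback.
    """
    idle = 0
    delivered = 0
    while idle < max_idle_polls:
        batch = poll()
        if not batch:
            idle += 1        # nothing arrived; count the empty poll
            continue
        idle = 0             # traffic resumed; reset the idle counter
        for message in batch:
            on_message(message)
            delivered += 1
    return delivered
```

This does not remove the polling cost the reviewer objects to; it only moves the loop out of the business logic, which is roughly what higher-level listener frameworks do.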
Chief Technology Officer at a tech services company with 1-10 employees
Real User
2021-05-12T12:29:06Z
May 12, 2021
The graphical user environment is currently lacking in Apache. It's not available within the solution and needs to be built from scratch. Some of the open source products of this solution have limitations.
Senior Technology Architect at a tech services company with 10,001+ employees
Real User
2020-10-22T16:39:00Z
Oct 22, 2020
Some vendors don't offer extra features for monitoring. Some come with Linux for default monitoring. Monitoring is very important. If something is not working properly, then our subscribers won't receive a notification. You then have to trace it back to Kafka and find the glitch or the messaging sequence that hasn't been racked up correctly. It should support Avro — which handles different data formats — as a default data format. It would be much more flexible if it did.
Solution Architect at a manufacturing company with 10,001+ employees
Real User
2020-09-27T04:09:51Z
Sep 27, 2020
They need to have a proper portal to do everything because, at this moment, Kafka is lagging in this regard. It could be used to do the preprocessing or the configurations, instead of directly doing it on the queues or the topics. If you look at Solace, for example, they have come up with a portal where you don't need to touch these activities. You don't need to access the platform beyond the portal.
The model where you create the integration or the integration scenario needs improvement. It contains fewer developer words or maintaining words where someone prepared the topics, the connectors, or the streaming platforms. You would first need to have a control center from a third party for managing. If you would like to prepare a more sophisticated integration scenario, where one microservice provides the event and a second or several others consume it, then this needs to be modeled elsewhere. Also, when comparing to the traditional ESB for data mixing, you can create a scenario that could be deployed with inputs and some outputs. Most businesses like the topics, but for me, it is a problem that messaging platforms share: there is no design tool with an IDE for creating scenarios. It would be helpful to be able to create a more complex solution for several types of styles, and not just for one provider or for one customer. That would be easier, but if you have more than one consumer then it could be a more complex scenario. It would be like events that go to several microservices to create orders, validate orders, and creating words. This would be helpful. In the next release, adding an IDE or development tools for creating better integration scenarios would be helpful, even though it is already a developer-oriented solution. It would also be helpful for auto-deployment. Having a governance style would also be helpful to understand. It would be beneficial to have a repository of all of the topics, data types that exist, or data structures.
Due to the fact that the solution is open source, it has a ZooKeeper dependency. If I could change anything about the solution, it would be that. The solution could always add a few more features to enhance its usage.
This solution could be made easier to manage. Compatibility with other solutions and integration with other tools can be improved. We cannot apply all of our security requirements because it is hard to upload them.
The manageability should be improved. There are lots of things we need to manage and it should have a function that enables us to manage them all cohesively. There should be a default property. It's really hard to manage all these things.
We're still going through the solution. Right now, I can't suggest any features that might be missing. I don't see where there can be an improvement in that regard. The speed isn't as fast as RabbitMQ, even though the solution touts itself as very quick. It could be faster. They should work to make it at least as fast as RabbitMQ. The UI is based on command line. It would be helpful if they could come up with a simpler user interface. They should make it easier to configure items on the solution. The solution would benefit from the addition of better monitoring tools.
If the graphical user interface was easier for the Kafka administration it would be much better. Right now, you need to use the program with a command-line interface. If the graphical user interface was easier, it could be a better product.
Kafka does not provide control over the message queue, so we do not know whether we are experiencing lost or duplicate messages. Better control over the message queue would be an improvement. Solutions such as ActiveMQ do afford better control. Because of this, there is sometimes a gap in the results where we have either lost messages, or there are duplicates. We have had problems when there was an imbalance because all of the messages were being sent back.
There is a feature that we're currently using called MirrorMaker. We use it to combine the information from different Kafka servers into another server. It's very wide and it gives a very generic scenario. I think it would be great if the possibility would exist out of the box and not as a third party. The third party is not very stable and sometimes you have problems with this component. There are some developments in newer versions and we're about to try them out, but I'm not sure if it closes the gap.
Kafka 2.0 has been released for over a month, and I wanted to try out the new features. However, the configuration is a little bit complicated: Kafka Broker, Kafka Manager, ZooKeeper Servers, etc.
Apache Kafka is an open-source distributed streaming platform that serves as a central hub for handling real-time data streams. It allows efficient publishing, subscribing, and processing of data from various sources like applications, servers, and sensors.
Kafka's core benefits include high scalability for big data pipelines, fault tolerance ensuring continuous operation despite node failures, low latency for real-time applications, and decoupling of data producers from consumers.
Key...
Config management can be better. We are always trying to find the best configs, which is a challenge.
Apache Kafka has performance issues that cause it to lag.
There are some latency problems with Kafka.
Apache Kafka can improve by adding a feature out of the box which allows it to deliver only one message.
The solution can improve its cloud support.
Apache Kafka could improve data loss and compatibility with Spark.
I would like to see an improvement in authentication management.
The user interface is one weakness. Sometimes, our data isn't as accessible as we'd like. It takes a lot of work to retrieve the data and the index.
The management tool could be improved.
The initial setup and deployment could be less complex. Integration is one of the main concerns that we have.
Some vendors don't offer extra features for monitoring. Some come with Linux for default monitoring. Monitoring is very important. If something is not working properly, then our subscribers won't receive a notification. You then have to trace it back to Kafka and find the glitch or the messaging sequence that hasn't been racked up correctly. It should support Avro, which handles different data formats, as a default data format. It would be much more flexible if it did.
They need to have a proper portal to do everything because, at this moment, Kafka is lagging in this regard. It could be used to do the preprocessing or the configurations, instead of directly doing it on the queues or the topics. If you look at Solace, for example, they have come up with a portal where you don't need to touch these activities. You don't need to access the platform beyond the portal.
The model where you create the integration or the integration scenario needs improvement. It contains fewer developer words or maintaining words where someone prepared the topics, the connectors, or the streaming platforms. You would first need a control center from a third party for managing. If you would like to prepare a more sophisticated integration scenario, where one microservice provides the event and a second, or several, consume it, then this needs to be modeled elsewhere. Also, compared to a traditional ESB for data mixing, there you can create a scenario that can be deployed with inputs and outputs. Most businesses like the topics, but for me, it is a problem that messaging platforms have: there is no design tool with an IDE for creating scenarios. It would be helpful to be able to create a more complex solution for several types of styles, and not just for one provider or one customer. That case would be easier, but if you have more than one consumer, then it becomes a more complex scenario. It would be like events that go to several microservices to create orders, validate orders, and creating words. This would be helpful. In the next release, adding an IDE or development tools for creating better integration scenarios would be helpful, even though it is already a developer-oriented solution. It would also be helpful for auto-deployment. Having a governance style would also be helpful. It would be beneficial to have a repository of all of the topics, data types, or data structures that exist.
Because the solution is open source, it has a ZooKeeper dependency. If I could change anything about the solution, it would be that. The solution could always add a few more features to enhance its usage.
This solution could be made easier to manage. Compatibility with other solutions and integration with other tools can be improved. We cannot apply all of our security requirements because it is hard to upload them.
The manageability should be improved. There are lots of things we need to manage and it should have a function that enables us to manage them all cohesively. There should be a default property. It's really hard to manage all these things.
More adapters for connecting to different systems need to be available.
Kafka is complex and there is a little bit of a learning curve.
We're still going through the solution. Right now, I can't suggest any features that might be missing. I don't see where there can be an improvement in that regard. The speed isn't as fast as RabbitMQ, even though the solution touts itself as very quick. It could be faster. They should work to make it at least as fast as RabbitMQ. The UI is command-line based. It would be helpful if they could come up with a simpler user interface. They should make it easier to configure items in the solution. The solution would benefit from the addition of better monitoring tools.
In the next release, I would like for there to be some authorization features and HTL security. We also need bigger software and better monitoring.
If the graphical user interface for Kafka administration were easier to use, it would be a much better product. Right now, you need to use the program through a command-line interface.
Kafka does not provide control over the message queue, so we do not know whether we are experiencing lost or duplicate messages. Better control over the message queue would be an improvement. Solutions such as ActiveMQ do afford better control. Because of this, there is sometimes a gap in the results where we have either lost messages, or there are duplicates. We have had problems when there was an imbalance because all of the messages were being sent back.
There is a feature that we're currently using called MirrorMaker. We use it to combine the information from different Kafka servers into another server. It's very wide and it gives a very generic scenario. I think it would be great if the possibility would exist out of the box and not as a third party. The third party is not very stable and sometimes you have problems with this component. There are some developments in newer versions and we're about to try them out, but I'm not sure if it closes the gap.
Kafka 2.0 was released over a month ago, and I wanted to try out the new features. However, the configuration is a little bit complicated: Kafka Broker, Kafka Manager, ZooKeeper Servers, etc.