What is our primary use case?
We use Apache Kafka as more than a messaging bus; it also serves as a database that stores information. It functions as a streamer, similar to an ETL tool, manipulating and transforming events before they move on to other systems. It can also act as a cache. We rely on Apache Kafka as a database, a broker, a streamer, and a source of truth for multiple systems because it retains events for at least ten days. It supports both synchronous and asynchronous communication, which makes it a complex system that is easier to understand through diagrams or sketches.
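As a rough illustration of the retention side of this, here is a minimal sketch of creating a topic whose events are kept for ten days using the Java AdminClient. The topic name, broker address, and partition/replica counts are assumptions for the example, not details from our environment.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateRetainedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            // Keep events for ten days so downstream systems can treat the topic
            // as a replayable source of truth rather than a fire-and-forget bus.
            NewTopic orders = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of(
                            TopicConfig.RETENTION_MS_CONFIG,
                            String.valueOf(Duration.ofDays(10).toMillis()),
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_DELETE));
            admin.createTopics(List.of(orders)).all().get();
        }
    }
}
```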
We use reactive frameworks.
How has it helped my organization?
From my experience with Apache Kafka, one of its most notable advantages is that, unlike a conventional relational database, it maintains a comprehensive record of historical data, including every update, alteration, and version of the information. This makes it easy to track, review, and analyze how the data has progressed and changed over time.
The solution lets various systems interact with one another without prior knowledge of each other's existence, operational status, or specific configuration. By using service buses and dynamic integration, data can be distributed across networks and retrieved in whatever way suits each system's requirements. In addition, Apache Kafka allows data to be modified so that different clients, consumers, or observers each receive their own variation of it. Replication can produce multiple versions of the data, and each version can be adjusted to fit different needs. With the use of probes, you can alter the behavior of the transformation process, changing how the data is transformed and what output is produced. Overall, working with Apache Kafka has brought an array of benefits, enabling seamless system interactions and allowing data to be customized to meet individual requirements.
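As a minimal sketch of that kind of fan-out, the Kafka Streams snippet below derives two differently shaped views from one source topic. The topic names, the JSON field names, and the transformations themselves are hypothetical and only stand in for whatever shaping each consumer needs.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FanOutViews {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-views");          // illustrative name
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // placeholder address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> raw = builder.stream("orders");

        // One consumer gets records with the customer field masked out ...
        raw.mapValues(v -> v.replaceAll("\"customer\":\"[^\"]*\"", "\"customer\":\"***\""))
           .to("orders-anonymized");

        // ... while another gets only the subset it cares about.
        raw.filter((key, value) -> value.contains("\"region\":\"EU\""))
           .to("orders-eu");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The same source topic stays untouched; each derived topic is just one more view that a particular consumer can read at its own pace.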
What is most valuable?
There are numerous possibilities to explore. While it may be hard to grasp the full range of advantages, one key aspect is the ability to establish a proper sequence of events rather than dealing with a jumbled group of occurrences. These events carry their own timestamps, even if they were not originally given one, and are arranged in chronological order, which gives a clear picture of how the events progressed.
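The snippet below is a small, assumed example (topic name, group id, and broker address are placeholders) of a plain Java consumer printing the per-partition offset and the timestamp each record carries, which is where that chronological ordering becomes visible.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderedHistoryReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "history-reader");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    // Every record carries a timestamp (producer- or broker-assigned)
                    // and a strictly increasing offset within its partition.
                    System.out.printf("partition=%d offset=%d timestamp=%d value=%s%n",
                            r.partition(), r.offset(), r.timestamp(), r.value());
                }
            }
        }
    }
}
```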
What needs improvement?
There have been some challenges with monitoring Apache Kafka, as there are currently only a few production-grade solutions available, all under enterprise licenses and therefore not easily accessible. I have not had access to any of these solutions and have instead relied on tools such as Dynatrace, which do not provide sufficient insight into the Apache Kafka system. While there are other tools available, they do not offer the same level of real-time data as the enterprise solutions.
One additional area that I think could benefit from improvement is the deployment process on OpenShift. This deployment is quite challenging and requires activating certain security measures as well as integrating with other systems. It is not a straightforward process and typically requires engineers who are highly skilled and have extensive experience with Apache Kafka. Therefore, I believe there is a need for progress in this area; tools that provide information and assistance and make the whole process easier would be greatly appreciated.
For how long have I used the solution?
I have been using Apache Kafka for approximately four years.
What do I think about the stability of the solution?
The solution is stable if you have set it up correctly.
What do I think about the scalability of the solution?
Apache Kafka is a scalable solution.
How are customer service and support?
I have not escalated any questions to technical support because Apache Kafka is an open-source system. However, Confluent and other companies sell support and enterprise solutions to make it more convenient and streamline the work. They offer tools, such as a monitoring tool with a visual interface, which provides a lot of information and buttons you can press to correct or change things without touching the code. Each of those buttons could hypothetically help in a given situation, but it is not always clear exactly what they do, so it is best to call the data center and ask. If you buy their service, you have access to all the enterprise comforts.
How was the initial setup?
Setting up Apache Kafka is not an easy task, especially when trying to containerize it and make it controllable. This is because Apache Kafka has its own distributed mechanisms for staying alive, checking readiness, replicating, and scaling. Ensuring that these comply with the Kubernetes or OpenShift orchestrator requires careful attention, as there is a risk of two masters attempting to perform the same task and ultimately undoing each other's work.
In comparison to Kubernetes, OpenShift is a highly advanced implementation infrastructure that automatically manages and orchestrates all the steps required for an application setup. It operates at a higher level of abstraction and eliminates the manual operations that Kubernetes requires. While Kubernetes can run an application with some pipeline and configuration, OpenShift takes care of everything from finding the required images to creating ports and connecting databases. Manual changes can still be made, but they are rarely necessary because OpenShift offers a much more coarse-grained management approach.
What about the implementation team?
One skillful DevOps engineer can implement the solution.
What's my experience with pricing, setup cost, and licensing?
Apache Kafka is an open-source solution.
What other advice do I have?
Maintenance of Apache Kafka is crucial because of the complexity of the system: numerous microservices and systems communicate through Apache Kafka, and they must be properly integrated and configured to prevent overloading and keep the cluster healthy. This is not an easy task and requires knowledge of the many adjustable parameters, as misadjusting even one of them can greatly slow down the cluster. For example, if the consumer group changes frequently, the messages must be regrouped and reassigned, causing significant delays. Configuring Apache Kafka correctly is therefore essential to avoid high-latency issues.
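As one hedged example of the kind of tuning I mean, the sketch below uses static group membership so that a restarting consumer does not trigger a full regroup and reassignment. The group id, instance id, broker address, and timeout value are illustrative, not settings from our cluster.

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StableConsumerFactory {
    public static KafkaConsumer<String, String> create(String podName) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "billing-service");
        // Static membership: a stable instance id lets a restarted consumer rejoin
        // under the same identity instead of forcing the whole group to rebalance.
        props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, podName);
        // Give a restarting member time to come back before the broker evicts it.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        return new KafkaConsumer<>(props);
    }
}
```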
I would strongly suggest others give Apache Kafka a chance and explore the advantages it can offer, especially since it should not be perceived as merely a message bus or broker but rather as an enterprise bus designed for data manipulation. It can transform data, store or reject it, and even maintain different versions of the same data simultaneously. Moreover, it operates on a pull mechanism rather than a push mechanism, which removes the risk of the broker losing data and places the responsibility for data loss on the consumer. At the same time, it ensures that the data is always available within the specified retention window and makes it easy to replay the past, which is extremely helpful in situations such as a hacked bank database. With Apache Kafka, you can go back in time, obtain the required state and events, and make changes accordingly, without having to go through each transaction separately. Using this solution can make data management much more efficient and convenient.
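To illustrate the "go back in time" point, here is a minimal sketch (topic, partition, timestamp, group id, and connection details are all assumed) that rewinds a consumer to the first event at or after a chosen moment and replays from there.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayFromTimestamp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "incident-replay");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("account-events", 0);
            consumer.assign(List.of(tp));

            // Find the first offset at or after the moment we want to rewind to.
            long replayFrom = Instant.parse("2023-01-01T00:00:00Z").toEpochMilli();
            Map<TopicPartition, OffsetAndTimestamp> offsets =
                    consumer.offsetsForTimes(Map.of(tp, replayFrom));
            OffsetAndTimestamp start = offsets.get(tp);
            if (start != null) {
                consumer.seek(tp, start.offset());
            }

            // From here, poll() re-reads history in the order it originally happened.
            for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
            }
        }
    }
}
```

This only works within whatever retention window the topic is configured with, which is why the retention setting mentioned earlier matters so much.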
I rate Apache Kafka an eight out of ten.
To become more user-friendly, engineer-friendly, and DevOps-friendly, the system needs work in several areas: improving overall operation and configuration, ensuring seamless integration with other systems, and adapting to security layers in a more comprehensive and generic way. Significant effort will be required to make the system more functional, secure, and efficient.
Disclosure: I am a real user, and this review is based on my own experience and opinions.