Apache Flink Reviews and Pricing

Lead Data Scientist at a transportation company with 51-200 employees

Dec 15, 2023

While providing powerful stream and batch processing capabilities, it needs improvement in command automation and stability

Pros and Cons

"It provides us the flexibility to deploy it on any cluster without being constrained by cloud-based limitations."

"There is room for improvement in the initial setup process."

What is our primary use case?

We use it for batch processing, specifically for hiding certain data, including location information and other entity-specific attributes.

What is most valuable?

The significant advantage is the learning curve as it is easy. It's not associated with any specific cloud entity. It provides us the flexibility to deploy it on any cluster without being constrained by cloud-based limitations. It enables freedom for making optimizations, allowing for specific customizations as needed.

What needs improvement?

There is room for improvement in the initial setup process. I found myself spending a significant amount of time navigating through documentation to configure it. It would be beneficial to have streamlined commands, where the environment could be quickly initialized, including database setup, providing a more efficient and convenient starting point for projects. This would be particularly advantageous for development and experimentation, allowing more focus on feature testing rather than spending time on the setup process.

For how long have I used the solution?

I have been working with it for one year.

Buyer's Guide

Apache Flink

April 2025

Free Report: Apache Flink Reviews and More

Learn what your peers think about Apache Flink. Get advice and tips from experienced pros sharing their opinions. Updated: April 2025.

DOWNLOAD NOW

847,862 professionals have used our research since 2012.

What do I think about the stability of the solution?

In the tech industry, there's always room for enhancements. It's a dynamic environment where products evolve over time. I would rate it eight out of ten.

What do I think about the scalability of the solution?

I would rate its scalability capabilities eight out of ten. Currently, a team of fifteen individuals is involved in everyday processes.

How are customer service and support?

Regarding tech support, the challenge lies in the open-source realm where there's no inherent accountability. Modules are developed, and users often rely on forums for assistance. However, for more critical or specific problems, especially those related to compliance or complaints, a paid service is available with a dedicated support team. I would rate it seven out of ten.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We initially utilized Databricks for batch processing, which also supports real-time capabilities. However, due to limited computing power usage, we transitioned to a more focused approach. We now employ smaller containers for handling about five to six attributes specific to each entity in real time which allows us to experiment with alerts, actions, and responses for various purposes.

How was the initial setup?

The initial setup was complex.

What about the implementation team?

Deployment takes a total of two days: one day for learning and another day for the actual deployment process. We have a small dedicated team that manages updates and handles the networking aspects of the development website. I would rate it eight out of ten.

What's my experience with pricing, setup cost, and licensing?

It's an open source.

What other advice do I have?

I would give it a rating of seven out of ten due to the need for improvement in command automation and perhaps some stability concerns. However, overall, I would still recommend it.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Disclosure: I am a real user, and this review is based on my own experience and opinions.

ZHIZHENG

Product Operations Manager at OKX

Mar 16, 2023

Download

Great data streaming tool but documentation needs to be more accessible

Pros and Cons

"Apache Flink's best feature is its data streaming tool."

"Apache Flink's documentation should be available in more languages."

What is most valuable?

Apache Flink's best feature is its data streaming tool.

What needs improvement?

Apache Flink's documentation should be available in more languages.

For how long have I used the solution?

I've been using Apache Flink for four to five years.

What do I think about the stability of the solution?

Apache Flink is quite stable.

What do I think about the scalability of the solution?

Apache Flink is scalable.

How was the initial setup?

The initial setup was easy, and the cloud-based version takes no more than a minute to deploy.

What other advice do I have?

I would recommend Apache Flink to other users and rate it seven out of ten.

Which deployment model are you using for this solution?

Hybrid Cloud

Disclosure: I am a real user, and this review is based on my own experience and opinions.

Buyer's Guide

Apache Flink

April 2025

Free Report: Apache Flink Reviews and More

Learn what your peers think about Apache Flink. Get advice and tips from experienced pros sharing their opinions. Updated: April 2025.

DOWNLOAD NOW

847,862 professionals have used our research since 2012.

Vinod Iyer

Principal Software Engineer at a tech services company with 1,001-5,000 employees

Oct 30, 2020

Download

Offers good API extraction and in-memory state management

Pros and Cons

"Apache Flink is meant for low latency applications. You take one event opposite if you want to maintain a certain state. When another event comes and you want to associate those events together, in-memory state management was a key feature for us."

"In terms of improvement, there should be better reporting. You can integrate with reporting solutions but Flink doesn't offer it themselves."

What is our primary use case?

The last POC we did was for map-making. I work for a map-making company. India is one ADR and you have states within, you have districts within, and you have cities within. There are certain hierarchical areas. When you go to Google and when you search for a city within India, you would see the entire hierarchy. It's falls in India.

We get third party sources, government sources, or we get it from different sources, if we can. We get the data, and this data is geometry. It's not a straightforward index. If we get raw geometry, we will get the entire map and the layout.

We do geometry processing. Our POC was more of processing geometry in a distributed way. The exploration that I did was more about distributing this geometry and breaking this big geometry.

How has it helped my organization?

Flink moved on to becoming a standard technology for location platform. There's only one location platform available right now by an open location platform. That platform leverages Flink. Because Flink is the component for streaming screen property. It's an optic whose data extends within the organization. Anywhere we need low latency applications, we use Flink.

What is most valuable?

Apache Flink is meant for low latency applications. You can take one event opposite if you want to maintain a certain state. When another event comes and you want to associate those events together, in-memory state management is a key feature for this.

Checkpointing was important since we have the consumption done by Kafka. There was a continuous pool of data coming in from cars and which was put into Kafka. This particular Apache Flink component came in and it started processing it. When there's a failure or something effective, checkpointing is very important.

It also helped us in exactly one standard. Another valuable feature is API extraction, which is nicely done. Anyone can understand it. It's not very complex. Anyone can go through all the transformations and everything they have. It's easy to use that. It's a well-balanced extraction.

What needs improvement?

In terms of improvement, there should be better reporting. You can integrate with reporting solutions but Flink doesn't offer it themselves.

They're more about the processing side. Low latency processing is out of their scope. As ar as low latency is concerned, you can integrate to other backend solutions as well. They have that flexibility. APIs are good enough. Its in-memory is so fast, you could have faster-developed data and stuff like that.

What do I think about the stability of the solution?

The stability was good enough. There are a few issues that were application dependent. From the processing standpoint, it did what it was expected to do as such, there were a few issues with Python integration like checkpointing. The checkpointing was not done properly at times, but again that was more about integration going faster and also optimizing our checkpointing intervals and stuff like that. As Flink is concerned, they have good checkpointing and safe points.

There were 50 developers and DevOps working on it.

How are customer service and technical support?

I was on the DevOps side. Support was all driven from Chicago. There was a different team in Chicago who was driving all this stuff. I was a completely hands-on developer. My interaction was more from using the API and developing applications. I don't need to use support. Flink was straightforward.

How was the initial setup?

The deployment can be declared on any kind of distributed managers. I haven't used it, but that's a good option that you could even integrate it within APIs. This adds flexibility to it.

I was not part of the deployment when it was initially done. When I came into the picture, it was more about the API. We had already started using it at the application level at the organization. initially, when it was done in my previous organization, it was an earlier version of Flink. I think they started off in 2016 and there might have been some glitches or some technical issues. When I came in it was pretty smooth. I didn't find any issues and really I hopped into Flink.

Which other solutions did I evaluate?

We also looked at Spark Streaming versus Apache. Spark Streaming is not real-time. That's where we understood that Flink is good enough when you want to have real-time Processing. That's the only process that we have right now and Spark Streaming is more of a big data set.

If you want real-deal real-time processing, you have to invest in Flink but part of it is more of when you use Flink, you have everything stored. You store the state also in-memory so you add up the cost of using that engine compared to PAC streaming. It's not mandatory it's up to the application but if you want to really have real-time processing if you want to store the state and if you really want to have a low latency application, that's when I would go with Flink. Whereas Spark streaming would be more of whenever it's okay to have a bit of a delay like really low latency applications.

Flink gives you flexibility. The reason we chose Spark was because people in our company were already familiar with it. We haven't started working on it yet because it's a half-done POC.

What other advice do I have?

Flink is really simple and simple to adopt. You can use any backend state management tools, like DB or something of that sort. it has the visibility to integrate with different technologies, that's also very important. It's pretty welded and I believe for low latency. The API is pretty well written that way to support you.

I would rate Apache Flink an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.

reviewer1450275

Software Development Engineer III at a tech services company with 5,001-10,000 employees

Nov 10, 2020

Download

Provides truly real-time data streaming with better control over resources; ML library could be more flexible

Pros and Cons

"This is truly a real-time solution."

"The machine learning library is not very flexible."

What is our primary use case?

My company is a cab aggregator, similar to Uber in terms of scale as well. Just like Uber, we have two sources of real-time events. One is the customer mobile app, one is the driver mobile app. We get a lot of events from both of these sources, and there are a lot of things which you have to process in real-time and that is our primary use case of Flink. It includes things like surge pricing, where you might have a lot of people wanting to book a cab so the price increases and if there are fewer people, the price drops. All that needs to be done quickly and precisely. We also need to process events from drivers' mobile phones and calculate distances. It all requires a lot of data to be processed very quickly and in real-time.

How has it helped my organization?

The end-to-end latency was drastically reduced, and our capability of handling high throughput has increased by using Flink. It provides a lot of functionality with its windows and maps and that gives us a lot of extra features and power that other frameworks don't have. The solution has helped us by enabling a lot of creative features so we are now able to detect if something abnormal is happening, like a driver has deviated from the set route or the car has not moved for a long period of time, all in real time. Being able to check this has led to more secure rides for our customers.

What is most valuable?

The most valuable feature of Apache Flink would be that it is truly real-time. Unlike Spark and other technologies, it's not recurring batch processing and it also allows me better control over resources. For example, in Spark, it's very difficult to create multiple parallel streams and it consumes the memory of your entire cluster very greedily. With Flink, I have very good control, can choose the number of task managers with a fixed amount of memory, and configured parallelism. This flexibility is very useful in scaling of Flink.

What needs improvement?

Flink has become a lot more stable but the machine learning library is still not very flexible. There are some models which are not able to plug and play. In order to use some of the libraries and models, I need to have a Python library because there might be some pre-processing or post-processing requirements, or to even parse and use the models. The lack of Python support is something they can maybe work on in the future.

For how long have I used the solution?

I've been using this solution for two years.

What do I think about the stability of the solution?

The solution has become a lot more stable over time. We have around 10-12 users and most are software developers. Even if we are running our task managers on cheap servers, we make sure that our job manager definitely runs on a very expensive server, which never goes down. Things remain more stable that way. We're a large company and have teams dedicated to dealing with the infrastructure and taking care of the maintenance and infra, making sure that jobs runs smoothly. A small company could do the maintenance itself.

What do I think about the scalability of the solution?

It's a good product and allows me to scale very easily. If I'm getting more and more data, I can very easily increase the memory allocated for every task manager, and increase the number of parallelism. We are increasing usage of Flink as much as possible. There are some things that we still run on Spark, but whenever we need to scale, have easy resource management, and a more real-time streaming solution, we usually now go for Flink. We have never faced any scalability issues and we are running at a very high profile. I think we are yet to reach the scaling limits of Flink.

How are customer service and technical support?

We have never used Apache's tech support. We usually just Google for our questions. If we don't get the answers directly from Google, we go through the documentation, which is comprehensive, and usually find our answers there.

Which solution did I use previously and why did I switch?

We previously used Spark for streaming, but not for real time applications. We have moved some of our services from Spark to Flink. We also use Kafka extensively, but that is mostly for asynchronous communication between different services. Kafka is a totally different use case. You cannot substitute it with Flink. Overall in terms of streaming, we have used Spark, Kafka and Flink.

How was the initial setup?

The initial setup was not very straightforward because compared to other frameworks, Flink is quite new. There isn't yet a good online community, online blogs, or guidance. You have to rely more or less on their documentation for everything. Even if you go to Stack Overflow, for Spark, you will get lots of questions and answers which will help you. With Flink, you have to actually read a lot. It's not as straightforward as other frameworks.

What's my experience with pricing, setup cost, and licensing?

We have only used the open-source version of Flink.

What other advice do I have?

My advice would be to make sure you understand your requirements, flink's architecture, how it works and whether it is the right solution for you. They provide very good documentation which is useful. The solution isn't suitable for every case and it may be that Spark or some other framework is more suitable. If you are a major company that cannot afford any downtime, and given that Flink is a relatively new technology, it might be worthwhile investing in the monitoring. That would include writing scripts for monitoring and making sure that the throughput of the applications is always steady. Make sure your monitoring and your SOPs around monitoring, are in place.

I would rate this solution a seven out of 10.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Disclosure: I am a real user, and this review is based on my own experience and opinions.

Hitesh Baid

Lead Software Engineer at a tech services company with 5,001-10,000 employees

Oct 19, 2020

Download

Drastically reduces the turnaround/ processing time, Documentation is in depth and most of the things are straight forward.

Pros and Cons

"The event processing function is the most useful or the most used function. The filter function and the mapping function are also very useful because we have a lot of data to transform. For example, we store a lot of information about a person, and when we want to retrieve this person's details, we need all the details. In the map function, we can actually map all persons based on their age group. That's why the mapping function is very useful. We can really get a lot of events, and then we keep on doing what we need to do."

"The TimeWindow feature is a bit tricky. The timing of the content and the windowing is a bit changed in 1.11. They have introduced watermarks. A watermark is basically associating every data with a timestamp. The timestamp could be anything, and we can provide the timestamp. So, whenever I receive a tweet, I can actually assign a timestamp, like what time did I get that tweet. The watermark helps us to uniquely identify the data. Watermarks are tricky if you use multiple events in the pipeline. For example, you have three resources from different locations, and you want to combine all those inputs and also perform some kind of logic. When you have more than one input screen and you want to collect all the information together, you have to apply TimeWindow all. That means that all the events from the upstream or from the up sources should be in that TimeWindow, and they were coming back. Internally, it is a batch of events that may be getting collected every five minutes or whatever timing is given. Sometimes, the use case for TimeWindow is a bit tricky. It depends on the application as well as on how people have given this TimeWindow. This kind of documentation is not updated. Even the test case documentation is a bit wrong. It doesn't work. Flink has updated the version of Apache Flink, but they have not updated the testing documentation. Therefore, I have to manually understand it. We have also been exploring failure handling. I was looking into changelogs for which they have posted the future plans and what are they going to deliver. We have two concerns regarding this, which have been noted down. I hope in the future that they will provide this functionality. Integration of Apache Flink with other metric services or failure handling data tools needs some kind of update or its in-depth knowledge is required in the documentation. We have a use case where we want to actually analyze or get analytics about how much data we process and how many failures we have. For that, we need to use Tomcat, which is an analytics tool for implementing counters. We can manage reports in the analyzer. This kind of integration is pretty much straightforward. They say that people must be well familiar with all the things before using this type of integration. They have given this complete file, which you can update, but it took some time. There is a learning curve with it, which consumed a lot of time. It is evolving to a newer version, but the documentation is not demonstrating that update. The documentation is not well incorporated. Hopefully, these things will get resolved now that they are implementing it. Failure is another area where it is a bit rigid or not that flexible. We never use this for scaling because complexity is very high in case of a failure. Processing and providing the scaled data back to Apache Flink is a bit challenging. They have this concept of offsetting, which could be simplified."

What is our primary use case?

Services that need real-time and fast updates as well as lot of data to process, flink is the way to go. Apache Flink with kubernetes is a good combination. Lots of data transformation grouping, keying, state mangements are some of the features of Flink. My use case is to provide faster and latest data as soon as possible in real time.

How has it helped my organization?

The main advantage is the turnaround time, which has been reduced drastically because of Apache Flink. Earlier, it used to take a lot of processing time but now things have changed, and everything is in almost real time. We get the latest data in a very less time. There is no waiting or lag of data in the application, time has been one of the important factors.

The other factor is memory. The utilization of the machine has been more efficient since we started to use this solution. The big data applications definitely use a large group of machines to process the data. These machines are not optimally utilized. Some of the machines might not have been required, but they still take hold of the resources. In Kubernetes, we can provide resources, and in Apache Flink, there is a configuration where you can do the deployment in combination with a single cluster node. Scalability is quite flexible in flink with task managers and resource configuration.

What is most valuable?

MapFunction, FilterFunction are the most useful or the most used function in Flink. Data transformation becomes easy for example, Applications that store information about people and when they want to retrieve those person's details in some kind of relation, in the map function, we can actually filter all persons based on their age group. That's why the mapping function is very useful. This could be helpful in analytics to target specific news to specific age group.

What needs improvement?

TimeWindow feature. The timing of the content and the windowing is a bit changed in 1.11. They have introduced watermarks.

Watermark is basically associating data in the stream with a timestamp. Documentation can be referred. They have updated rest of the documentaion but not the testing documentation. Therefore, We have to manually try and understand few concepts.

Integration of Apache Flink with other metric services or failure handling data tools needs some kind of update or its in-depth knowledge is expected before integrating. Consider a use case where you want to actually analyze or get analytics about how much data you have processed and how many failed? Prometheus is one of the common metric tools out of the box supported by flink, along with other metric services. The documentation is straight forward. There is a learning curve with metric services, which can consume a lot of time, if not well versed with those tools.

Failure handling basic documentation is provided by flink, like restart on task failure, fixed delay restart...etc.

For how long have I used the solution?

I have been using Apache Flink for almost nine months.

What do I think about the stability of the solution?

The uptime of our services has increased, resources are better utilized, restarts is automated on failures, alerts are triggered when infrastructure breaches threshold and application failure metrics are logged, due to which the application has become robust, scaling is something which can be tweaked as per usage of application, business rules. Also Flink parameters are configurable, altogether it made our application more stable and maintainable.

What do I think about the scalability of the solution?

I haven't actually hit a lot of performance or stress test to see the scaling.
There is no detailed documentation for scaling. There is no fixed solution as well, it depends on use cases, there are rest APIs to scale your task dynamically in Flink Documentation, haven't personally tried it yet.

How are customer service and technical support?

I haven't actually used their technical support yet.

Which solution did I use previously and why did I switch?

I have tried batch processing, It was not that much effective in my use case. It was time consuming and not real time solution.

How was the initial setup?

The initial setup is straightforward. A new joined person (with some experience in software industry) would not find it that much difficult to understand and then contribute to the application. When you want to start writing the code for it, then things get tricky and more complex because if you are not involved with what Apache Flink is providing and what you need to do, it is very difficult. If followed the documentation, there are various examples provided and different deployment strategies as well.

What was our ROI?

It is good solution for time and cost saving.

What's my experience with pricing, setup cost, and licensing?

Being open source licensed, cost is not a factor. The community is strong to support.

What other advice do I have?

To get your hands wet on streaming or big data processing applications, to understand the basic concepts of big data processing and how complex analytics or complications can be made simple. For eg: If you want to analyze tweets or patterns, its a simple use case where you just use flink-twitter-connector and provide that as your input source to Flink. The stream of random tweets keeps on coming and then you can apply your own grouping, keying, filtering logic to understand their concepts.

An important thing I learned while using flink, is basic concepts of windowing, transformation, Data Stream API should be clear, or atleast be aware of what is going to be used in your application, or else you might end up increasing the time rather than decreasing. You should also understand your data, process, pipeline, flow, Is flink the right candidate for your architecture or an over kill?
It is flexible and powerful is all I can say.

Disclosure: I am a real user, and this review is based on my own experience and opinions.

reviewer1494531

Head of Data Science at a energy/utilities company with 10,001+ employees

Feb 7, 2021

Download

Easy deployment and install, open-source, but underdeveloped API

Pros and Cons

"The setup was not too difficult."

"In a future release, they could improve on making the error descriptions more clear."

What is our primary use case?

I use the solution for detection of streaming data.

What needs improvement?

I am using the Python API and I have found the solution to be underdeveloped compared to others. There needs to be better integration with notebooks to allow for more practical development. Additionally, there are no managed services. For example, on Azure, you would have to set everything up yourself.

In a future release, they could improve on making the error descriptions more clear.

For how long have I used the solution?

I have been using the solution for two weeks.

Which solution did I use previously and why did I switch?

I have used many different competing solutions in the past such as Pyspark, Hadoop and Storm.

How was the initial setup?

The setup was not too difficult.

What about the implementation team?

To deploy the solution locally it was relatively easy.

What's my experience with pricing, setup cost, and licensing?

The solution is open-source, which is free.

What other advice do I have?

When choosing this solution you have to look at your use case to see if this is the best choice for you. If you need to have super-fast realtime streaming, and you can develop in Scala, then it might make a lot of sense to use it. If you are looking at delays of seconds, and you are working on Python, then Pyspark might be a better solution.

I rate Apache Flink a six out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.

Agustin Calderon

CTO at ReNew

Feb 12, 2024

Download

Helps us to create both simple and complex data processing tasks

Pros and Cons

"The product helps us to create both simple and complex data processing tasks. Over time, it has facilitated integration and navigation across multiple data sources tailored to each client's needs. We use Apache Flink to control our clients' installations."

"Apache Flink should improve its data capability and data migration."

What is our primary use case?

We utilize IoT devices to gather data for our clients. This data is analyzed to produce reports and insights, and we leverage machine learning and artificial intelligence models.

What is most valuable?

The product helps us to create both simple and complex data processing tasks. Over time, it has facilitated integration and navigation across multiple data sources tailored to each client's needs. We use Apache Flink to control our clients' installations.

What needs improvement?

Apache Flink should improve its data capability and data migration.

For how long have I used the solution?

I have been using the product for five years.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

Apache Flink is scalable.

How was the initial setup?

The product's deployment can be completed in minutes, and we have a special team. It is straightforward. We initiate our requirements within our secure software and utilize Jenkins and our pipelines to carry out the deployment process, whether for expanding services on the cloud or on-premise servers.

What about the implementation team?

Apache Flink can be deployed in-house.

What other advice do I have?

I rate the product an eight out of ten.

Which deployment model are you using for this solution?

On-premises

Disclosure: I am a real user, and this review is based on my own experience and opinions.

Ertugrul Akbas

Manager at ANET

Jan 30, 2023

Download

Easy to use, stable, scalable, and has good community support with a lot of documentation

Pros and Cons

"It is user-friendly and the reporting is good."

"There is a learning curve. It takes time to learn."

What is our primary use case?

We use Apache Flink in-house to develop the Tectonic platform.

What is most valuable?

It's usable and affordable.

It is user-friendly and the reporting is good.

What needs improvement?

There is a learning curve. It takes time to learn.

The initial setup is complex, it could be simplified.

For how long have I used the solution?

I have been using Apache Flink for more than one year.

I am using the latest version.

What do I think about the stability of the solution?

Apache Flink is a scalable product. We have no issues with the stability.

What do I think about the scalability of the solution?

It's a very scalable solution. We have more than 100 people in our organization who are using it.

How are customer service and support?

We use community resources. There is a lot of documentation available online.

How was the initial setup?

The initial setup is complex.

What's my experience with pricing, setup cost, and licensing?

It's an open-source solution.

Which other solutions did I evaluate?

We have not evaluated competitors. We followed the trends and based on the experience and opinions of people from all over the world, we selected Apache Flink.

What other advice do I have?

I would recommend Apache Flink to others who are interested in using it.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises

Disclosure: I am a real user, and this review is based on my own experience and opinions.