We use IoT devices to gather data for our clients. The data is analyzed to produce reports and insights, and we apply machine learning and artificial intelligence models to it.
Apache Flink is not a solution but a framework, and Spark is likewise a framework rather than a tool. For real-time data processing and ETL use cases that require on-the-fly transformations, Apache Flink is a suitable choice: it lets you reduce latency and process data in real time, which makes it ideal for such scenarios. I've worked on three large-scale platform use cases involving Apache Flink. One of those use cases handled a volume of approximately 200 to 300 million records per day, which translates to approximately 900 to 1,000 records per second.
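As a sketch of what such an on-the-fly transformation job can look like, here is a minimal Flink DataStream program in Java. The socket source and the trim/lower-case cleaning step are illustrative placeholders, not this reviewer's actual pipeline.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Minimal sketch of an on-the-fly transformation job in Flink's DataStream API.
public class StreamingEtlSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read raw records from a socket; in production this would be Kafka, Kinesis, etc.
        DataStream<String> raw = env.socketTextStream("localhost", 9999);

        // Transform each record as it arrives -- no batching, so latency stays low.
        DataStream<String> cleaned = raw
                .filter(line -> !line.isEmpty())
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String line) {
                        return line.trim().toLowerCase();
                    }
                });

        cleaned.print();
        env.execute("streaming-etl-sketch");
    }
}
```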
We used this solution for real-time analytics and for identifying outliers. We received live industrial data through Kinesis data streams, and Flink processed that data. We had two uses: through Lambda, Flink wrote data into an S3 bucket, and it also wrote data into another S3 bucket, which was useful for our business analytics.
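A minimal sketch of this Kinesis-to-S3 flow might look like the following, assuming the flink-connector-kinesis and S3 filesystem dependencies are on the classpath; the stream name, region, and bucket path are placeholders.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class KinesisToS3Sketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // FileSink finalizes in-progress files on checkpoints

        Properties props = new Properties();
        props.setProperty(ConsumerConfigConstants.AWS_REGION, "us-east-1");
        props.setProperty(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST");

        // Stream and bucket names are placeholders.
        DataStream<String> events = env.addSource(
                new FlinkKinesisConsumer<>("industrial-events", new SimpleStringSchema(), props));

        FileSink<String> s3Sink = FileSink
                .forRowFormat(new Path("s3://analytics-bucket/flink-output"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();

        events.sinkTo(s3Sink);
        env.execute("kinesis-to-s3-sketch");
    }
}
```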
We predominantly use this solution on-premises but intend to migrate to managed services. Our primary use case for this solution is maintaining pipelines that process data. We are currently migrating some pipelines from Pig to Spark and from Pig to Flink.
Partner / Head of Data & Analytics at Intelligence Software Consulting
Real User
Top 5
Mar 3, 2021
We use Apache Flink to monitor network consumption for mobile data in fast, real-time data architectures in Mexico. The projects we get from clients are typically quite large, and around 100 people currently use Apache Flink. For maintenance and deployment, we split our team into two squads: one takes care of the data architecture, and the other handles the data analysis technology. Each squad has three members.
Software Development Engineer III at a tech services company with 5,001-10,000 employees
Real User
Nov 8, 2020
My company is a cab aggregator, similar to Uber in scale as well. Just like Uber, we have two sources of real-time events: the customer mobile app and the driver mobile app. We get a lot of events from both of these sources, and there are a lot of things you have to process in real time; that is our primary use case for Flink. It includes things like surge pricing, where many people wanting to book a cab pushes the price up, and fewer people pushes it down. All of that needs to be done quickly and precisely. We also need to process events from drivers' mobile phones and calculate distances. It all requires a lot of data to be processed very quickly and in real time.
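The surge-pricing signal described here is essentially a keyed, windowed count. A hedged sketch, with a hypothetical BookingEvent type and made-up zones:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Hypothetical sketch: count ride requests per city zone over a sliding window,
// the kind of signal a surge-pricing rule could consume. BookingEvent is assumed.
public class SurgeSignalSketch {
    public static class BookingEvent {
        public String zoneId;
        public BookingEvent() {}
        public BookingEvent(String zoneId) { this.zoneId = zoneId; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<BookingEvent> bookings = env.fromElements(
                new BookingEvent("zone-1"), new BookingEvent("zone-1"), new BookingEvent("zone-2"));

        // Demand per zone over the last 5 minutes, refreshed every 30 seconds.
        DataStream<Tuple2<String, Long>> demand = bookings
                .map(e -> Tuple2.of(e.zoneId, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(t -> t.f0)
                .window(SlidingProcessingTimeWindows.of(Time.minutes(5), Time.seconds(30)))
                .sum(1);

        demand.print(); // a downstream rule would map counts to a surge multiplier
        env.execute("surge-signal-sketch");
    }
}
```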
Principal Software Engineer at a tech services company with 1,001-5,000 employees
Real User
Oct 21, 2020
The last POC we did was for map-making; I work for a map-making company. India is one ADR, and within it you have states, within those districts, and within those cities: there is a hierarchy of areas. When you search on Google for a city within India, you see the entire hierarchy; the city falls within India. We get the data from third-party sources, government sources, or wherever else we can. This data is geometry; it's not a straightforward index. If we get raw geometry, we get the entire map and layout, and we do geometry processing on it. Our POC was about processing geometry in a distributed way, and the exploration I did was about distributing and breaking up this big geometry.
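Purely as an illustration of "breaking up a big geometry" for distributed processing, here is a toy sketch that splits one bounding box into grid tiles and keys them so different parallel subtasks can work on different tiles. A real implementation would use a geometry library such as JTS; the tiling scheme here is invented.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple5;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// Illustrative only: a bounding box stands in for real geometry, and each emitted
// tuple is (tileX, tileY, minLon, minLat, sideLength).
public class GeometryTilingSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // One "big geometry" as a bounding box: minLon, minLat, maxLon, maxLat.
        DataStream<double[]> regions = env.fromElements(new double[]{68.0, 8.0, 97.0, 37.0});

        regions
            .flatMap(new FlatMapFunction<double[], Tuple5<Integer, Integer, Double, Double, Double>>() {
                @Override
                public void flatMap(double[] box,
                                    Collector<Tuple5<Integer, Integer, Double, Double, Double>> out) {
                    double side = 1.0; // 1-degree tiles, an arbitrary choice
                    for (int x = 0; x * side + box[0] < box[2]; x++) {
                        for (int y = 0; y * side + box[1] < box[3]; y++) {
                            out.collect(Tuple5.of(x, y, box[0] + x * side, box[1] + y * side, side));
                        }
                    }
                }
            })
            .keyBy(t -> t.f0 * 10_000 + t.f1) // spread tiles across parallel subtasks
            .print();

        env.execute("geometry-tiling-sketch");
    }
}
```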
Sr. Software Engineer at a tech services company with 10,001+ employees
Real User
Oct 19, 2020
Initially, we created our own servers, and then eBay created its infrastructure; now it's deployed on the eBay cloud. Our primary use case is real-time and near-real-time aggregations. For example, we compute count, sum, min, max, and distinct counts for different metrics that we care about, in real time. Say you have an e-commerce company and you want to measure different metrics. Taking risk as an example, you want to check whether a particular seller on your site is doing something fishy. What is their behavior? How many listings do they have in the past five minutes, one hour, one day, or one year? You want to measure this over time, because this data is very important from a business-metric point of view. Often this data is delayed by a day via offline analytics; doing ETL for these aggregations is fine for offline business metrics. But when you want to do risk detection for online businesses, it needs to happen right away, in real time. That's where those systems fail and where Apache Flink helps. Combined with a Lambda architecture, you can get these metrics in real time with the help of a parallel system that captures the very latest data.
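The pattern this reviewer describes is a keyed windowed aggregation. A minimal sketch, assuming a hypothetical SellerEvent type and an inline count aggregate; the same shape extends to sum, min, max, or approximate distinct counts with a richer accumulator:

```java
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Hedged sketch: count events per seller over the last five minutes.
public class RiskAggregationSketch {
    public static class SellerEvent {
        public String sellerId;
        public SellerEvent() {}
        public SellerEvent(String sellerId) { this.sellerId = sellerId; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<SellerEvent> events = env.fromElements(
                new SellerEvent("seller-42"), new SellerEvent("seller-42"), new SellerEvent("seller-7"));

        // A rolling count per seller, emitted once per five-minute window.
        events.keyBy(e -> e.sellerId)
              .window(TumblingProcessingTimeWindows.of(Time.minutes(5)))
              .aggregate(new AggregateFunction<SellerEvent, Long, Long>() {
                  @Override public Long createAccumulator() { return 0L; }
                  @Override public Long add(SellerEvent e, Long acc) { return acc + 1; }
                  @Override public Long getResult(Long acc) { return acc; }
                  @Override public Long merge(Long a, Long b) { return a + b; }
              })
              .print();

        env.execute("risk-aggregation-sketch");
    }
}
```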
Lead Software Engineer at a tech services company with 5,001-10,000 employees
Real User
Oct 13, 2020
For services that need real-time, fast updates and have a lot of data to process, Flink is the way to go. Apache Flink with Kubernetes is a good combination. Data transformation, grouping, keying, and state management are some of the features of Flink. My use case is to provide the latest data as fast as possible, in real time.
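As one small example of the keying and state management mentioned here, the following sketch keeps a fault-tolerant running count per key in Flink's ValueState; the device keys and tuple input are placeholders.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class KeyedStateSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Long>> input = env.fromElements(
                Tuple2.of("device-a", 1L), Tuple2.of("device-a", 1L), Tuple2.of("device-b", 1L));

        input.keyBy(t -> t.f0)
             .process(new KeyedProcessFunction<String, Tuple2<String, Long>, String>() {
                 private transient ValueState<Long> count;

                 @Override
                 public void open(Configuration parameters) {
                     count = getRuntimeContext().getState(
                             new ValueStateDescriptor<>("count", Long.class));
                 }

                 @Override
                 public void processElement(Tuple2<String, Long> value, Context ctx,
                                            Collector<String> out) throws Exception {
                     long current = count.value() == null ? 0L : count.value();
                     current += value.f1;
                     count.update(current); // checkpointed with the job's state backend
                     out.collect(ctx.getCurrentKey() + " -> " + current);
                 }
             })
             .print();

        env.execute("keyed-state-sketch");
    }
}
```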
Sr Software Engineer at a tech vendor with 10,001+ employees
Real User
Oct 13, 2020
We are using Flink as a pipeline for data cleaning. We are not using all of Flink's features; rather, we use the Flink Runner on top of Apache Beam. We are a CRM product company with a lot of customers that we provide our CRM for, and we like to give them as much insight as we can based on their activities, including how many transactions they complete over a particular period. We also have other services, including machine learning, and so far the incoming data is not very clean, which means it has to be cleaned up manually; working with Big Data in real time under those circumstances is not very good. We use Apache Flink with Apache Beam as part of our data cleaning pipeline. It performs data normalization and other steps for cleaning the data, which ultimately provides the customer with the feedback they want. We also have a separate machine learning feature that customers can optionally purchase.
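A minimal sketch of this Beam-on-Flink arrangement: the pipeline is written against Beam's API and handed to the FlinkRunner, with a trivial trim/lower-case DoFn standing in for the real normalization logic.

```java
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class BeamCleaningSketch {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();
        options.setRunner(FlinkRunner.class); // swap runners without changing the pipeline

        Pipeline pipeline = Pipeline.create(options);
        pipeline
            .apply(Create.of("  Alice@example.com ", "BOB@EXAMPLE.COM"))
            .apply(ParDo.of(new DoFn<String, String>() {
                @ProcessElement
                public void processElement(@Element String raw, OutputReceiver<String> out) {
                    out.output(raw.trim().toLowerCase()); // normalize each record
                }
            }));

        pipeline.run().waitUntilFinish();
    }
}
```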
Software Architect at a tech vendor with 501-1,000 employees
Real User
Oct 7, 2020
We have our own infrastructure on AWS, and we deploy Flink on a Kubernetes cluster there; the cluster is managed by our internal DevOps team. We also use Apache Kafka, which is where we get our event streams. We get millions of events through Kafka, between 300K and 500K events per second through that channel. We aggregate the events and generate reporting metrics based on the actual events that are recorded. Certain real-time, high-volume events come through Kafka like any other stream, and we use Flink for aggregation in this case: we read these high-volume events from Kafka and then aggregate them, with a lot of business logic running behind the scenes. We use Flink to aggregate those messages and send the results to a database so that our API layer or BI users can read directly from the database.
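A condensed sketch of this Kafka-to-database flow, with the broker, topic, table, and JDBC settings as placeholders and the business logic reduced to a per-key count in one-minute windows:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcExecutionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class KafkaAggregationSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("events")
                .setGroupId("metrics-aggregator")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> events =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-events");

        DataStream<Tuple2<String, Long>> counts = events
                .map(key -> Tuple2.of(key, 1L))
                .returns(Types.TUPLE(Types.STRING, Types.LONG))
                .keyBy(t -> t.f0)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                .sum(1);

        // Write the aggregates to a relational table that the API/BI layer reads.
        counts.addSink(JdbcSink.sink(
                "INSERT INTO metrics (metric_key, metric_count) VALUES (?, ?)",
                (stmt, t) -> { stmt.setString(1, t.f0); stmt.setLong(2, t.f1); },
                JdbcExecutionOptions.builder().withBatchSize(500).build(),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withUrl("jdbc:postgresql://db:5432/metrics")
                        .withDriverName("org.postgresql.Driver")
                        .build()));

        env.execute("kafka-aggregation-sketch");
    }
}
```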
Apache Flink is an open-source batch and stream data processing engine. It can be used for batch, micro-batch, and real-time processing. Flink combines the benefits of batch processing and streaming analytics by providing a unified programming interface for both kinds of data source, allowing users to write programs that seamlessly switch between the two modes. It can also be used for interactive queries.
Flink can be used as an alternative to MapReduce for executing batch workloads.
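One concrete illustration of the unified model: since Flink 1.12, the same DataStream program can run as a streaming job or, by changing a single setting, as a bounded batch job.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnifiedModeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Switch to BATCH for bounded inputs; leave STREAMING (the default) for
        // unbounded sources. The transformations below are unchanged either way.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        DataStream<String> words = env.fromElements("batch", "and", "stream");
        words.map(s -> s.toUpperCase()).print();

        env.execute("unified-mode-sketch");
    }
}
```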
We use the solution to handle millions of events per second in telecom; we use it for AT&T mobile. It is very simple.
We use Apache Flink in-house to develop the Tectonic platform.
I use the solution for detection of streaming data.