Amazon Kinesis has some hard limits, and if you hit them, you get a throughput exceeded error. We have to manage and reduce the number of consumers hitting Amazon Kinesis Data Streams. We combined multiple jobs into a single job so that multiple jobs are not reading from the main stream.
Senior Engineering Consultant at ASSURANCE IQ, INC.
Real User
Top 5
2024-06-24T07:46:44Z
Jun 24, 2024
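Besides reducing the number of consumers, a common client-side way to cope with the throughput exceeded errors described in the review above is to retry with exponential backoff and jitter. The following is only a minimal sketch: `ThroughputExceededError` and `call_with_backoff` are hypothetical names invented for illustration (with boto3, throttling typically surfaces as a `ProvisionedThroughputExceededException` error), and the delay values are arbitrary assumptions.

```python
import random
import time


class ThroughputExceededError(Exception):
    """Hypothetical stand-in for the throttling error raised by a Kinesis client."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call operation(), retrying on throttling with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in real use the default `time.sleep` would apply.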
I am not exactly sure where the tool needs improvement. When I was working with it, it was very scalable, and all our company needed was temporary streaming that worked well. We didn't want to set up our own Kafka, other queues, or processing systems. As it is a cloud tool, it is easy for us to use, and it satisfies all our requirements. It may need some improvements for other use cases, but it satisfies our particular needs. Currently, the pipeline setup is very simple because, for our use cases, we just want to get the data and send it to different data lakes or a logging system. Previously, we also used Amazon Kinesis to send logs to Splunk; later, we removed Splunk and moved to Datadog. For our use cases, I don't want any new features in the tool. Amazon Kinesis' use case is collecting, processing, and analyzing data. If anything were to be added, I feel everything should be available in the product itself, such as an alert system and the ability to analyze, query, and source data from the solution rather than relying on other logging and monitoring systems. The tool should focus on having its own alert system rather than requiring a third-party solution. We could then just get the data into Amazon Kinesis and directly get the benefits that current analytical tools, such as BI tools like Looker and Tableau, provide. One would not need to buy those tools and could just get started with Amazon Kinesis.
The solution currently provides an option to retrieve data in the stream or the queue, but it's not that helpful. We have to write some custom scripts to fetch data from there. An option to search for data in the queue can really help us in our day-to-day operations. Since the solution is a buffer system, you write to it and read from it. The readers are called consumers. If you want to run multiple consumers reading from the queue, you have to enable the enhanced fan-out feature on Amazon Kinesis. This enhanced fan-out feature is quite costly. There was a point when we had a huge budget increase in one week just because of the enhanced fan-out feature. This feature does not provide any special out-of-the-box functionality. Hence, we struggle to optimize multiple consumers reading from a single queue. We were charged high costs for the solution’s enhanced fan-out feature.
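To make the trade-off behind enhanced fan-out concrete, the sketch below compares the read throughput each consumer can expect with and without it. It assumes the commonly documented figure of 2 MB/s of read throughput per shard, shared among all standard consumers but dedicated to each enhanced fan-out consumer; the function name and the constant are illustrative, and pricing is deliberately left out because the review gives no figures.

```python
SHARD_READ_MB_PER_SEC = 2.0  # assumed per-shard read limit


def per_consumer_read_mb_per_sec(shards, consumers, enhanced_fan_out):
    """Estimate the read throughput (MB/s) available to each consumer of a stream."""
    if shards < 1 or consumers < 1:
        raise ValueError("shards and consumers must both be at least 1")
    total = shards * SHARD_READ_MB_PER_SEC
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated throughput per shard.
        return total
    # Standard consumers split the shared per-shard throughput between them.
    return total / consumers
```

For example, with 4 shards and 4 consumers, each standard consumer is limited to roughly 2 MB/s, while each enhanced fan-out consumer could read up to 8 MB/s, which is the capacity the extra cost pays for.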
For me, especially with video streams, there's sometimes a kind of delay when the data has to be pumped to other services. This delay could be improved in Kinesis, or especially the Kinesis Video Streams, which is being used for different use cases for Amazon Connect. With that improvement, a lot of other use cases of Amazon Connect integrating with third-party analytic tools would be easier.
There are certain shortcomings in the machine learning capacity offered by the product, making it an area where improvements are required. There is a need to introduce something more into the machine learning area because it helps users learn and get newer things in their day-to-day lives. I think Amazon Kinesis should update machine learning and be up to the mark.
I would like a snapshot of the data analytics stream I already have in the cloud: take a snapshot, stop the process, and restart it a few weeks later when I have more data or when the client teams are more available.
My company found some Amazon Kinesis discrepancies, so it's looking forward to a more modernized solution from Apache Kafka. One area for improvement in the solution is the file size limitation of 10 MB. My company works with larger files. The batch size and throughput also need improvement in Amazon Kinesis. The solution needs to be more open regarding the types of files that can be streamed and the streaming size. Amazon should not limit those aspects. It should be unlimited. If a company is ready to pay, why not make it unlimited? What I want to add to Amazon Kinesis is modernization based on the container environment, where I can add containers and more workers. I also expect some human resources to be added and an SLA with Amazon, if possible.
Currently, Kinesis provides only seven days of retention support. It would be beneficial if this could be extended to 40 days or more. In a future release, I would like to see a library that is Java-compliant. It would be beneficial if Amazon Kinesis provided document-based support on the internet to be able to read the data from the Kinesis site.
One thing that would be nice would be a policy for increasing the number of Kinesis streams because that's the one thing that's constant. You can change it in real time, but somebody has to change it, or you have to set some kind of meter. So, auto-scaling of adding and removing streams would be nice. I'd like to see the size of a Kinesis message go to at least one megabyte per message. That would be nice, but that's an extreme case.
Chief Technology Officer at a tech services company with 51-200 employees
Real User
2021-08-25T19:54:29Z
Aug 25, 2021
Amazon Kinesis is not a bad product, but Azure Event Hub provides us with certain operational advantages, as our focus is on Microsoft-related coding. This is why we use .NET at the backend. While we can use both Azure Event Hub and Amazon Kinesis for this purpose, I feel the latter is less customized or developed for use with serverless programming. Amazon Kinesis is less meaningful and less easy to use than Azure Event Hub. Amazon Kinesis also involved a more complex setup and configuration than Azure Event Hub.
Senior Software Engineer at a tech services company with 501-1,000 employees
Real User
2020-12-21T13:55:00Z
Dec 21, 2020
In general, the pain point for us was that once the data gets into Kinesis, there is no way for us to understand what's happening, because Kinesis divides everything into shards. If we wanted to understand what was happening with a particular shard, such as whether a record was published or not, we could not. Even logging, if we wanted it, is per shard. That is something we thought we needed then, but later we realized that Kinesis was not built for that. They have probably improved it by now; I have not been in touch with AWS for the last five or six months, since I joined this organization, which uses Azure, and I did not get to experiment with AWS Kinesis much after that. We used Kinesis for one purpose and expected a feature from it that may not have been part of the service's design when they built it. It was almost like a black box for us, because once the data comes in, we need to rely on the Lambda itself to let us know. When a Kinesis record comes in and is processed, we log it from the Lambda, and that is where we would know, "Oh, okay, this one has come in, this one has come in." We hoped for a better way to track the shard being processed or how records streamed within Kinesis. We wanted to have a look at that, but it was not available then. It may not even be available now. We did not have the feature that we expected from Kinesis in the first place. Overall, that was the only thing we felt was lacking. Our use case may not have been the most ideal one, but other than that we did not have many qualms with Kinesis. Overall, we felt we would have simplified the entire design of what we did by simply using SNS and SQS, because we have much better visibility into what happens within SNS and SQS.
Chapter Lead - Data and Infrastructure (Head of Department) at a media company with 51-200 employees
Real User
2020-11-19T03:56:48Z
Nov 19, 2020
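The workaround this reviewer describes, logging from the Lambda to see which records arrived on which shard, could look roughly like the sketch below. It relies on the shape of the Kinesis event that Lambda passes to a handler, where each record has an `eventID` of the form `<shardId>:<sequenceNumber>` and a `kinesis` object with base64-encoded `data`. The logger name and the `summarize_records` helper are made up for illustration and are not the reviewer's code.

```python
import base64
import logging

logger = logging.getLogger("kinesis_tracker")  # hypothetical logger name


def summarize_records(event):
    """Return one (shard_id, sequence_number, partition_key, size_bytes) tuple per record."""
    summaries = []
    for record in event.get("Records", []):
        # eventID has the form "<shardId>:<sequenceNumber>".
        shard_id = record["eventID"].split(":", 1)[0]
        kinesis = record["kinesis"]
        payload = base64.b64decode(kinesis["data"])
        summaries.append(
            (shard_id, kinesis["sequenceNumber"], kinesis["partitionKey"], len(payload))
        )
    return summaries


def handler(event, context):
    """Lambda entry point: log where each record came from, then report the count."""
    for shard_id, seq, key, size in summarize_records(event):
        logger.info("shard=%s seq=%s key=%s bytes=%d", shard_id, seq, key, size)
    return {"processed": len(event.get("Records", []))}
```

This only gives visibility after a batch reaches the function, which matches the "black box" limitation the review points out.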
They recently expanded the feature sets, but when we were implementing it, it could only deliver to one platform. I'm not sure where it's at now but multiple platforms would be beneficial. I'd also like to have some ability to do first in, first out queuing. If I put several messages into Firehose, there's no guarantee that everything will be processed in the order it was sent.
The automation could be better. The solution needs to be better at information capture. Some jobs have limitations which can make the process a bit challenging. In order to do a successful setup, the person handling the implementation needs to know the solution very well. You can't just come into it blind and with little to no experience.
Senior Engineering Consultant at a tech services company with 201-500 employees
Real User
2020-10-28T15:29:13Z
Oct 28, 2020
I'm currently trying to figure out production rates and consumption rates for data. If there were better documentation on optimal sharding strategies then it would be helpful.
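As a rough starting point for relating production and consumption rates to a shard count, the sketch below applies the commonly documented per-shard limits for provisioned streams: 1 MB/s or 1,000 records/s for writes, and 2 MB/s for shared reads. These constants are assumptions to verify against the current AWS quotas, and `required_shards` is an illustrative helper rather than an official sizing formula.

```python
import math

WRITE_MB_PER_SEC_PER_SHARD = 1.0       # assumed write limit per shard
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000.0  # assumed record-count limit per shard
READ_MB_PER_SEC_PER_SHARD = 2.0        # assumed shared read limit per shard


def required_shards(write_mb_per_sec, write_records_per_sec, read_mb_per_sec):
    """Estimate the minimum shard count that satisfies all three rates."""
    by_write_bytes = write_mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD
    by_write_records = write_records_per_sec / WRITE_RECORDS_PER_SEC_PER_SHARD
    by_read_bytes = read_mb_per_sec / READ_MB_PER_SEC_PER_SHARD
    # The tightest constraint decides; a stream needs at least one shard.
    return max(1, math.ceil(max(by_write_bytes, by_write_records, by_read_bytes)))
```

Here `read_mb_per_sec` is the total read rate across all standard consumers, so a workload producing 5 MB/s and 3,000 records/s that is read at 12 MB/s in total would need about 6 shards. The estimate ignores hot partition keys, which can throttle a single shard even when the total looks sufficient.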
Kinesis Data Analytics needs to be improved somewhat. It's SQL-based, but it is not as user-friendly as MySQL or Athena tools. That's the one improvement that I'm expecting from Amazon. Apart from that, everything is fine.
In terms of what can be improved, I would say that within Data Streams, you have a variety of ways to interact with the data: you have the Kinesis Client Library (KCL), and you have the Kinesis agent. When we were developing our architecture a couple of years back, all the libraries to aggregate the data were very problematic. So the Kinesis Aggregator, which essentially improves performance and cost by aggregating individual records into bigger ones, is something that I found had a lot of room for improvement and refinement. At the time, I found a couple of limitations that I had to work around. So I definitely found room for improvement on that side. Something else to mention is that we use Kinesis with Lambda a lot, and I find the fact that you can only connect one stream to one Lambda to be a limiting factor. I would definitely recommend removing that constraint.
Senior Software Engineer at a computer software company with 201-500 employees
Real User
2020-10-20T04:19:00Z
Oct 20, 2020
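The idea of aggregation mentioned in the review above, packing many small records into fewer large ones to improve throughput and cost, can be sketched as follows. This is not the Kinesis Aggregator's actual wire format; it simply joins payloads with a newline delimiter, and the 1 MiB default size cap and the `aggregate` function name are assumptions for illustration.

```python
def aggregate(records, max_bytes=1024 * 1024, delimiter=b"\n"):
    """Pack byte payloads into as few delimiter-joined batches as fit under max_bytes."""
    batches = []
    current = []
    current_size = 0
    for rec in records:
        if len(rec) > max_bytes:
            raise ValueError("a single record exceeds max_bytes")
        # A delimiter is only needed when the batch already holds a record.
        extra = len(rec) + (len(delimiter) if current else 0)
        if current and current_size + extra > max_bytes:
            # The record does not fit: close this batch and start a new one.
            batches.append(delimiter.join(current))
            current = [rec]
            current_size = len(rec)
        else:
            current.append(rec)
            current_size += extra
    if current:
        batches.append(delimiter.join(current))
    return batches
```

A consumer would then split each batch on the same delimiter, so the payloads themselves must not contain it; a real aggregation format avoids that restriction by encoding lengths explicitly.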
The default limit, which at the moment is 5,000 records per second (I'm talking about Kinesis Firehose, which is a specialized form of the Amazon Kinesis service), seems too low. In the first week that we deployed it into production, we had to roll it back and ask Amazon to increase the default limits. It's mentioned in the documentation, but I think the default settings are far too low. That first week it was extremely slow because the records were not being properly ingested into the stream, so we had to try again. After we talked with Amazon, they increased the throttling limits to 10,000 records. Now it works fine.
Principal Data Engineer at a transportation company with 1,001-5,000 employees
Real User
2020-10-15T11:35:04Z
Oct 15, 2020
I would say that the solution probably has the capability to do sharding so that you can do a lot of things in parallel. I think that the way the sharding works could be simplified and include features that make it easier to scale in a parallel way.
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. With Amazon Kinesis, you can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for...
Amazon Kinesis should improve its limits.
The price is not much cheaper. So, there is room for improvement in the pricing.
Kinesis can be expensive, especially when dealing with large volumes of data.
Kinesis is good for Amazon Cloud but not as suitable for other cloud vendors.