We don't have enough experience to judge its flaws, as we've only used stable features like batch and micro-batch processing. Integration poses no problem; however, I don't use some features and can't judge those.
The product's event handling capabilities, particularly compared to Kaspersky, need improvement. Integrating event-level streaming capabilities could be beneficial. This aligns with the idea of expanding Spark's functionality to cover unaddressed areas, potentially enhancing its competitiveness.
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
2024-01-25T11:39:24Z
Jan 25, 2024
In terms of improvement, the UI could be better. Additionally, Spark Streaming works well for various use cases, but improvements could be made for ultra-fast scenarios where seconds matter. While some business processes require real-time data every second, not all projects demand such speed. For instance, batch processing, short intervals for competitive intelligence, or operational intelligence actions might not need sub-second precision. Streaming is versatile but needs careful consideration based on the specific use case and problem at hand.
Apache Spark Streaming offers native integration of some libraries, but cost and load-related optimizations are areas where the tool is lacking and needs improvement.
Chief Technology Officer at Teslon Technologies Pvt Ltd
Real User
Top 20
2023-06-08T10:44:00Z
Jun 8, 2023
In terms of disadvantages, it was a bit cumbersome due to its size. It wasn't quite cloud-native back then, meaning it wasn't easy to deploy it in a Kubernetes cluster and similar environments. I found it a bit challenging, but I'm not sure if that's still the case now. It probably has better support. It was on-prem, and when we wanted to migrate it to the cloud, especially to Kubernetes, I remember facing some difficulties in successfully migrating the system.
The service structure of Apache Spark Streaming can improve. There are a lot of issues with memory management and latency. There is no real-time analytics. We recommend it for use cases where a five-second latency is acceptable, but not for millisecond-latency, IoT-based, or anomaly-detection use cases. Flink as a service is much better. Apache Spark Streaming does not have auto-tuning. A customer needs to invest a lot in terms of management and maintenance.
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
2021-08-18T14:55:15Z
Aug 18, 2021
The installation is difficult. You definitely need more than one person. That said, if you are implementing it in the cloud, it's easier. The solution itself could be easier to use. The solution is free to use as it is open-source.
Chief Innovation & Technology Leader at a mining and metals company with 1,001-5,000 employees
Real User
2021-03-19T22:33:34Z
Mar 19, 2021
There could be an improvement in the user configuration section. It should be less developer-focused and more business user-focused. For example, it is still not plug and play in the way that some of the cloud offerings are, which come ready to use. It is not up there at the leading edge.
What is Streaming Analytics? Streaming analytics, also known as event stream processing (ESP), refers to the analyzing and processing of large volumes of data through the use of continuous queries. Traditionally, data is moved in batches. While batch processing may be an efficient method for handling huge pools of data, it is not suitable for time-sensitive, “in-motion” data that could otherwise be streamed, since that data can expire by the time it is processed. By using streaming...
The initial setup is quite complex.
We would like to have the ability to do arbitrary stateful functions in Python.
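To make the request above concrete, here is a minimal sketch of what an "arbitrary stateful function" means in a micro-batch setting: a user-supplied function receives a key, the new values for that key, and the previous state, and returns the new state plus an output record. This is plain Python for illustration only and does not use any Spark API; `run_stateful` and `running_total` are hypothetical names, not part of Spark.

```python
from collections import defaultdict

def run_stateful(batches, update, initial):
    """Apply a per-key stateful function across a sequence of micro-batches.

    Hypothetical helper for illustration; this is not a Spark API.
    `update(key, values, old_state)` must return `(new_state, output)`.
    """
    state = {}      # per-key state that survives between micro-batches
    outputs = []    # records emitted by the user function
    for batch in batches:
        # Group the records of this micro-batch by key.
        grouped = defaultdict(list)
        for key, value in batch:
            grouped[key].append(value)
        # Call the user function once per key seen in this batch.
        for key, values in grouped.items():
            new_state, out = update(key, values, state.get(key, initial))
            state[key] = new_state
            outputs.append(out)
    return state, outputs

def running_total(key, values, total):
    """Example user function: keep a running sum per key."""
    total = total + sum(values)
    return total, (key, total)

# Two micro-batches of (key, value) records.
batches = [[("a", 1), ("b", 2)], [("a", 3)]]
final_state, emitted = run_stateful(batches, running_total, 0)
```

The point of the sketch is that the update logic is ordinary Python with no restriction on what the state is or how it changes, which is the flexibility the reviewer is asking for.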