Lead Consultant at a tech services company with 51-200 employees
The data storage capacity means we can ingest data into the user database more efficiently
Pros and Cons
- "The main feature that we find valuable is that it is very fast."
- "We use a big data manager, but we cannot use it for conditional data, so whenever we try to fetch the data, it takes a bit of time."
What is most valuable?
The main feature that we find valuable is that it is very fast. In terms of big data, the main feature is that the data is spread across many different nodes, so whenever we use the data, Spark can parse it from multiple data nodes.
What needs improvement?
We use a big data manager, but we cannot use it for conditional data, so whenever we try to fetch the data, it takes a bit of time. There is some latency in the system and in data caching. The main issue is that we need to design it so that data is available to us very quickly. At the moment it takes a long time, and the latest data should be available to us much more quickly.
What do I think about the stability of the solution?
We don't have any problems with stability.
How are customer service and support?
I'm not the one who would contact their support if we needed it.
Buyer's Guide
Apache Spark
April 2025

Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: April 2025.
849,190 professionals have used our research since 2012.
How was the initial setup?
The initial setup is straightforward.
What other advice do I have?
My advice to anyone considering this solution is to look at the characteristics of their data, including streaming characteristics such as velocity, meaning how quickly you will need to access the data. These factors matter when designing the solution, so we need to work them out up front.
I would rate Apache Spark an eight out of ten.
To make it a ten, they should improve the speed. The data storage capacity means we can ingest data into the user database more efficiently.
Disclosure: I am a real user, and this review is based on my own experience and opinions.

Senior Software Architect at USEReady
Handles both batch and streaming data efficiently for real-time processing
What is our primary use case?
I use Apache Spark for data engineering work, including computation processes that require processing big data.
What is most valuable?
Apache Spark's ability to handle both batch and streaming data seamlessly is the most valuable feature for me. It is beneficial for consuming real-time data, and its solid real-time processing capability makes it more efficient for data analytics.
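To illustrate what handling "both batch and streaming" means, here is a minimal plain-Python sketch of the micro-batch idea behind Spark Structured Streaming: the same transformation runs once over a complete dataset, or repeatedly over small chunks of arriving data while a running result is kept as state. It does not use Spark itself; the function names (`word_counts`, `run_batch`, `run_stream`) and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def word_counts(lines: Iterable[str]) -> Counter:
    """Shared transformation: count the words in a collection of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def run_batch(lines: List[str]) -> Counter:
    # Batch mode: the whole dataset is available, so process it in one pass.
    return word_counts(lines)


def run_stream(micro_batches: Iterable[List[str]]) -> Counter:
    # Streaming mode (micro-batch model): each new chunk is processed with
    # the same logic, and the running totals are carried forward as state.
    state = Counter()
    for batch in micro_batches:
        state.update(word_counts(batch))
    return state


data = ["spark handles batch", "spark handles streaming", "batch and streaming"]
batch_result = run_batch(data)
stream_result = run_stream([data[:1], data[1:2], data[2:]])
print(batch_result == stream_result)  # True
```

In Spark itself, the same DataFrame operations can be applied to input from `spark.read` (batch) or `spark.readStream` (streaming); the sketch only mimics that idea.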
What needs improvement?
The ecosystem is complex to understand, especially for beginners. I find it quite hard to understand how a Spark job is initiated and the roles of driver nodes, worker nodes, stages, and tasks. Additionally, setting up a cluster can be a bit complex.
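As a rough mental model of the job, stage, and task concepts mentioned above, the plain-Python sketch below splits a linear chain of operations into stages at shuffle (wide) boundaries and then creates one task per partition for each stage. This is a simplified, hypothetical model: `Op`, `plan_stages`, and `schedule_tasks` are illustrative names, the operation names are used only as labels, and it assumes every stage has the same number of partitions, which real Spark jobs need not.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Op:
    name: str
    wide: bool  # True if the operation needs a shuffle (e.g. reduceByKey)


def plan_stages(ops: List[Op]) -> List[List[str]]:
    """Group narrow operations together; a wide operation starts a new stage."""
    stages: List[List[str]] = [[]]
    for op in ops:
        if op.wide and stages[-1]:
            # Shuffle boundary: the wide op reads shuffled data in a new stage.
            stages.append([])
        stages[-1].append(op.name)
    return stages


def schedule_tasks(stages: List[List[str]], num_partitions: int) -> List[Tuple[int, int]]:
    # The driver turns each stage into one task per partition; executors on
    # worker nodes would run them. Here we only list (stage, partition) pairs.
    return [(s, p) for s in range(len(stages)) for p in range(num_partitions)]


job = [Op("textFile", False), Op("flatMap", False), Op("map", False),
       Op("reduceByKey", True), Op("filter", False)]
stages = plan_stages(job)
print(stages)  # [['textFile', 'flatMap', 'map'], ['reduceByKey', 'filter']]
print(len(schedule_tasks(stages, num_partitions=4)))  # 8
```

The key design point the sketch captures is that narrow operations are pipelined within one stage, while a shuffle forces a stage boundary.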
For how long have I used the solution?
I have been using Apache Spark for about two and a half years now.
What was my experience with deployment of the solution?
Setting up a cluster can be a bit complex, but that depends on the experience of the person doing it.
What do I think about the stability of the solution?
I find Apache Spark to be fine and stable.
What do I think about the scalability of the solution?
The scalability of Apache Spark depends on the number of machines being used. You can scale it by adjusting the worker and driver nodes.
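A back-of-envelope way to see why adding worker capacity helps: the number of tasks Spark can run at once is roughly executors × (cores per executor ÷ CPUs per task), and a stage with more tasks than slots runs in several "waves". The parameters mirror the real Spark settings `spark.executor.instances`, `spark.executor.cores`, and `spark.task.cpus`, but the function names and numbers below are hypothetical, and the estimate assumes equal task durations and ignores driver and shuffle overhead.

```python
import math


def task_slots(executors: int, cores_per_executor: int, cpus_per_task: int = 1) -> int:
    # Concurrent tasks across the cluster: each executor can run
    # cores_per_executor // cpus_per_task tasks at the same time.
    return executors * (cores_per_executor // cpus_per_task)


def waves(num_tasks: int, slots: int) -> int:
    # Rounds needed to finish a stage if every task takes about the same time.
    return math.ceil(num_tasks / slots)


# Hypothetical stage with 200 tasks on executors with 4 cores each.
for executors in (5, 10, 20):
    slots = task_slots(executors, cores_per_executor=4)
    print(executors, slots, waves(200, slots))
# 5 20 10
# 10 40 5
# 20 80 3
```

Doubling the executors halves the number of waves only while there are enough tasks to fill the new slots, which is one reason scaling results depend on the data and partitioning.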
How are customer service and support?
I haven't tried Apache Spark's official support. I mostly use ChatGPT for assistance.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
Before Spark, I used solutions such as Storm and Flume, and on AWS, Kinesis. Kinesis and Spark each have their own ways of managing data ingestion and compute.
How was the initial setup?
In the public cloud, it is available through built-in services, but on-premises, I have to spin up my own cluster within my own ecosystem.
What about the implementation team?
A single person can handle installation if they are capable enough.
What was our ROI?
The time saved depends on the cluster being used, such as the number of compute nodes and the kind of data involved, so it isn't straightforward to specify the percentage of time and money saved.
What's my experience with pricing, setup cost, and licensing?
Apache Spark is open source, so it doesn't incur any licensing charges.
Which other solutions did I evaluate?
I used Storm, Flume, and AWS Kinesis.
What other advice do I have?
I rate my overall experience with Apache Spark as eight out of ten. I suggest leveraging AI capabilities to enhance performance or check for anomalies.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Last updated: Apr 24, 2025
Popular Comparisons
Amazon EMR
Cloudera Distribution for Hadoop
Spark SQL
IBM Spectrum Computing
Informatica Big Data Parser
IBM Db2 Big SQL
Quick Links
Learn More: Questions:
- Which is the best RDBMS solution for big data?
- Apache Spark without Hadoop -- Is this recommended?
- Which solution has better performance: Spring Boot or Apache Spark?
- AWS EMR vs Hadoop
- Handling real and fast data - how do BigInsight and other solutions perform?
- When evaluating Hadoop, what aspect do you think is the most important to look for?
- Should we choose InfoSphere BigInsights or Cloudera?