Apache Spark Reviews and Pricing

it_user1223676

Lead Consultant at a tech services company with 51-200 employees

Jan 30, 2020

The data storage capacity means we can inject somewhere in the user database in more efficient ways

Pros and Cons

"The main feature that we find valuable is that it is very fast."

"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."

What is most valuable?

The main feature that we find valuable is that it is very fast. In big data terms, the key point is that the data is spread across many different nodes, so whenever we use the data, Spark enables us to parse it from those nodes.

What needs improvement?

We use a big data manager, but we cannot use it for conditional data, so whenever we try to fetch the data, it takes a bit of time. There is some latency in the system and in the data caching. The main issue is that we need to design it so that data is available to us very quickly. Right now it takes a long time, and the latest data should be available to us much quicker.

What do I think about the stability of the solution?

We don't have any problems with stability.

How are customer service and support?

I'm not the one who would contact their support if we needed it.

How was the initial setup?

The initial setup is straightforward.

What other advice do I have?

My advice to someone considering this solution is to look at data characteristics that matter for streaming, such as velocity, meaning how quickly you are going to access the data. These things matter when designing the solution, and we need to take them into account.

I would rate Apache Spark an eight out of ten.

To make it a ten they should improve the speed. The data storage capacity means we can inject somewhere in the user database in more efficient ways.

Disclosure: I am a real user, and this review is based on my own experience and opinions.

KamleshPant

Senior Software Architect at USEReady

Apr 24, 2025

Handles both batch and streaming data efficiently for real-time processing

What is our primary use case?

I use Apache Spark for data engineering work. I handle computation processes where it is necessary to process big data.

What is most valuable?

Apache Spark's ability to handle both batch and streaming data seamlessly is the most valuable feature for me. It is beneficial for consuming real-time data, and its solid real-time processing capability makes managing data analytics more efficient.
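
The idea of one engine serving both batch and streaming can be illustrated with a minimal plain-Python sketch. This is not Spark's API: the function names and the sample events are hypothetical, and the only assumption carried over from Spark is that a stream can be processed as a series of small batches, so the same logic serves both modes.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_events(records: Iterable[str]) -> Counter:
    """Shared logic: count occurrences of each event type."""
    return Counter(records)


def run_batch(records: List[str]) -> Counter:
    """Batch mode: the whole bounded dataset is processed in one pass."""
    return count_events(records)


def micro_batches(stream: Iterable[str], size: int) -> Iterator[List[str]]:
    """Cut an incoming stream into small bounded chunks."""
    chunk: List[str] = []
    for record in stream:
        chunk.append(record)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # flush the last, possibly shorter, chunk
        yield chunk


def run_streaming(stream: Iterable[str], size: int) -> Counter:
    """Streaming mode: apply the same logic per chunk and keep running totals."""
    totals: Counter = Counter()
    for chunk in micro_batches(stream, size):
        totals.update(count_events(chunk))
    return totals


# Hypothetical sample data, used for both modes.
events = ["click", "view", "click", "purchase", "view", "click"]
batch_result = run_batch(events)
stream_result = run_streaming(iter(events), size=2)
```

Both modes arrive at the same counts here; the streaming path simply produces them incrementally as chunks arrive.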

What needs improvement?

The whole ecosystem is complex to understand, especially for beginners. I find it quite complex to understand how a Spark job is initiated and the roles of driver nodes, worker nodes, stages, and tasks. Additionally, clustering may be a bit complex to set up.
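
For beginners, the relationship between a job, its stages, and its tasks can be modeled in a few lines of plain Python. This is a simplified sketch, not Spark code: all names and the sample partitions are hypothetical. It assumes the usual picture in which the driver plans the job, a stage ends wherever data must be regrouped by key (a shuffle), and each stage runs one task per partition on the workers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input, already split into partitions.
partitions: List[List[str]] = [
    ["spark", "driver", "spark"],
    ["worker", "spark", "driver"],
]


def map_task(partition: List[str]) -> List[Tuple[str, int]]:
    """Stage 1 task: works on a single partition, no data exchange needed."""
    return [(word, 1) for word in partition]


def shuffle(outputs: List[List[Tuple[str, int]]],
            num_reducers: int) -> List[Dict[str, List[int]]]:
    """Stage boundary: regroup pairs so every key lands in exactly one bucket."""
    buckets: List[Dict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for output in outputs:
        for key, value in output:
            buckets[hash(key) % num_reducers][key].append(value)
    return buckets


def reduce_task(bucket: Dict[str, List[int]]) -> Dict[str, int]:
    """Stage 2 task: sums the values for the keys in one bucket."""
    return {key: sum(values) for key, values in bucket.items()}


def run_job(parts: List[List[str]], num_reducers: int = 2) -> Dict[str, int]:
    """Plays the driver's role: plan two stages and collect the result."""
    stage1 = [map_task(p) for p in parts]        # one task per partition
    buckets = shuffle(stage1, num_reducers)      # shuffle ends stage 1
    stage2 = [reduce_task(b) for b in buckets]   # one task per bucket
    result: Dict[str, int] = {}
    for partial in stage2:
        result.update(partial)
    return result


word_counts = run_job(partitions)
```

In a real cluster the tasks of a stage run in parallel on different worker nodes; here they run one after another only to keep the sketch self-contained.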

For how long have I used the solution?

I have been using Apache Spark for about two and a half years now.

What was my experience with deployment of the solution?

Clustering may be a bit complex to set up, but it depends on the experience that the person involved has.

What do I think about the stability of the solution?

I find Apache Spark to be fine and stable.

What do I think about the scalability of the solution?

The scalability of Apache Spark depends on the number of machines being used. By adjusting the worker and driver nodes, scaling can be leveraged.
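
As a rough illustration of adjusting worker and driver resources, the sketch below builds a `spark-submit` command line for two cluster sizes. The helper function, the application file `etl_job.py`, and the sizing numbers are hypothetical. The flags themselves are standard `spark-submit` options, with the caveat that `--num-executors` applies to YARN and Kubernetes deployments, so YARN is assumed here.

```python
from typing import List


def build_spark_submit(app: str, num_executors: int, executor_cores: int,
                       executor_memory: str, driver_memory: str) -> List[str]:
    """Assemble a spark-submit argument list (assumes a YARN cluster)."""
    if num_executors < 1 or executor_cores < 1:
        raise ValueError("executors and cores must be positive")
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(num_executors),   # how many worker processes
        "--executor-cores", str(executor_cores), # parallel tasks per executor
        "--executor-memory", executor_memory,    # memory per executor
        "--driver-memory", driver_memory,        # memory for the driver
        app,
    ]


# Hypothetical sizing: the same job on a small and a larger cluster.
small = build_spark_submit("etl_job.py", 2, 2, "4g", "2g")
large = build_spark_submit("etl_job.py", 8, 4, "8g", "4g")
print(" ".join(large))
```

Scaling out in this picture means raising the executor count or cores, while the driver memory mainly matters when large results are collected back to the driver.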

How are customer service and support?

I haven't tried Apache Spark's official support. I mostly use ChatGPT for assistance.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

Before Spark, I used solutions like Storm and Flume. In AWS, the equivalent is Kinesis. Both Kinesis and Spark have their own ways of managing data ingestion and compute.

How was the initial setup?

In the public cloud, it comes with built-in services, but for on-premises, I have to spin up my own cluster using my ecosystem.

What about the implementation team?

A single person can handle installation if they are capable enough.

What was our ROI?

Timing depends on the cluster being used, such as the number of compute nodes, and on the kind of data involved. So, it's not straightforward to specify the percentage of time and money saved.

What's my experience with pricing, setup cost, and licensing?

Apache Spark is open-source, so it doesn't incur any charges.

Which other solutions did I evaluate?

I used Storm, Flume, and AWS Kinesis.

What other advice do I have?

I rate my overall experience with Apache Spark as eight out of ten. I suggest leveraging AI capabilities to enhance performance or check for anomalies.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other

Disclosure: I am a real user, and this review is based on my own experience and opinions.

Last updated: Apr 24, 2025