
AWS Lambda vs Apache Spark comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

Apache Spark
Ranking in Compute Service: 4th
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 64
Ranking in other categories: Hadoop (1st), Java Frameworks (2nd)

AWS Lambda
Ranking in Compute Service: 1st
Average Rating: 8.4
Reviews Sentiment: 7.5
Number of Reviews: 79
Ranking in other categories: None
 

Mindshare comparison

As of December 2024, Apache Spark holds 11.1% mindshare in the Compute Service category, up from 7.8% a year earlier. AWS Lambda holds 20.3%, down from 27.7% a year earlier. Mindshare is calculated from PeerSpot user engagement data.
 

Featured Reviews

SurjitChoudhury - PeerSpot reviewer
Offers batch processing of data and in-memory processing in Spark greatly enhances performance
Spark supports real-time data processing through Spark Streaming. It allows for batch processing of data. If you have immediate data, like chat information, that needs to be processed in real-time, Spark Streaming is used. For data that can be evaluated later, batch processing with Apache Spark is suitable. Mostly, batch processing is utilized in our organization, but for streaming data processing, tools like Kafka are often integrated. In-memory processing in Spark greatly enhances performance, making it a hundred times faster than the previous MapReduce methods. This improvement is achieved through optimization techniques like caching, broadcasting, and partitioning, which help in optimizing queries for faster processing.
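The partitioning and broadcasting ideas the reviewer mentions can be illustrated with a small sketch. This is a plain-Python analogy, not Spark's API: the function names (`partition`, `process_partition`, `run_job`) and the sample data are hypothetical, and a thread pool stands in for a cluster of executors.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_partitions):
    """Split records into roughly equal chunks, one per worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def process_partition(chunk, country_names):
    """Resolve country codes using the shared, read-only lookup table.

    The lookup table plays the role of a broadcast value: every worker
    reads the same small table instead of joining against it.
    """
    return [(user, country_names.get(code, "unknown")) for user, code in chunk]


def run_job(records, country_names, num_partitions=3):
    """Process all partitions in parallel and merge the results."""
    chunks = partition(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(lambda c: process_partition(c, country_names), chunks)
        merged = [row for part in results for row in part]
    return sorted(merged)


if __name__ == "__main__":
    sample = [("ann", "US"), ("bo", "DE"), ("cy", "XX")]
    lookup = {"US": "United States", "DE": "Germany"}
    print(run_job(sample, lookup))
```

In Spark itself the corresponding building blocks are repartitioning a dataset, broadcast variables, and caching a dataset in memory; the sketch only shows the shape of the idea on a single machine.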
Wai L Lin O - PeerSpot reviewer
A serverless solution with easy integration features
We use AWS Lambda because it provides a solution for our needs without requiring us to manage our infrastructure. With the tool, we only pay for the resources we use. Additionally, it is straightforward to implement and integrates with other services like API Gateway. The tool's serverless nature has had the most significant impact on our workflow. I find it particularly attractive because it eliminates the need for managing servers. In my previous experience, managing upgrades and updates was quite challenging. The solution's integration process with other AWS services was relatively easy. We primarily use AWS services such as EventBridge for scheduling processes and log management.
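As a rough illustration of the API Gateway integration described in this review, a minimal Python handler might look like the sketch below. It assumes the API Gateway proxy integration, where query parameters arrive under `queryStringParameters` and the response is a dictionary with `statusCode`, `headers`, and `body`. The `name` parameter and the greeting are hypothetical.

```python
import json


def lambda_handler(event, context):
    """Entry point invoked once per API Gateway request."""
    # With the proxy integration, this field is None when no query
    # parameters are sent, so fall back to an empty dict.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The body must be a string, so the payload is serialized here.
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test with a hand-built event; no AWS account needed.
    print(lambda_handler({"queryStringParameters": {"name": "PeerSpot"}}, None))
```

There is no server or process lifecycle in the code: the function only maps an event to a response, which is the part of the serverless model the reviewer highlights.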

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The product’s most valuable features are lazy evaluation and workload distribution."
"Provides a lot of good documentation compared to other solutions."
"The processing time is very much improved over the data warehouse solution that we were using."
"Apache Spark is known for its ease of use. Compared to other available data processing frameworks, it is user-friendly."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"The product is useful for analytics."
"ETL and streaming capabilities."
"The tool scales automatically based on the number of incoming requests."
"It is my preferred product, as it provides me with source code within the solution."
"The ease and speed of developing the services using any available language is the most valuable feature."
"The solution integrates well with API gateways and S3 events via its AWS ecosystem."
"It is easy to use."
"I can use the solution to configure and set up all the requirements for testing the application and test code."
"The most valuable features are event-based triggers. They're really good for a reactive style when you want things to happen as soon as something else happens."
"AWS Lambda's best features are log analysis and event triggering and actioning."
 

Cons

"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."
"When using Spark, users may need to write their own parallelization logic, which requires additional effort and expertise."
"Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing."
"When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
"Its UI can be better. Maintaining the history server is a little cumbersome, and it should be improved. I had issues while looking at the historical tags, which sometimes created problems. You have to separately create a history server and run it. Such things can be made easier. Instead of separately installing the history server, it can be made a part of the whole setup so that whenever you set it up, it becomes available."
"Apache Spark can improve the use case scenarios from the website. There is not any information on how you can use the solution across the relational databases toward multiple databases."
"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."
"Its price should be improved. Its pricing is on the higher side. I am not sure if it currently supports the Go language. If it doesn't support the Go language, they can introduce it."
"The automation with other Amazon products could be better."
"The user-friendliness of the solution could be improved."
"The security needs to be improved."
"It could be cheaper."
"We can write anything as code, but the solution will not give proper error information."
"There's room for improvement in the solution's warm start, which refers to the minimum time it takes to start up a Lambda function if you haven't been running it."
"I would like the layers to have a bigger volume. I would like to be able to add more. I don't want to be limited by the layer."
 

Pricing and Cost Advice

"The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks."
"It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project."
"It is an open-source platform. We do not pay for its subscription."
"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
"I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten."
"Apache Spark is an open-source solution, and there is no cost involved in deploying the solution on-premises."
"We are using the free version of the solution."
"The pricing is on-demand and based on runs or times that are billed out monthly."
"The cost is based on runtime."
"The price of the solution is reasonable and it is a pay-per-use model. It is very good for cost optimization."
"We only need to pay for the compute time our code consumes."
"We don't need to pay for licensing to use Lambda."
"For licensing, we pay a yearly subscription."
"It costs maybe less than $10 per month in my use case."
"AWS Lambda cost is pretty decent."
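To make the pay-per-use comments about Lambda more concrete, the sketch below estimates a monthly bill from request count, average duration, and memory size. The default rates and free-tier allowances are assumptions based on commonly published figures and vary by region and architecture, so check current AWS pricing before relying on them. The function name `estimate_monthly_cost` is hypothetical.

```python
def estimate_monthly_cost(
    requests,
    avg_duration_ms,
    memory_mb,
    price_per_million_requests=0.20,    # assumed rate, USD
    price_per_gb_second=0.0000166667,   # assumed rate, USD
    free_requests=1_000_000,            # assumed monthly free tier
    free_gb_seconds=400_000,            # assumed monthly free tier
):
    """Return an approximate monthly Lambda cost in USD."""
    # Compute usage is billed in GB-seconds: duration times allocated memory.
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)

    # Only usage above the free tier is billable.
    billable_requests = max(0, requests - free_requests)
    billable_gb_seconds = max(0.0, gb_seconds - free_gb_seconds)

    request_cost = billable_requests / 1_000_000 * price_per_million_requests
    compute_cost = billable_gb_seconds * price_per_gb_second
    return round(request_cost + compute_cost, 2)


if __name__ == "__main__":
    # A low-traffic workload and a heavier one, for comparison.
    print(estimate_monthly_cost(500_000, 200, 128))
    print(estimate_monthly_cost(5_000_000, 200, 512))
```

Under these assumptions a small workload can stay entirely within the free tier, which is consistent with reviewers reporting bills of a few dollars per month.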
824,053 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
Retailer: 5%

Educational Organization: 62%
Financial Services Firm: 10%
Computer Software Company: 6%
Manufacturing Company: 3%
 

Company Size

[Chart: company size by reviewers (Large Enterprise, Midsize Enterprise, Small Business)]
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The main concern is the overhead of Java when distributed processing is not necessary. In such cases, operations can often be done on one node, making Spark's distributed mode unnecessary. Conseque...
Which is better, AWS Lambda or Batch?
AWS Lambda is a serverless solution. It doesn’t require any infrastructure, which allows for cost savings. There is no setup process to deal with, as the entire solution is in the cloud. If you use...
What do you like most about AWS Lambda?
The tool scales automatically based on the number of incoming requests.
What is your experience regarding pricing and costs for AWS Lambda?
AWS Lambda offers a highly favorable pricing model, especially for smaller applications or low-traffic workloads. The first one million requests per month are free, which provides significant cost ...
 

Sample Customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Netflix
Find out what your peers are saying about AWS Lambda vs. Apache Spark and other solutions. Updated: December 2024.