
AWS Lambda vs Apache Spark comparison

 

Comparison Buyer's Guide

Executive Summary

 

Categories and Ranking

Apache Spark
Ranking in Compute Service: 4th
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 65
Ranking in other categories: Hadoop (1st), Java Frameworks (2nd)

AWS Lambda
Ranking in Compute Service: 1st
Average Rating: 8.4
Reviews Sentiment: 7.5
Number of Reviews: 84
Ranking in other categories: None
 

Mindshare comparison

As of April 2025, in the Compute Service category, Apache Spark's mindshare is 11.2%, up from 9.7% a year earlier. AWS Lambda's mindshare is 21.0%, down from 23.2% a year earlier. Mindshare is calculated from PeerSpot user engagement data.
 

Featured Reviews

Ilya Afanasyev - PeerSpot reviewer
Reliable, able to expand, and handle large amounts of data well
We use batch processing. It works well with our formats and file versions, and there's a lot of functionality. Every hour, our pipeline copies changes from MongoDB to a specific file. Previously, each run copied all of the data from every table, whether or not anything had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000. The solution is scalable. It's a stable product.
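The review above describes moving from full hourly copies to reading only changed data. The reviewer does not say how this was implemented, so the following is only a minimal sketch of one common approach, a timestamp watermark, written in plain Python. The `updated_at` field, the `select_changed` function, and the sample records are all hypothetical and are not taken from the review or from any MongoDB or Spark API.

```python
from datetime import datetime, timezone


def select_changed(records, last_sync):
    """Return records modified after last_sync, plus the new watermark.

    Assumption: every record carries an 'updated_at' timestamp (hypothetical field).
    """
    changed = [r for r in records if r["updated_at"] > last_sync]
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = max((r["updated_at"] for r in changed), default=last_sync)
    return changed, new_watermark


# Hypothetical sample data standing in for one source table.
records = [
    {"id": 1, "updated_at": datetime(2025, 4, 1, 9, 0, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2025, 4, 1, 10, 30, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2025, 4, 1, 11, 15, tzinfo=timezone.utc)},
]
last_sync = datetime(2025, 4, 1, 10, 0, tzinfo=timezone.utc)

changed, watermark = select_changed(records, last_sync)
changed_ids = [r["id"] for r in changed]
```

Each hourly run would then copy only `changed` and store `watermark` for the next run, so the amount of data moved tracks the change rate rather than the table size.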
Wai L Lin O - PeerSpot reviewer
A serverless solution with easy integration features
We use AWS Lambda because it provides a solution for our needs without requiring us to manage our infrastructure. With the tool, we only pay for the resources we use. Additionally, it is straightforward to implement and integrates with other services like API Gateway. The tool's serverless nature has had the most significant impact on our workflow. I find it particularly attractive because it eliminates the need for managing servers. In my previous experience, managing upgrades and updates was quite challenging. The solution's integration process with other AWS services was relatively easy. We primarily use AWS services such as EventBridge for scheduling processes and log management.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"It is useful for handling large amounts of data. It is very useful for scientific purposes."
"I like that it can handle multiple tasks parallelly. I also like the automation feature. JavaScript also helps with the parallel streaming of the library."
"I found the solution stable. We haven't had any problems with it."
"The product's initial setup phase was easy."
"There's a lot of functionality."
"Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
"It provides a scalable machine learning library."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"We use AWS Lambda because it provides a solution for our needs without requiring us to manage our infrastructure. With the tool, we only pay for the resources we use. Additionally, it is straightforward to implement and integrates with other services like API Gateway."
"It enables the launch of thousands of instances simultaneously."
"Lambda is the preferred compute option because of on-demand cost. We don't have to provision any hardware beforehand. We don't have to provision the capacity required for the services because it is serverless."
"We moved our users into the Amazon Cognito pool, so it helps us to standardize our security practices, approaches, etc. We can customize Lambda for authentication to integrate it with API Gateway and other services."
"The initial setup is pretty easy."
"Lambda being serverless is a great feature that is appropriate for our use cases."
"This product is easy to use."
"We are building a Twitter-like application in the boot camp. I have used Lambda for the integration of the post-confirmation page in the application. This will help you get your one-time password via mail. You can log in with the help of a post-confirmation page. We didn’t want to set up an instance specifically for confirmation. We used the Lambda function so that it goes back to sleep after pushing up."
 

Cons

"From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable."
"There were some problems related to the product's compatibility with a few Python libraries."
"Apache Spark's GUI and scalability could be improved."
"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."
"Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."
"The setup I worked on was really complex."
"More ML based algorithms should be added to it, to make it algorithmic-rich for developers."
"We are building our own queries on Spark, and it can be improved in terms of query handling."
"AWS Lambda has a maximum execution timeout of 15 minutes, which is unsuitable for long-running tasks."
"I have seen some drawbacks with certain integrations."
"The metrics and reporting for this solution could be improved."
"A very minor improvement would be to simplify the instructions on setting a trigger, as I had to read through them multiple times at the start."
"I want to see support for longer applications. I need the 15-minute time-out window to improve."
"Another challenge I've noticed is that there is a limit to the environment variables such as the 4 KB limit. Although, the advice is to use parameters or other things to store the details when the limit has exceeded the data, this adds additional intensity to the application. If the size limits for environment variables can be revealed, it would be helpful. Even if we have to pay for it, at least we would know that we are not dealing with latency. So, I would like to see the size of the environment variables increased."
"The deployment process is a bit complex, so it could be simplified to make it easier for beginners to deploy."
"We'd love to see more integration potential in the future."
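Two of the reviewers above point to the 15-minute execution timeout as a limit for long-running work. One way to live with it, sketched here under stated assumptions, is to check the remaining time inside the handler and hand back unfinished work before the deadline. `get_remaining_time_in_millis()` is a method of the Lambda Python context object; the `items` event shape, the `process` helper, the 10-second safety margin, and the `FakeContext` test double are hypothetical.

```python
SAFETY_MARGIN_MS = 10_000  # assumed buffer to leave before the timeout


def process(item):
    """Placeholder for real per-item work (hypothetical)."""
    return item * 2


def handler(event, context):
    """Process items until time runs low, then return what is left."""
    items = event.get("items", [])
    processed = []
    for index, item in enumerate(items):
        # Stop early if too little execution time remains.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return {"processed": processed, "remaining": items[index:]}
        processed.append(process(item))
    return {"processed": processed, "remaining": []}


class FakeContext:
    """Local stand-in for the Lambda context object, for testing only."""

    def __init__(self, budgets_ms):
        self._budgets = iter(budgets_ms)

    def get_remaining_time_in_millis(self):
        return next(self._budgets)


result = handler({"items": [1, 2, 3]}, FakeContext([60_000, 30_000, 5_000]))
```

The caller (for example a queue or a workflow service) would be responsible for re-invoking the function with the `remaining` items; that part is outside this sketch.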
 

Pricing and Cost Advice

"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
"On the cloud model can be expensive as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
"They provide an open-source license for the on-premise version."
"The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks."
"Since we are using the Apache Spark version, not the Databricks version, it is the Apache-licensed version, and support and bug resolution are delayed. The Apache license is free."
"Apache Spark is an expensive solution."
"Spark is an open-source solution, so there are no licensing costs."
"Considering the product version used in my company, I feel that the tool is not costly since the product is available for free."
"AWS Lambda is cheap."
"You're not paying for a server if you're not using it, which is another reason I like it. So, you're not paying if you're not using it. It scales, and you're charged based on usage. It all depends on the use case. Some can be extremely inexpensive if you have very low volume transaction rates. That way, you don't have to fire up and absorb the cost of the servers just sitting there waiting for a transaction to come through. You're only paying when you use it. So, depending upon the use model, Lambda could be highly efficient relative to an EC2 solution. You don't have to have things reallocated."
"I would rate the tool’s pricing a nine out of ten. The solution’s pricing works on a pay-as-you-go basis."
"The solution is free of cost for the first year, and after that, it becomes expensive."
"This is a product that is pay-per-use, as opposed to a licensing fee."
"AWS Lambda is a very inexpensive solution. They charge for the number of times we run it. If you run AWS Lambda for one time, they charge around 50 cents or 25 cents for the use. I don't know the exact price, but it's less than a dollar."
"The cost is based on runtime."
"Price-wise, AWS Lambda is very cheap. It's not free, but it's not that expensive."
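Several reviewers above describe Lambda pricing as pay-per-use, based on the number of runs and the runtime. A rough sketch of that model follows. The default rates are illustrative assumptions only (they vary by region and architecture and change over time, and any free tier is ignored), so check the current AWS price list before relying on the numbers. The function name and parameters are hypothetical.

```python
def estimate_monthly_cost(
    invocations,
    avg_duration_ms,
    memory_mb,
    price_per_million_requests=0.20,    # assumed rate in USD, verify before use
    price_per_gb_second=0.0000166667,   # assumed rate in USD, verify before use
):
    """Estimate a monthly bill from request count, duration, and memory size."""
    # Compute usage is billed as memory (GB) multiplied by execution time (s).
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


# Example: one million 100 ms invocations at 1024 MB.
cost = estimate_monthly_cost(1_000_000, 100, 1024)
idle_cost = estimate_monthly_cost(0, 100, 1024)
```

With no invocations the estimate is zero, which is the point the reviewers make when comparing Lambda with a server that runs continuously.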
848,716 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 8%
Comms Service Provider: 6%

Educational Organization: 67%
Financial Services Firm: 8%
Computer Software Company: 5%
Manufacturing Company: 3%
 

Company Size

By reviewers: Large Enterprise, Midsize Enterprise, Small Business
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The Spark solution could improve in scheduling tasks and managing dependencies. Spark alone cannot handle sequential tasks, requiring environments like Airflow scheduler or scripts. For instance, o...
Which is better, AWS Lambda or Batch?
AWS Lambda is a serverless solution. It doesn’t require any infrastructure, which allows for cost savings. There is no setup process to deal with, as the entire solution is in the cloud. If you use...
What do you like most about AWS Lambda?
The tool scales automatically based on the number of incoming requests.
What is your experience regarding pricing and costs for AWS Lambda?
AWS Lambda is cheaper compared to running an instance continuously. You only pay for what you use, making it cost-effective.
 


Sample Customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Netflix
Find out what your peers are saying about AWS Lambda vs. Apache Spark and other solutions. Updated: April 2025.