Apache Spark Reviews

Vendor: Apache
4.2 out of 5
Ranked #1
1,180 followers

Apache Spark mindshare

As of March 2025, Apache Spark's mindshare in the Hadoop category stands at 17.8%, down from 21.2% a year earlier, according to calculations based on PeerSpot user engagement data.

PeerAnalyst reports based on Apache Spark reviews

Type | Title | Date
Category | Hadoop | Mar 7, 2025
Product | Reviews, tips, and advice from real users | Mar 7, 2025
Comparison | Apache Spark vs Cloudera Distribution for Hadoop | Mar 7, 2025
Comparison | Apache Spark vs Amazon EMR | Mar 7, 2025
Comparison | Apache Spark vs HPE Ezmeral Data Fabric | Mar 7, 2025

Suggested products
Title | Rating | Mindshare | Recommending | Interviews
Amazon EMR | 3.9 | 13.2% | 85% | 22 interviews
Cloudera Distribution for Hadoop | 4.0 | 25.6% | 92% | 50 interviews

Top industries

By visitors reading reviews:
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 7%
Comms Service Provider: 5%
University: 5%
Retailer: 5%
Government: 5%
Educational Organization: 4%
Insurance Company: 4%
Healthcare Company: 3%
Real Estate/Law Firm: 3%
Energy/Utilities Company: 2%
Media Company: 2%
Hospitality Company: 2%
Construction Company: 2%
Non Profit: 1%
Transportation Company: 1%
Recreational Facilities/Services Company: 1%
Wholesaler/Distributor: 1%
Pharma/Biotech Company: 1%
Legal Firm: 1%

Apache Spark reviews

SS
Sr Manager at a transportation company with 10,001+ employees
Verified user of Apache Spark
Dec 11, 2023
Offers real-time and near-real-time data processing

Pros

"We use it for ETL purposes as well as for implementing the full transformation pipelines."

Cons

"Apart from the restrictions that come with its in-memory implementation, it has been improved significantly up to version 3.0, which is currently in use."
Ilya Afanasyev - PeerSpot user
Senior Software Development Engineer at Yahoo!
Verified user of Apache Spark
Aug 22, 2022
Product version discussed: 3.2.0
Reliable, able to expand, and handle large amounts of data well

Pros

"There's a lot of functionality."

Cons

"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
Find out what your peers are saying about Apache Spark. Updated March 2025
841,099 professionals have used our research since 2012.
Dunstan Matekenya - PeerSpot user
Data Scientist at a financial services firm with 10,001+ employees
Verified user of Apache Spark
Jul 30, 2024
Open-source solution for data processing with portability

Pros

"Apache Spark is known for its ease of use. Compared to other available data processing frameworks, it is user-friendly."

Cons

"Apache Spark lacks geospatial data."
SurjitChoudhury - PeerSpot user
Data engineer at Cocos pt
Verified user of Apache Spark
Mar 16, 2024
Offers batch processing of data, and in-memory processing greatly enhances performance

Pros

"Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more."

Cons

"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."
Bharghava Raghavendra Beesa - PeerSpot user
Senior Developer at Infosys
Verified user of Apache Spark
Jan 22, 2025
Faster data transformations achieved but scheduling dependencies require external solutions

Pros

"Spark is used for transformations from large volumes of data, and it is usefully distributed."

Cons

"The Spark solution could improve in scheduling tasks and managing dependencies."
VM
Cloud solution architect
Verified user of Apache Spark
Mar 10, 2024
Offers seamless integration with Azure services and on-premises servers

Pros

"The solution is scalable."

Cons

"The setup I worked on was really complex."
Anshuman Kishore - PeerSpot user
Director Product Development at Mycom Osi
Verified user of Apache Spark
Apr 1, 2024
Available for free and can be deployed easily

Pros

"The product's deployment phase is easy."

Cons

"At times during the deployment process, the tool goes down, making it look less robust. To take care of the issues in the deployment process, users need to do manual interventions occasionally."
Miodrag Milojevic - PeerSpot user
Senior Data Architect at Yettel
Verified user of Apache Spark
Aug 18, 2023
Parallel computing helped create data lakes with near real-time loading

Pros

"It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance."

Cons

"If you have a Spark session in the background, sometimes it's very hard to kill these sessions because of deallocation."