Apache Spark vs Cask comparison

Apache Spark and Cask are both solutions in the Hadoop category. Apache Spark is ranked #1 with an average rating of 8.6, while Cask is ranked #14. Apache Spark holds a 17.5% mindshare in Hadoop, compared to Cask's 0.8%. Additionally, 90% of Apache Spark users are willing to recommend the solution.

Apache Spark: 65 reviews, 1,404 views, 1,090 comparison views, 90% willing to recommend

Cask: 53 views, 40 comparison views

Executive Summary

We performed a comparison between Apache Spark and Cask based on real PeerSpot user reviews.

Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop.

To learn more, read our detailed Hadoop Report (Updated: March 2025).

Helped 847,862 peers since 2012

Categories and Ranking

Apache Spark

Ranking in Hadoop: 1st
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: (value missing)
Ranking in other categories: Compute Service (4th), Java Frameworks (2nd)

Cask

Ranking in Hadoop: 14th
Average Rating: 0.0
Number of Reviews: (value missing)
Ranking in other categories: No ranking in other categories

Mindshare comparison

As of April 2025, in the Hadoop category, the mindshare of Apache Spark is 17.5%, down from 21.4% the previous year. The mindshare of Cask is 0.8%, up from 0.6% the previous year. Mindshare is calculated based on PeerSpot user engagement data.

Featured Reviews

Ilya Afanasyev

Senior Software Development Engineer at Yahoo!

Reliable, able to expand, and handle large amounts of data well

We use batch processing. It works well with our formats and file versions. There's a lot of functionality. Every hour, our pipeline copies changes from MongoDB to a specific file. Previously, the pipeline copied all of the data on every run, including tables that had not changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000. The solution is scalable. It's a stable product.

Top Industries

By visitors reading reviews

Financial Services Firm: 27%

Computer Software Company: 13%

Manufacturing Company

Comms Service Provider

No data available

Company Size

By reviewers

Large Enterprise

Midsize Enterprise

Small Business

No data available

Questions from the Community

What do you like most about Apache Spark?

We use Spark to process data from different data sources.

What is your experience regarding pricing and costs for Apache Spark?

Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...

What needs improvement with Apache Spark?

The Spark solution could improve in scheduling tasks and managing dependencies. Spark alone cannot handle sequential tasks, requiring environments like Airflow scheduler or scripts. For instance, o...

Comparisons

Spring Boot vs Apache Spark: compared 26% of the time

SAP HANA vs Apache Spark: compared 12% of the time

AWS Batch vs Apache Spark: compared 12% of the time

Cloudera Distribution for Hadoop vs Apache Spark: compared 7% of the time

Spark SQL vs Apache Spark: compared 7% of the time

No data available

Overview

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store the reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
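The contrast between the linear MapReduce dataflow and a reusable working set can be sketched in a few lines. This is a single-machine illustration in plain Python, not the Spark API: the names `map_phase`, `reduce_phase`, and the sample `lines` are hypothetical, and an ordinary in-memory list stands in for an RDD.

```python
# Plain-Python sketch (an assumption for illustration, not Spark code).
# A list in memory plays the role of the distributed working set.

lines = [
    "spark keeps data in memory",
    "mapreduce writes data to disk",
    "spark reuses data",
]

def map_phase(records):
    # Map step: emit a (word, 1) pair for every word in every record.
    return [(word, 1) for record in records for word in record.split()]

def reduce_phase(pairs):
    # Reduce step: sum the values emitted for each key.
    counts = {}
    for key, value in pairs:
        counts[key] = counts.get(key, 0) + value
    return counts

# MapReduce-style job: read input, map, reduce, store. A second job
# would start again from the input.
word_counts = reduce_phase(map_phase(lines))

# Working-set style: keep the mapped pairs in memory and run more than
# one computation over them without repeating the map step.
cached_pairs = map_phase(lines)
word_counts_again = reduce_phase(cached_pairs)
total_words = sum(value for _, value in cached_pairs)
```

In this sketch the second and third computations reuse `cached_pairs` instead of rereading and remapping the input, which is the role the paragraph above assigns to RDDs as a working set.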

Apache

Cask Data Application Platform (CDAP) is the first Unified Platform for Big Data. It provides standardization and deep integrations with diverse Hadoop technologies allowing companies to focus on application logic and insights, rather than infrastructure and integration. The platform is 100% open-source, highly extensible, and delivers enterprise-class features to help accelerate time to build, deploy, and manage data-centric applications & data lakes on Hadoop and Spark.

Three extensions are packaged with CDAP: Cask Hydrator for data pipelines, Cask Wrangler for data wrangling, and Cask Tracker for data discovery and metadata. CDAP Extensions are self-service, purpose-built applications on CDAP designed to solve common and critical big data challenges.

CDAP removes barriers to innovation as an extensible and future-proof platform that provides consistency across environments and easily integrates with existing MDM, BI, and security solutions.

Cask

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

Cask: AT&T, Salesforce, Cloudera, Hortonworks, Lotame, MAPR, Pet360, Ignition, Safeguard, Cloudwick, Kogentix

We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.