Apache Spark vs QueryIO comparison

Apache Spark and QueryIO are both solutions in the Hadoop category. Apache Spark is ranked #1 with an average rating of 8.4, while QueryIO is ranked #15 with an average rating of 8.0. Apache Spark holds a 17.5% mindshare in Hadoop, compared to QueryIO’s 0.5%. Additionally, 90% of Apache Spark users are willing to recommend the solution, compared to 100% of QueryIO users (based on a single review).


Executive Summary

We performed a comparison between Apache Spark and QueryIO based on real PeerSpot user reviews.

Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop.

To learn more, read our detailed Hadoop Report (Updated: March 2025).


Helped 848,270 peers since 2012

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:

Categories and Ranking

Apache Spark

Ranking in Hadoop: 1st
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 65
Ranking in other categories: Compute Service (4th), Java Frameworks (2nd)

QueryIO

Ranking in Hadoop: 15th
Average Rating: 8.0
Number of Reviews: 1
Ranking in other categories: No ranking in other categories

Mindshare comparison

As of April 2025, in the Hadoop category, Apache Spark's mindshare is 17.5%, down from 21.4% the previous year. QueryIO's mindshare is 0.5%, down from 0.6% the previous year. Mindshare is calculated from PeerSpot user engagement data.


Featured Reviews

Apache Spark

Ilya Afanasyev

Senior Software Development Engineer at Yahoo!

Reliable, able to expand, and handle large amounts of data well

We use batch processing. It works well with our formats and file versions. There's a lot of functionality. Every hour, our pipeline copies the changes from MongoDB to a specific file. Previously, the pipeline copied all of the data from all of the tables each time, including data that had not changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration requirements of the cluster, and we saved about $150,000. The solution is scalable. It's a stable product.

QueryIO

Marco Reyes

Manager of Process & Systems / Solutions Architect / BI Developer at HENKEL FRANCE

Stable with good connectivity and good integration capabilities

Data cleansing is not intuitive and user-friendly. When things have errors, you have to hunt them down as opposed to the solution simply showing you intuitively where to find it. I would recommend that they look at that Tableau Prep tool and see how it is pieced together. That's a great data cleansing tool. If Microsoft has something like that, then we wouldn't even have to look at some of the other options. There needs to be some simplification of the user interface. Right now it's too complicated. There isn't a way to put controls on the solution, so anyone can use any part of it, and sometimes novices will go and try to create things, but not know enough about what is official and what is published. It would be ideal if we could segment off certain sections so that not everyone had access to the whole solution. I'd like to see something more of a mapping tool so that you could see how the reports are connected, similar to Tableau Prep and Naim. That would make for a pretty useful diagnostics check. People would be better able to understand the linkage between your datasets. It would be nice if the solution offered some templates. It would make it even more plug and play, and give people a good jumping-off point. After that, they could explore other bells and whistles as they get further into understanding the solution. The solution should work in some virtualization. It would be a good added feature. If this product had those things then I wouldn't need to use other products.


Quotes from Members


Pros

Apache Spark

"I found the solution stable. We haven't had any problems with it."

"Provides a lot of good documentation compared to other solutions."

"Apache Spark can do large volume interactive data analysis."

"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."

"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."

"ETL and streaming capabilities."

"The most valuable feature of Apache Spark is its flexibility."

"The most valuable feature of Apache Spark is its memory processing because it processes data over RAM rather than disk, which is much more efficient and fast."

QueryIO

"Anyone who has even a little bit of knowledge of the solution can begin to create things. You don't have to be technical to use the solution."

Cons

Apache Spark

"Apache Spark lacks geospatial data."

"Dynamic DataFrame options are not yet available."

"We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."

"Stability in terms of API (things were difficult when transitioning from RDD to DataFrames, then to Dataset)."

"From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable."

"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."

"It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster."

"We are building our own queries on Spark, and it can be improved in terms of query handling."

QueryIO

"There needs to be some simplification of the user interface."

Pricing and Cost Advice

Apache Spark

"Since we are using the Apache Spark version, not the Databricks version, it is under the Apache license, so support and bug resolution are actually slow or delayed. The Apache license is free."

"It is an open-source platform. We do not pay for its subscription."

"I did not pay anything when using the tool on cloud services, but I had to pay on the compute side. The tool is not expensive compared with the benefits it offers. I rate the price as an eight out of ten."

"It is an open-source solution; it is free of charge."

"The cloud model can be expensive, as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."

"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."

"Considering the product version used in my company, I feel that the tool is not costly since the product is available for free."

"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."

QueryIO

Information not available


Top Industries

By visitors reading reviews

Apache Spark

Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company
Comms Service Provider

QueryIO

No data available

Company Size

By reviewers

Apache Spark

Large Enterprise
Midsize Enterprise
Small Business

QueryIO

No data available

Questions from the Community

What do you like most about Apache Spark?

We use Spark to process data from different data sources.


What is your experience regarding pricing and costs for Apache Spark?

Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...


What needs improvement with Apache Spark?

The Spark solution could improve in scheduling tasks and managing dependencies. Spark alone cannot handle sequential tasks, requiring environments like Airflow scheduler or scripts. For instance, o...


Comparisons

Apache Spark

Spring Boot vs Apache Spark: compared 27% of the time
SAP HANA vs Apache Spark: compared 12% of the time
AWS Batch vs Apache Spark: compared 11% of the time
Cloudera Distribution for Hadoop vs Apache Spark: compared 7% of the time
Spark SQL vs Apache Spark: compared 7% of the time

QueryIO

No data available


Overview

Apache Spark

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations of the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store the reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
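The RDD ideas above can be illustrated with a toy, single-process Python sketch. `ToyRDD` and its methods are hypothetical names chosen for illustration and are not Spark's API. In PySpark, the comparable calls would be `sc.parallelize`, `map`, `filter`, `reduce` and `cache`. The sketch assumes one machine with partitions held as in-memory tuples, and it shows four things:

- partitions are read-only;
- each transformation returns a new dataset that records its lineage (parent plus function);
- a second computation reuses partitions already held in memory;
- a lost partition is recomputed from lineage instead of being reread from disk, as a MapReduce stage would require.

```python
from functools import reduce
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Hypothetical single-process stand-in for an RDD; not Spark's API.

    Partitions are read-only tuples. Transformations never modify a dataset:
    they return a new ToyRDD that records its parent and a per-partition
    function (its lineage), so a lost partition can be recomputed.
    """

    def __init__(self, partitions: Optional[List[list]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[list], list]] = None) -> None:
        self._source = [tuple(p) for p in partitions] if partitions is not None else None
        self._parent = parent
        self._fn = fn
        # In-memory working set: partition index -> materialized tuple.
        # (This toy keeps every computed partition; real Spark only keeps
        # them in memory when cache() or persist() is requested.)
        self._memory: dict = {}

    @classmethod
    def parallelize(cls, data: Iterable, num_partitions: int = 2) -> "ToyRDD":
        items = list(data)
        # Round-robin split of the input into num_partitions partitions.
        return cls(partitions=[items[i::num_partitions] for i in range(num_partitions)])

    def num_partitions(self) -> int:
        if self._source is not None:
            return len(self._source)
        return self._parent.num_partitions()

    def partition(self, i: int) -> tuple:
        # Serve from memory when possible; otherwise rebuild from lineage.
        if i not in self._memory:
            if self._source is not None:
                self._memory[i] = self._source[i]
            else:
                self._memory[i] = tuple(self._fn(list(self._parent.partition(i))))
        return self._memory[i]

    def lose_partition(self, i: int) -> None:
        # Simulate losing an in-memory partition, e.g. after a worker failure.
        self._memory.pop(i, None)

    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def reduce(self, op: Callable):
        # Reduce inside each non-empty partition, then combine the partials.
        parts = [self.partition(i) for i in range(self.num_partitions())]
        return reduce(op, [reduce(op, p) for p in parts if p])


nums = ToyRDD.parallelize(range(1, 11), num_partitions=3)
squares = nums.map(lambda x: x * x)
even_squares = squares.filter(lambda x: x % 2 == 0)

total = even_squares.reduce(lambda a, b: a + b)   # 4 + 16 + 36 + 64 + 100
largest = squares.reduce(max)                      # reuses squares' partitions held in memory
even_squares.lose_partition(0)                     # partition 0 is recomputed from lineage
recomputed = even_squares.reduce(lambda a, b: a + b)
print(total, largest, recomputed)  # 220 100 220
```

Storing a function per dataset rather than a copy of intermediate results is what lets the sketch rebuild a lost partition cheaply. This mirrors, in simplified form, the fault-tolerance approach the overview attributes to RDDs.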

QueryIO

QueryIO is a Hadoop-based SQL and Big Data analytics solution used to store, structure, analyze, and visualize vast amounts of structured and unstructured Big Data. It is especially well suited to processing unstructured Big Data: it gives the data a structure and supports querying and analyzing it with standard SQL syntax. QueryIO lets you leverage the vast, mature infrastructure built around SQL and relational databases for your Big Data analytics needs.
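QueryIO's own interfaces are not described on this page, so the following is only a conceptual sketch of the idea of giving unstructured data a structure and then querying it with standard SQL. It uses Python's built-in `sqlite3` module as a stand-in SQL engine; it does not use QueryIO or Hadoop. The sketch assumes a hypothetical web-server log format:

- a regular expression maps each raw line onto named columns;
- the structured rows are loaded into a table;
- ordinary SQL does the analysis.

Lines that do not fit the pattern are counted and skipped rather than guessed at.

```python
import re
import sqlite3

# Hypothetical raw, unstructured input: free-form web-server log lines.
RAW_LINES = [
    "2025-03-01 10:15:02 GET /reports/hadoop 200 512",
    "2025-03-01 10:15:07 GET /reports/spark 404 0",
    "2025-03-01 10:16:11 POST /reviews 201 128",
    "garbage line that does not match the pattern",
    "2025-03-02 09:01:45 GET /reports/hadoop 200 2048",
]

# The "structure" step: a pattern that maps each line onto named columns.
LINE_PATTERN = re.compile(
    r"(?P<day>\d{4}-\d{2}-\d{2}) \S+ (?P<method>[A-Z]+) (?P<path>\S+) "
    r"(?P<status>\d{3}) (?P<bytes>\d+)$"
)


def load(lines, conn):
    """Parse raw lines into rows and load them into a SQL table."""
    conn.execute(
        "CREATE TABLE requests "
        "(day TEXT, method TEXT, path TEXT, status INTEGER, bytes INTEGER)"
    )
    rows, skipped = [], 0
    for line in lines:
        m = LINE_PATTERN.match(line)
        if m is None:
            skipped += 1  # cannot be structured, so it is left out
            continue
        rows.append((m["day"], m["method"], m["path"], int(m["status"]), int(m["bytes"])))
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows), skipped


conn = sqlite3.connect(":memory:")
loaded, skipped = load(RAW_LINES, conn)

# Once structured, the data can be analyzed with standard SQL.
query = """
    SELECT path, COUNT(*) AS hits, SUM(bytes) AS total_bytes
    FROM requests
    WHERE status < 400
    GROUP BY path
    ORDER BY total_bytes DESC
"""
results = conn.execute(query).fetchall()
print(loaded, skipped)  # 4 1
for row in results:
    print(row)
```

In a real Big Data deployment, the parsing and storage would be distributed across a cluster. The point of the sketch is only that once a schema is imposed, existing SQL skills and tooling apply unchanged.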


Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

QueryIO: Information not available


We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn and personal follow-up with the reviewer when necessary.