I employ Spark SQL for a range of data engineering tasks. First, I gather data from databases, SAP systems, and external sources via SFTP and land it in blob storage. Using Spark SQL within Jupyter notebooks, I define and implement the business logic for data processing. Our CI/CD process, managed with Azure DevOps, orchestrates the execution of the Spark SQL scripts and loads the results into SQL Server. This structured data is then consumed by analytics teams, particularly in tools like Power BI, for thorough analysis and reporting. The seamless integration of Spark SQL in this workflow keeps data processing and analysis efficient and supports our data-driven initiatives.
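To make the workflow concrete, here is a minimal sketch of that kind of notebook step, assuming a PySpark session; the storage path, table names, and connection details are hypothetical placeholders, not our actual configuration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sales-transform").getOrCreate()

# Read raw extracts that were landed in blob storage (e.g. via SFTP or SAP export).
raw = spark.read.parquet("wasbs://raw@examplestorage.blob.core.windows.net/sales/")
raw.createOrReplaceTempView("sales_raw")

# Business logic expressed as Spark SQL inside the notebook.
curated = spark.sql("""
    SELECT customer_id,
           CAST(order_date AS DATE) AS order_date,
           SUM(amount)              AS total_amount
    FROM   sales_raw
    WHERE  status = 'COMPLETED'
    GROUP  BY customer_id, CAST(order_date AS DATE)
""")

# Load the curated result into SQL Server over JDBC for downstream Power BI use.
(curated.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://example-sql.database.windows.net;databaseName=analytics")
    .option("dbtable", "dbo.sales_curated")
    .option("user", "etl_user")
    .option("password", "<secret>")
    .mode("overwrite")
    .save())
```

In practice the Azure DevOps pipeline triggers scripts of this shape on a schedule, so the loading step into SQL Server runs without manual intervention.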
I find Spark SQL's seamless integration of SQL queries with Spark programs, and its use of DataFrames and Datasets, particularly valuable. While we mostly stick to traditional T-SQL, Spark SQL brings the flexibility to handle large-scale data processing. Being able to write SQL queries, with only minor adjustments for functions like LICA, simplifies our data transformations. Although the syntax differs from traditional T-SQL in places, Spark SQL's efficiency in managing distributed data and its simplicity in expressing complex operations make it an essential part of our data pipeline.
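As an illustration of the kind of small T-SQL-to-Spark-SQL adjustment and the SQL/DataFrame interplay mentioned above, here is a hedged example; the table, columns, and data are made up for demonstration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("syntax-example").getOrCreate()

orders = spark.createDataFrame(
    [(1, None, 120.0), (2, "EMEA", 80.0)],
    ["order_id", "region", "amount"],
)
orders.createOrReplaceTempView("orders")

# T-SQL habit:  SELECT TOP 10 order_id, ISNULL(region, 'UNKNOWN') FROM orders ORDER BY amount DESC
# Spark SQL uses LIMIT and coalesce instead:
top_orders = spark.sql("""
    SELECT order_id,
           coalesce(region, 'UNKNOWN') AS region
    FROM   orders
    ORDER  BY amount DESC
    LIMIT  10
""")

# The same logic expressed through the DataFrame API, which interoperates with the SQL view.
top_orders_df = (orders
    .withColumn("region", F.coalesce(F.col("region"), F.lit("UNKNOWN")))
    .orderBy(F.col("amount").desc())
    .limit(10))
```

Being able to switch between the SQL view and the DataFrame API in the same notebook is what makes the transition from T-SQL feel incremental rather than disruptive.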