Engineering Manager/Solution architect at Provectus
Vendor
Dec 2, 2021
The primary use case of this solution is to function within a distributed ecosystem. Spark is part of EMR, a Hadoop distribution, and is one of the tools in that ecosystem. You are not working with Hadoop in a vacuum; you leverage Spark, Hive, and HBase together, because it is a distributed ecosystem and no single tool has much value on its own. This solution can be deployed both on the cloud and on Cloudera distributions.
Analytics and Reporting Manager at a financial services firm with 1,001-5,000 employees
Real User
Mar 18, 2020
We do have some use cases, like analysis and risk-based use cases, that we've prepared for companies to evaluate, but not many. The business units have so many things that we don't yet know how to formulate into this tool and utilize as use cases, and they also have many requirements and cost constraints. I work for a financial institution, so every solution they consider has to be on-premises. Right now I'm really just evaluating the solution and upskilling myself.
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. There are several ways to interact with Spark SQL, including SQL and the Dataset API. When computing a result, the same execution engine is used, independent of which API/language you use to express the computation. This unification means that developers...
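That unification is easy to see in a short sketch (the table name and data below are invented for illustration): the same aggregation can be written against the SQL interface or the DataFrame API, and both run on the same execution engine.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

# Illustrative data; in practice this would come from a real source
df = spark.createDataFrame(
    [("books", 12.0), ("books", 8.5), ("toys", 20.0)],
    ["category", "amount"],
)
df.createOrReplaceTempView("sales")

# SQL interface
by_sql = spark.sql(
    "SELECT category, SUM(amount) AS total FROM sales GROUP BY category"
)

# DataFrame API -- same result, same execution engine
by_api = df.groupBy("category").agg(F.sum("amount").alias("total"))

by_sql.show()
by_api.show()
```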
I employ Spark SQL for various tasks. Initially, I gather data from databases, SAP systems, and external sources via SFTP and store it in blob storage. Using Spark SQL within Jupyter notebooks, I define and implement the business logic for data processing. Our CI/CD process, managed with Azure DevOps, oversees the execution of the Spark SQL scripts and loads the data into SQL Server. This structured data is then used by analytics teams, particularly in tools like Power BI, for thorough analysis and reporting. The seamless integration of Spark SQL in this workflow ensures efficient data processing and analysis, contributing to the success of our data-driven initiatives.
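As a rough illustration of the workflow this reviewer describes (the storage paths, table names, and credentials below are placeholders, not their actual configuration), the notebook step and the SQL Server load might look like this:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("blob-to-sqlserver").getOrCreate()

# Read raw extracts (e.g. landed from SAP/SFTP) out of blob storage;
# requires the hadoop-azure connector to be available on the cluster
raw = spark.read.parquet("wasbs://landing@examplestorage.blob.core.windows.net/sales/")
raw.createOrReplaceTempView("raw_sales")

# Business logic expressed in Spark SQL inside the notebook
curated = spark.sql("""
    SELECT customer_id,
           CAST(order_date AS DATE) AS order_date,
           SUM(net_amount)          AS daily_net_amount
    FROM raw_sales
    WHERE status = 'POSTED'
    GROUP BY customer_id, CAST(order_date AS DATE)
""")

# Load the curated result into SQL Server over JDBC for Power BI to consume;
# requires the SQL Server JDBC driver on the Spark classpath
(curated.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://example-sql.database.windows.net:1433;database=analytics")
    .option("dbtable", "dbo.daily_sales")
    .option("user", "etl_user")
    .option("password", "<secret>")
    .mode("overwrite")
    .save())
```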
We used the solution for data analytics and statistical reports from content management platforms.
We have an HDFS environment for archiving enormous volumes of data, and the solution helps retrieve data from our HDFS archive. Developers use the solution for business analytics.
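A minimal sketch of that retrieval pattern, assuming a Parquet archive on HDFS (the path, columns, and filter are invented for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-archive-retrieval").getOrCreate()

# Point Spark at the archived data sitting in HDFS
archive = spark.read.parquet("hdfs:///archive/transactions/year=2020/")
archive.createOrReplaceTempView("archived_transactions")

# Pull back only the slice the analysts need
subset = spark.sql("""
    SELECT account_id, txn_date, amount
    FROM archived_transactions
    WHERE txn_date BETWEEN '2020-01-01' AND '2020-03-31'
""")
subset.show(20)
```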
We are using PySpark for big data processing, such as stock data for multiple competitors. We process it in memory using DataFrames and Spark SQL, and we use it along with the database to process the big data, especially data in Azure. Databricks itself provides an environment that comes pre-installed with Spark.
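A hedged sketch of that kind of in-memory DataFrame and Spark SQL processing (the stock schema and source path are assumptions, not the reviewer's setup):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("competitor-stocks").getOrCreate()

# Illustrative source in Azure storage; requires the ABFS connector
prices = spark.read.csv(
    "abfss://data@exampleaccount.dfs.core.windows.net/stocks/*.csv",
    header=True, inferSchema=True,
)

# Cache so repeated queries run against in-memory data
prices.cache()
prices.createOrReplaceTempView("prices")

daily_stats = spark.sql("""
    SELECT ticker,
           trade_date,
           AVG(close_price) AS avg_close,
           MAX(volume)      AS max_volume
    FROM prices
    GROUP BY ticker, trade_date
""")
daily_stats.show()
```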
We use this solution for data engineering, data transformation, preparing data for machine learning, and running queries. We have between 30 and 40 users making use of this solution.
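For example, a data-preparation step ahead of machine learning might look roughly like this (column names and cleaning rules are illustrative assumptions, not this team's actual pipeline):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ml-data-prep").getOrCreate()

raw = spark.read.parquet("/data/customers/raw/")

# Deduplicate, fill missing values, and derive a simple feature column
features = (raw
    .dropDuplicates(["customer_id"])
    .fillna({"age": 0, "income": 0.0})
    .withColumn("is_active", (F.col("last_login_days") < 30).cast("int"))
    .select("customer_id", "age", "income", "is_active"))

# Persist the prepared feature set for the ML stage
features.write.mode("overwrite").parquet("/data/customers/features/")
```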
Our company uses the solution to create pipelines and data sets. The ETL process transforms the data, and custom-written aggregations convert the raw data into data sets. The data sets are then exported to tables for dashboards.
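A sketch of that pipeline shape, with raw data aggregated into a data set and written out as a table for dashboards (paths, names, and grouping keys are illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dashboard-datasets").getOrCreate()

raw = spark.read.parquet("/lake/raw/orders/")

# Aggregations that turn raw orders into a dashboard-ready data set
order_summary = (raw
    .groupBy("region", "order_month")
    .agg(F.countDistinct("order_id").alias("orders"),
         F.sum("order_value").alias("revenue")))

# Persist as a table the dashboard layer can query directly
order_summary.write.mode("overwrite").saveAsTable("reporting.order_summary")
```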
I am using this solution for data validation and writing queries.
We use it to gather all the transaction data. We have Hadoop and Spark in our system, and we use some simple process flows for data transport.
Our primary use case is for building a data pipeline and data analytics.
We primarily use the solution as our data warehouse. We use it for data science.
The primary use is to process big data. We were connecting to it and applying sentiment analysis via hardware.