Engineering Manager/Solution architect at Provectus
Vendor
Dec 2, 2021
The primary use case of this solution is to function within a distributed ecosystem. Spark is part of EMR, a Hadoop distribution, and is one of the tools in that ecosystem. You are not working with Hadoop in a vacuum; you leverage Spark, Hive, and HBase together, because it is a distributed ecosystem and no single tool has much value on its own. This solution can be deployed both on the cloud and on Cloudera distributions.
Analytics and Reporting Manager at a financial services firm with 1,001-5,000 employees
Real User
Mar 18, 2020
We do have some use cases, like analysis and risk-based use cases, that we've prepared for companies to evaluate, but not many. The business units have so many things that we don't yet know how to formulate into this tool and utilize as use cases, and they also have many requirements and cost constraints. I work for a financial institution, so every solution they consider has to be on-premises. Right now I'm really just evaluating the solution and upskilling myself.
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. There are several ways to interact with Spark SQL, including SQL and the Dataset API. When computing a result, the same execution engine is used, independent of which API/language you use to express the computation. This unification means that developers...
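That unification is easy to see in a short sketch (the table name and data below are invented for illustration): the same aggregation can be written against the SQL interface or the DataFrame API, and both run on the same execution engine.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

# Illustrative data; in practice this would come from a real source
df = spark.createDataFrame(
    [("books", 12.0), ("books", 8.5), ("toys", 20.0)],
    ["category", "amount"],
)
df.createOrReplaceTempView("sales")

# SQL interface
by_sql = spark.sql(
    "SELECT category, SUM(amount) AS total FROM sales GROUP BY category"
)

# DataFrame API -- same result, same execution engine
by_api = df.groupBy("category").agg(F.sum("amount").alias("total"))

by_sql.show()
by_api.show()
```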
I employ Spark SQL for various tasks. Initially, I gather data from databases, SAP systems, and external sources via SFTP and store it in blob storage. Using Spark SQL within Jupyter notebooks, I define and implement the business logic for data processing. Our CI/CD process, managed with Azure DevOps, oversees the execution of the Spark SQL scripts and loads the data into SQL Server. This structured data is then used by analytics teams, particularly in tools like Power BI, for thorough analysis and reporting. The seamless integration of Spark SQL in this workflow ensures efficient data processing and analysis, contributing to the success of our data-driven initiatives.
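As a rough illustration of the workflow this reviewer describes (the storage paths, table names, and credentials below are placeholders, not their actual configuration), the notebook step and the SQL Server load might look like this:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("blob-to-sqlserver").getOrCreate()

# Read raw extracts (e.g. landed from SAP/SFTP) out of blob storage;
# requires the hadoop-azure connector to be available on the cluster
raw = spark.read.parquet("wasbs://landing@examplestorage.blob.core.windows.net/sales/")
raw.createOrReplaceTempView("raw_sales")

# Business logic expressed in Spark SQL inside the notebook
curated = spark.sql("""
    SELECT customer_id,
           CAST(order_date AS DATE) AS order_date,
           SUM(net_amount)          AS daily_net_amount
    FROM raw_sales
    WHERE status = 'POSTED'
    GROUP BY customer_id, CAST(order_date AS DATE)
""")

# Load the curated result into SQL Server over JDBC for Power BI to consume;
# requires the SQL Server JDBC driver on the Spark classpath
(curated.write
    .format("jdbc")
    .option("url", "jdbc:sqlserver://example-sql.database.windows.net:1433;database=analytics")
    .option("dbtable", "dbo.daily_sales")
    .option("user", "etl_user")
    .option("password", "<secret>")
    .mode("overwrite")
    .save())
```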
We used the solution for data analytics and statistical reports from content management platforms.
We have an HDFS environment for archiving enormous volumes of data, and the solution helps retrieve data from our HDFS archive. Developers use the solution for business analytics.
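A minimal sketch of that retrieval pattern, assuming a Parquet archive on HDFS (the path, columns, and filter are invented for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-archive-retrieval").getOrCreate()

# Point Spark at the archived data sitting in HDFS
archive = spark.read.parquet("hdfs:///archive/transactions/year=2020/")
archive.createOrReplaceTempView("archived_transactions")

# Pull back only the slice the analysts need
subset = spark.sql("""
    SELECT account_id, txn_date, amount
    FROM archived_transactions
    WHERE txn_date BETWEEN '2020-01-01' AND '2020-03-31'
""")
subset.show(20)
```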
We are using PySpark for big data processing, such as stock data for multiple competitors. We process it in memory using DataFrames and Spark SQL, and we use it along with the database to process the big data, especially data in Azure. Databricks itself provides an environment that comes pre-installed with Spark.
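A hedged sketch of that kind of in-memory DataFrame and Spark SQL processing (the stock schema and source path are assumptions, not the reviewer's setup):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("competitor-stocks").getOrCreate()

# Illustrative source in Azure storage; requires the ABFS connector
prices = spark.read.csv(
    "abfss://data@exampleaccount.dfs.core.windows.net/stocks/*.csv",
    header=True, inferSchema=True,
)

# Cache so repeated queries run against in-memory data
prices.cache()
prices.createOrReplaceTempView("prices")

daily_stats = spark.sql("""
    SELECT ticker,
           trade_date,
           AVG(close_price) AS avg_close,
           MAX(volume)      AS max_volume
    FROM prices
    GROUP BY ticker, trade_date
""")
daily_stats.show()
```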
We use this solution for data engineering, data transformation, preparing data for machine learning, and running queries. We have between 30 and 40 users making use of this solution.
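For example, a data-preparation step ahead of machine learning might look roughly like this (column names and cleaning rules are illustrative assumptions, not this team's actual pipeline):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ml-data-prep").getOrCreate()

raw = spark.read.parquet("/data/customers/raw/")

# Deduplicate, fill missing values, and derive a simple feature column
features = (raw
    .dropDuplicates(["customer_id"])
    .fillna({"age": 0, "income": 0.0})
    .withColumn("is_active", (F.col("last_login_days") < 30).cast("int"))
    .select("customer_id", "age", "income", "is_active"))

# Persist the prepared feature set for the ML stage
features.write.mode("overwrite").parquet("/data/customers/features/")
```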
Our company uses the solution to create pipelines and data sets. The ETL process transforms the data, and custom-written aggregations convert the raw data into data sets. The data sets are then exported to tables for dashboards.
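A sketch of that pipeline shape, with raw data aggregated into a data set and written out as a table for dashboards (paths, names, and grouping keys are illustrative):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dashboard-datasets").getOrCreate()

raw = spark.read.parquet("/lake/raw/orders/")

# Aggregations that turn raw orders into a dashboard-ready data set
order_summary = (raw
    .groupBy("region", "order_month")
    .agg(F.countDistinct("order_id").alias("orders"),
         F.sum("order_value").alias("revenue")))

# Persist as a table the dashboard layer can query directly
order_summary.write.mode("overwrite").saveAsTable("reporting.order_summary")
```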
I am using this solution for data validation and writing queries.
We use it to gather all the transaction data. We have Hadoop and Spark in our system, and we use some simple process flows for data transport.
Our primary use case is for building a data pipeline and data analytics.
We primarily use the solution as our data warehouse. We use it for data science.
The primary use is to process big data. We were connecting to it and applying sentiment analysis via hardware.