Senior Analyst / Customer Business and Insights Specialist at a tech services company with 501-1,000 employees
Real User
Analytics are easy because data is contained within each use case
Pros and Cons
  • "The solution is easy to understand if you have basic knowledge of SQL commands."
  • "It would be beneficial for aggregate functions to include a code block or toolbox that explains its calculations or supported conditional statements."

What is our primary use case?

Our company uses the solution to create pipelines and data sets. The ETL process transforms the data and certain written aggregations convert the raw data to data sets. The data sets are then exported to tables for dashboards. 

What is most valuable?

The solution is easy to understand if you have basic knowledge of SQL commands.

Projects sit within the Spark scope and there are multiple options for data sets such as closed, private, or public. 

It is easy to perform analytics because data is contained within each use case. For example, you request data for a particular use case, receive the data link, and import it for analytics. 

What needs improvement?

It would be beneficial for aggregate functions to include a code block or toolbox that explains calculations or supported conditional statements. Multiple functions come within an aggregate so it is important to understand them. When you are trying to do something new, it would be easier and quite unique to get information within the solution rather than having to search the web. 

For example, once you select an aggregate it tells you what type of functions the solution can perform and includes a code block explaining its calculations. Or, a certain conditional statement gives you a second option or explains other types of statements the solution performs as part of a rule-level function. 
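
As a rough illustration of the kind of aggregate-plus-conditional logic described above, here is a minimal sketch. It runs standard SQL through Python's built-in sqlite3 module purely so the example is self-contained; the same `SUM(CASE WHEN ...)` pattern is also valid in Spark SQL. The `orders` table, its columns, and the sample rows are assumptions made up for this example, not anything taken from the reviewer's project.

```python
import sqlite3

# Hypothetical sample data; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", "paid", 100.0),
        ("east", "refunded", 40.0),
        ("west", "paid", 250.0),
        ("west", "paid", 50.0),
    ],
)

# Each aggregate wraps a CASE expression, so the condition decides
# which rows contribute to that particular calculation.
query = """
SELECT region,
       COUNT(*) AS order_count,
       SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END) AS paid_total,
       SUM(CASE WHEN status = 'refunded' THEN 1 ELSE 0 END) AS refund_count
FROM orders
GROUP BY region
ORDER BY region
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

Putting the condition inside the aggregate keeps everything in one pass over the data, which is usually simpler than running a separate filtered query for each figure.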

For how long have I used the solution?

I have been using the solution for fourteen months. 

Buyer's Guide
Spark SQL
November 2024
Learn what your peers think about Spark SQL. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
824,067 professionals have used our research since 2012.

What do I think about the stability of the solution?

The solution is stable. 

What do I think about the scalability of the solution?

The scalability depends on administrative rights. Every use case has certain allocated resources so a use case demanding scalability or extensive use can have additional resources allocated to it. 

How are customer service and support?

I have not experienced any issues with the solution so have not needed technical support. 

Which solution did I use previously and why did I switch?

I have been using SQL to extract data throughout my five-year career. 

How was the initial setup?

The setup is very straightforward so I rate it a ten out of ten. 

What about the implementation team?

We implemented the solution in-house. 

What's my experience with pricing, setup cost, and licensing?

The solution is bundled with Palantir Foundry at no extra charge. 

Which other solutions did I evaluate?

Our company gives us the freedom to use Python, R, PySpark, or SQL, so we have many tools available. Our team includes 17 developers and 25% of them use the solution. 

The solution is way better than Oracle SQL because Oracle takes a lot of effort to understand and use. 

The solution is similar to the format of MS SQL. With MS, there are defined data sources that place restrictions on what you are supposed to use. Sometimes we had to make sure we had a way through the restrictions. For example, if we didn't have access to a physical table then we had to create a duplicate instance or view of it. We could see the values but couldn't manipulate them because we didn't have access to the physical table. The effect of MS restrictions is based on the complexity of a project and any privacy-related data constraints.

For the solution, use cases sit within the Spark scope so you get multiple options for creating them. You can individually set each use case as closed, private, or public. You can run analytics for each use case because the data is contained within it. This process is much easier when compared to Oracle SQL or MS SQL. 

What other advice do I have?

The solution is very similar to generic Spark and SQL. 

I rate the solution an eight out of ten. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Cloud Team Leader at TCL
Real User
Enables us to build a data pipeline and has good performance
Pros and Cons
  • "The performance is one of the most important features. It has an API to process the data in a functional manner."
  • "In the next update, we'd like to see better performance for small amounts of data. It is possible but there are better tools that are faster and cheaper."

What is our primary use case?

Our primary use case is for building a data pipeline and data analytics. 

What is most valuable?

The performance is one of the most important features. It has an API to process the data in a functional manner. 

What needs improvement?

I would like to be able to process small data sets without the overhead, using the same API for one GB of data as for terabytes of data. 

For how long have I used the solution?

I have been using Spark SQL for around four years. 

What do I think about the stability of the solution?

It is very stable.

What do I think about the scalability of the solution?

It is scalable. I use it on and off. I use it mostly daily. 

How was the initial setup?

From an infrastructure perspective, it was easy for us to set up because we used some cloud services, but on-premise requires more setup. There is a learning curve, especially if you're not a programmer, and the more complex steps take more effort to learn. 

I deployed it by myself. We use cloud so we are able to do it. 

The number of people required for deployment varies. One person is enough for AWS but not in other places. 

If you know how to do it, the deployment can be done in minutes. 

What other advice do I have?

I would rate Spark SQL a nine out of ten. 

My advice would be to read Databricks books about Spark. It's a good source of knowledge. 

In the next update, we'd like to see better performance for small amounts of data. It is possible but there are better tools that are faster and cheaper. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer1724670 - PeerSpot reviewer
Engineering Manager/Solution architect at a computer software company with 201-500 employees
Vendor
Useful tool within a distributed ecosystem
Pros and Cons
  • "This solution is useful to leverage within a distributed ecosystem."
  • "This solution could be improved by adding monitoring and integration for the EMR."

What is our primary use case?

The primary use case of this solution is to function within a distributed ecosystem. Spark is part of EMR, a Hadoop distribution, and is one of the tools in the ecosystem. You are not working with Hadoop in a vacuum: you leverage Spark, Hive, and HBase because it is a distributed ecosystem. It has no value by itself. 

This solution can be deployed both on the cloud and on Cloudera distributions. 

What is most valuable?

This solution is useful to leverage within a distributed ecosystem. 

What needs improvement?

This solution could be improved by adding monitoring and integration for the EMR. 

For how long have I used the solution?

We have been working with Spark SQL for a few years. We are an outsourcing and consulting company, so it's not for our use—we mostly work with clients. 

What do I think about the stability of the solution?

This solution is stable. 

What do I think about the scalability of the solution?

This solution is scalable. 

How was the initial setup?

The installation is straightforward because it's a cloud-based solution. 

What about the implementation team?

We implement this solution for customers ourselves. 

What's my experience with pricing, setup cost, and licensing?

There is no license or subscription for this solution. 

What other advice do I have?

I rate this solution an eight out of ten and would recommend it to others. 

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
reviewer1427205 - PeerSpot reviewer
Corporate Sales at a financial services firm with 10,001+ employees
Real User
It is stable, but its partitioning feature isn't that easy to use
Pros and Cons
  • "It is a stable solution."
  • "Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users."

What is our primary use case?

We use it to gather all the transaction data. We have Hadoop and Spark in our system, and we use some easy process flows for transport. 

What is most valuable?

It is a stable solution. 

What needs improvement?

Being a new user, I am not able to find out how to partition it correctly. I probably need more information or knowledge. In other database solutions, you can easily optimize all partitions. I haven't found a quicker way to do that in Spark SQL. It would be good if you don't need a partition here, and the system automatically partitions in the best way. They can also provide more educational resources for new users.
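
For readers new to the concept, the idea behind partitioning by a key can be sketched in a few lines of plain Python. This is only a conceptual illustration, not Spark's implementation: Spark uses its own hash function and runs across a cluster, while the `assign_partition` and `partition_rows` helpers, the `account` field, and the sample transactions below are all hypothetical names invented for this sketch.

```python
import zlib
from collections import defaultdict


def assign_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a deterministic hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_rows(rows, key_field, num_partitions):
    """Group rows so that every row with the same key lands in the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        index = assign_partition(row[key_field], num_partitions)
        partitions[index].append(row)
    return dict(partitions)


# Hypothetical transaction data, used only to exercise the helpers.
transactions = [
    {"account": "A-1", "amount": 10},
    {"account": "B-2", "amount": 25},
    {"account": "A-1", "amount": 5},
    {"account": "C-3", "amount": 40},
    {"account": "B-2", "amount": 15},
]

partitions = partition_rows(transactions, "account", num_partitions=3)
for index, rows in sorted(partitions.items()):
    print(index, [r["account"] for r in rows])
```

The point of choosing a good key is that related rows end up together and the work is spread evenly; a key with only a few distinct values would leave some partitions overloaded and others empty.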

For how long have I used the solution?

I have been using this solution for two months.

What do I think about the scalability of the solution?

Its scalability is okay. We are a big organization. 

What other advice do I have?

Being a new user, I would rate Spark SQL a four out of ten. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
reviewer1488372 - PeerSpot reviewer
Associate Manager at a consultancy with 501-1,000 employees
Real User
Easy to use, reliable, and useful data validation
Pros and Cons
  • "Data validation and ease of use are the most valuable features."
  • "There should be better integration with other solutions."

What is our primary use case?

I am using this solution for data validation and writing queries.
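
The review does not say which checks are run, so the following is only a minimal sketch of what typical validation queries might look like: counting missing values and finding duplicate keys. It uses Python's built-in sqlite3 module so it runs anywhere; the queries are standard SQL that Spark SQL also accepts. The `customers` table, its columns, and the rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical data set with one missing email and one duplicated id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, None),
        (3, "c@example.com"),
        (3, "c@example.com"),
    ],
)

# Check 1: how many rows are missing a required value?
null_emails = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE email IS NULL"
).fetchone()[0]

# Check 2: which ids appear more than once?
duplicate_ids = conn.execute(
    """
    SELECT id, COUNT(*) AS occurrences
    FROM customers
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY id
    """
).fetchall()

print("rows with missing email:", null_emails)
print("duplicated ids:", duplicate_ids)
```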

What is most valuable?

Data validation and ease of use are the most valuable features.

What needs improvement?

There should be better integration with other solutions.

For how long have I used the solution?

I have been using the solution for approximately two years.

What do I think about the stability of the solution?

The solution has been stable.

What do I think about the scalability of the solution?

I have found the solution to be scalable. We have 20 people using the solution in my organization and we plan to increase usage.

What's my experience with pricing, setup cost, and licensing?

The solution is open source and free.

What other advice do I have?

I rate Spark SQL a ten out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user986637 - PeerSpot reviewer
Project Manager - Senior Software Engineer at a tech services company with 11-50 employees
Real User
A good stable and scalable solution for processing big data
Pros and Cons
  • "The stability was fine. It behaved as expected."
  • "In the next release, maybe the visualization of some command-line features could be added."

What is our primary use case?

The primary use is to process big data. We were connecting into and we were applying sentiment analysis via hardware.

What needs improvement?

In the next release, maybe the visualization of some command-line features could be added.

For how long have I used the solution?

I've been using the solution for two to three weeks.

What do I think about the stability of the solution?

The stability was fine. It behaved as expected.

What do I think about the scalability of the solution?

The scalability of the solution is good.

How are customer service and technical support?

Technical support has been fine.

Which solution did I use previously and why did I switch?

We previously used Apache Hadoop.

How was the initial setup?

The initial setup was fine. If somebody knows what to expect it's okay.

What other advice do I have?

We've just started using this solution. Until recently we were using it on a research basis, just to measure the performance, the cost, and so on. Many things could be improved, but so far it has been okay and I'm happy with it. I would recommend the product.

I would rate this solution eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.