I recommend Spark SQL, but I will need to see the results of our evaluation of Dremio. I'm especially expecting good performance because of its reflection mechanisms, which are essentially materialized views. The open question is the refresh rate; I don't know how good or bad that is. I rate Spark SQL a ten out of ten with the correct implementation.
If the user has a large volume of data, I think they should use PySpark, but for scenarios with a medium amount of data, they should not use PySpark because it has some overhead. I rate Spark SQL a nine out of ten.
Training is quite important to get users up to scratch with Spark SQL and Spark, so planning is needed in terms of training and skill sets. This training is particularly important for the typical DevOps/MLOps deployment with pipelines; otherwise, you may end up with lots of functionality and queries that are difficult to change, deploy, or maintain. I would rate this solution an eight out of ten. In terms of scalability, it is very useful.
I would rate Spark SQL a nine out of ten. My advice would be to read the Databricks books about Spark; they are a good source of knowledge. In the next update, we'd like to see better performance on small volumes of data. It is possible, but there are better tools that are faster and cheaper.
Analytics and Reporting Manager at a financial services firm with 1,001-5,000 employees
Real User
Mar 18, 2020
We will have a lot of big data, which is why we need it. Otherwise, the solution is not needed. The solution really depends on the size of your data, its complexity, and the analysis that you are doing. Spark is good, but it is not mandatory. Since I don't have experience in production with the solution, the best I can rate it now is a five (out of 10).
We use both the on-premises and cloud deployment models. We have a relationship with Cloudera and use their distribution. We don't have a relationship with Apache. Spark SQL is a good product; however, users need to have the capability of implementing the correct tools and efficiencies. I'd rate the solution seven out of ten.
Project Manager - Senior Software Engineer at a tech services company with 11-50 employees
Real User
Jul 16, 2019
We've just started using this solution. Until recently we were using it on a research basis, just to measure the performance, the cost, and so on. Many things could be improved, but so far I'm happy with it. I would recommend the product. I would rate this solution eight out of ten.
Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL give Spark more information about the structure of both the data and the computation being performed. There are several ways to interact with Spark SQL, including SQL and the Dataset API. When computing a result, the same execution engine is used, independent of which API/language you use to express the computation. This unification means that developers...
Overall, I would rate Spark SQL as a seven out of ten.
It's pretty good to use in the initial phases. Overall, I would rate the solution an eight out of ten.
The solution is very similar to the generic Spark and SQL language. I rate the solution an eight out of ten.
I recommend this solution. Spark provides good, clear documentation that is well organized.
I rate this solution an eight out of ten and would recommend it to others.
I rate Spark SQL a ten out of ten.
Being a new user, I would rate Spark SQL a four out of ten.