Data Scientist at a financial services firm with 10,001+ employees
Real User
Top 10
Jul 10, 2024
Apache Spark is my go-to solution for processing large-scale datasets. I would recommend it 100%. One of the main reasons is its ease of use. You can start using it on your laptop without any extra infrastructure, and then you can take that same code and run it anywhere else, including on the cloud. You're not locked in by any vendor, which is a significant advantage. Overall, I rate the solution a nine out of ten as a big data processing engine.
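As a rough illustration of the portability this reviewer describes, here is a minimal Scala sketch; the object name, input path, and the local[*] master are assumptions for illustration, not details from the review. The same job runs on a laptop with no extra infrastructure, and pointing it at a cluster master (or submitting it with spark-submit) runs the identical code on a cluster or in the cloud.

```scala
import org.apache.spark.sql.SparkSession

object WordCountDemo {
  def main(args: Array[String]): Unit = {
    // Runs on a laptop with no extra infrastructure. Swapping local[*] for a
    // cluster master URL (or letting spark-submit set it) runs the same code
    // on a cluster or in the cloud.
    val spark = SparkSession.builder()
      .appName("word-count-demo")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // Hypothetical input file; replace with a local path or a cloud URI.
    val lines = spark.read.textFile("data/sample.txt")

    val counts = lines
      .flatMap(_.split("\\s+"))
      .groupByKey(identity)
      .count()

    counts.show(10)
    spark.stop()
  }
}
```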
Apache Spark is a good product for processing large volumes of data compared to other distributed systems. It provides efficient integration with Hadoop and other platforms. I rate it a ten out of ten.
The tool is used for real-time data analytics because it is very powerful and reliable. The code that you write with Apache Spark is stable; bugs tend to come from your own Java or Scala code rather than from the engine itself, which is amazing. Apache Spark is very reliable, powerful, and fast as an engine. Compared with a competitor like MapReduce, Apache Spark performs 100 times better. The monitoring part of the product is good. The product offers resilient clusters that can run across multiple nodes, and the tool can run with multiple clusters. Its integration capabilities with other platforms, which improve our company's workflow, are good. In terms of improvements in the data analysis area, new libraries have been launched to support AI and machine learning. My company is able to process huge datasets with Apache Spark, which adds huge value to the organization. I rate the overall solution a nine out of ten.
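A minimal sketch of the kind of real-time analytics described above, assuming a local run and a plain socket source for illustration; the host, port, and trigger interval are assumptions, and a production pipeline would more likely read from a durable source such as Kafka.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object StreamingCountsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-counts-demo")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    import spark.implicits._

    // Hypothetical source: a socket on localhost:9999 (e.g. started with `nc -lk 9999`).
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", "9999")
      .load()
      .as[String]

    // Running word counts, updated as new data arrives.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .groupBy("value")
      .count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .trigger(Trigger.ProcessingTime("5 seconds"))
      .start()

    query.awaitTermination()
  }
}
```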
If your use case involves real-time applications with frequently changing columns or data frames, then Spark is a fantastic option for you. However, if you have a batch process and don't need structured data analysis, I would suggest avoiding it. The high cost of cloud infrastructure combined with Apache Spark can be a significant burden in such scenarios. Overall, I would rate the solution a nine out of ten.
Quantitative Developer at a marketing services firm with 11-50 employees
Real User
Top 20
Jul 6, 2023
I would recommend understanding the use case better; only go for it if it fits your use case. But it is a great tool. Overall, I would rate Apache Spark an eight out of ten.
We are well versed in Spark: the versions, the internal structure of Spark, and exactly what Spark is doing. The solution cannot be made much easier; not everything can be simplified, because it involves core data and computer science engineering, and not many people are actually aware of that. I rate Apache Spark a six out of ten.
Chief Data-strategist and Director at Theworkshop.es
Real User
Top 10
Aug 18, 2021
I have the solution installed on my computer and on our servers. You can use it on-premises or as a SaaS. I'd rate the solution at a nine out of ten. I've been very pleased with its capabilities. I would recommend the solution for the people who need to deploy projects with streaming. If you have many different sources or different types of data, and you need to put everything in the same place - like a data lake - Spark, at this moment, has the right tools. It's an important solution for data science, for data detectors. You can put all of the information in one place with Spark.
Senior Solutions Architect at a retailer with 10,001+ employees
Real User
Mar 27, 2021
I would recommend Apache Spark to new users, but it depends on the use case. Sometimes, it's not the best solution. On a scale from one to ten, I would give Apache Spark a ten.
I would definitely recommend Spark. It is a great product. I like Spark a lot, and most of the features have been quite good. Its initial learning curve is a bit high, but as you learn it, it becomes very easy. I would rate Apache Spark an eight out of ten.
I would advise planning well before implementing this solution. In enterprise corporations like ours, there are a lot of policies. You should first find out your needs, and after that, you or your team should set it up based on your needs. If your needs change during development because of the business requirements, it will be very difficult. If you are clear about your needs, it is easier to set it up. If you know how Spark is used in your project, you have to define firewall rules and cluster needs. When you set up Spark, it should be ready for people's usage, especially for remote job execution. I would rate Apache Spark a nine out of ten.
We're customers and also partners with Apache. While we are on version 2.6, we are considering upgrading to version 3.0. I'd rate the solution nine out of ten. It works very well for us and suits our purposes almost perfectly.
I would say that for some use cases, we don't have to go to Apache Spark; they can be implemented using an ordinary Python, Go, or Java application. For use cases where leveraging Apache Spark gives better performance and a reduction in processing time, we can go for Apache Spark. I would rate Apache Spark nine out of ten for the use cases that require it. I would advise using existing cloud services for implementing Apache Spark.
Lead Consultant at a tech services company with 51-200 employees
Consultant
Jan 29, 2020
The advice that I would give to someone considering this solution is to look at the key streaming characteristics of your data, such as velocity, meaning how quickly you need to access the data. These things matter when designing the solution and need to be worked out up front. I would rate Apache Spark an eight out of ten. To make it a ten, they should improve the speed, and improve data storage so that we can ingest data into the user database in more efficient ways.
Technical Consultant at a tech services company with 1-10 employees
Consultant
Dec 23, 2019
On a scale of 1 to 10, I'd put it at an eight. To make it a perfect 10, I'd like to see an improved configuration tool. Sometimes it is a nightmare on Linux trying to figure out what happened in the configuration and on the back-end, so I think installation and configuration should be better integrated with other tools. We are technical people, so we could figure it out, but if aspects like that were improved, then people who are less technical would use it and it would be more adaptable for the end user.
We use on-premises, public cloud, and private cloud deployment models. We're partners with Databricks. I'm a consultant. Our company works for large enterprises such as banks and energy companies. 17 of our workers use Apache Spark. With the cloud, there are many companies that integrate Spark. Most big data projects around the world use Spark, directly or indirectly. I'd rate the solution eight out of ten.
Senior Consultant & Training at a tech services company with 51-200 employees
Consultant
Oct 13, 2019
The work that we are doing with this solution is quite common and is very easy to do. My advice for anybody who is implementing this solution is to look at their needs and then look at the community. Normally, there are a lot of people who have already done what you need. So, even without experience, it is quite simple to do a lot of things. I would rate this solution a nine out of ten.
Principal Architect at a financial services firm with 1,001-5,000 employees
Real User
Jul 10, 2019
I would recommend the solution. I would rate it an eight or nine out of ten. In some areas I would give it a ten, but there are parts I cannot use. If you are going to use it for a consumer, then I can recommend it and you should go ahead. It doesn't work for me, as I have different clients and different engagements.
Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function...
I'd rate the solution eight out of ten.
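The RDD description quoted above can be made concrete with a short sketch, assuming a local run with made-up data; the object name and numbers are illustrative. An RDD built from a lineage of transformations can be cached and reused across several actions, which is the kind of reuse that MapReduce's single-pass, disk-to-disk dataflow makes awkward.

```scala
import org.apache.spark.sql.SparkSession

object RddDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-demo")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    // An RDD is an immutable, partitioned collection; its lineage of
    // transformations is what lets Spark rebuild lost partitions (fault tolerance).
    val numbers = sc.parallelize(1 to 1000000, numSlices = 8)

    // Cache the derived RDD and reuse it in two separate actions without
    // re-reading the source, unlike MapReduce's one-pass map -> shuffle -> reduce.
    val evens = numbers.filter(_ % 2 == 0).cache()

    val count = evens.count()
    val sum   = evens.map(_.toLong).reduce(_ + _)

    println(s"count=$count, sum=$sum")
    spark.stop()
  }
}
```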
I rate the overall solution a ten out of ten.
I would give it a rating of seven out of ten, which, by my standards, is quite high.
I would recommend Apache Spark to other users. Overall, I rate Apache Spark an eight out of ten.
Overall, I rate the product more than eight out of ten.
This is a good solution for big data use cases and I rate it eight out of 10.
I can recommend the product. It's a nice system for batch processing huge data. I'd rate the solution eight out of ten.
I rate Apache Spark an eight out of ten.
I rate Apache Spark an eight out of ten.
I would rate Apache Spark eight out of ten.
Spark can handle small to huge data and is suitable for any size of company. I would rate Spark as eight out of ten.
I would rate this solution an eight out of ten.
I would rate it a nine out of ten.
I would rate this solution eight out of 10.