
Apache Spark pros and cons

Vendor: Apache
4.2 out of 5
Ranked #1
1,179 followers

Apache Spark Pros review quotes

VM
Feb 20, 2024
The solution is scalable.
reviewer2150616 - PeerSpot reviewer
Aug 5, 2024
The product's initial setup phase was easy.
reviewer879201 - PeerSpot reviewer
Dec 23, 2019
I feel streaming is its best feature.
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: January 2025.
831,265 professionals have used our research since 2012.
SurjitChoudhury - PeerSpot reviewer
Feb 20, 2024
When we tackle sentiment analysis using NLP technologies, we deal with unstructured data: customer chats, feedback on promotions or demos, and even media such as images, audio, and video files. To process such data, we rely on PySpark. Under the hood, Spark is a compute engine with in-memory processing, and features such as broadcasting and caching improve performance. It has become a crucial tool, and in my view around 90% of companies have used it for a decade or more.
KK
Oct 30, 2020
The AI libraries are the most valuable part. They provide extensibility and usability. Spark also has a lot of connectors, which is a very important and useful feature for AI: AI work requires connecting many data sources, and you have to pull data from all of those systems. Spark's connector coverage is very broad. With a Spark cluster, you can get fast results, especially for AI.
Miodrag Milojevic - PeerSpot reviewer
Jul 25, 2023
It's easy to set up parallelism in Spark, run jobs with specific parameters, and get good performance.
Ilya Afanasyev - PeerSpot reviewer
Aug 3, 2022
There's a lot of functionality.
RV
Jul 23, 2020
The in-memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, in memory on the cluster itself, and it is very effective at processing data correctly.
reviewer1185906 - PeerSpot reviewer
Feb 22, 2022
One of the key features is that Apache Spark is a distributed computing framework. You can have multiple worker (slave) nodes and distribute the workload among them.
NK
Aug 1, 2022
Its scalability and speed are very valuable. You can scale it a lot, and it is a great technology for big data. It is definitely better than many earlier warehouse or pipeline solutions, such as Informatica. Spark SQL closely follows the standard SQL we have been using for years, which makes coding in Spark easy: it is just like writing normal SQL. You can either use Spark's APIs or write SQL code directly and run it, which I find useful.

Apache Spark Cons review quotes

VM
Feb 20, 2024
The setup I worked on was really complex.
reviewer2150616 - PeerSpot reviewer
Aug 5, 2024
From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable.
reviewer879201 - PeerSpot reviewer
Dec 23, 2019
Extracting data from HDFS and other sources is somewhat tricky because you have to set up connections to each of those sources.
SurjitChoudhury - PeerSpot reviewer
Feb 20, 2024
Its optimization techniques could be enhanced. There are some limitations in this area that, if addressed, would further improve Spark's performance.
KK
Oct 30, 2020
Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing.
Miodrag Milojevic - PeerSpot reviewer
Jul 25, 2023
If you have a Spark session running in the background, it is sometimes very hard to kill it because of deallocation.
Ilya Afanasyev - PeerSpot reviewer
Aug 3, 2022
I know there is always debate about which language to write applications in, and some people love Scala. I don't like it, however.
RV
Jul 23, 2020
The graphical user interface (UI) could be clearer. It's very hard to make sense of the execution logs and understand how long it takes to send everything. If an execution is lost, it's not easy to understand why or where it failed. I have to drill down into the data processes manually, which takes a lot of time. There could be a metrics monitor, or log analysis as a whole could be improved to make it easier to understand and navigate.
reviewer1185906 - PeerSpot reviewer
Feb 22, 2022
When you are working with large, complex tasks, the garbage collection process is slow and affects performance.
NK
Aug 1, 2022
Its UI could be better. Maintaining the history server is a little cumbersome and should be improved. I had issues when looking at the historical tags, which sometimes caused problems. You have to create and run a history server separately. Such things could be made easier: instead of being installed separately, the history server could be part of the standard setup so that it is available as soon as you set Spark up.