Apache Spark pros and cons

Vendor: Apache

4.2 out of 5

65 reviews
90% willing to recommend

1,181 followers

Apache Spark Pros review quotes

Vineeth Marar

Cloud solution architect at 0

Feb 20, 2024

The solution is scalable.

reviewer2150616

Lead Data Scientist at a transportation company with 51-200 employees

Aug 5, 2024

The product's initial setup phase was easy.

reviewer879201

Technical Consultant at a tech services company with 1-10 employees

Dec 23, 2019

I feel the streaming is its best feature.

Free Report: Apache Spark Reviews and More

Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: April 2025.

847,646 professionals have used our research since 2012.

SurjitChoudhury

Data engineer at Cocos pt

Feb 20, 2024

Now, when we're tackling sentiment analysis using NLP technologies, we deal with unstructured data—customer chats, feedback on promotions or demos, and even media like images, audio, and video files. For processing such data, we rely on PySpark. Beneath the surface, Spark functions as a compute engine with in-memory processing capabilities, enhancing performance through features like broadcasting and caching. It's become a crucial tool, widely adopted by 90% of companies for a decade or more.

Kürşat Kurt

Software Architect at Akbank

Oct 30, 2020

AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI.

Miodrag Milojevic

Senior Data Architect at Yettel

Jul 25, 2023

It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance.

Ilya Afanasyev

Senior Software Development Engineer at Yahoo!

Aug 3, 2022

There's a lot of functionality.

Rajendran Veerappan

Director at Nihil Solutions

Jul 23, 2020

The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly.

reviewer1185906

Manager - Data Science Competency at a tech services company with 201-500 employees

Feb 22, 2022

One of the key features is that Apache Spark is a distributed computing framework. You can have multiple slaves and distribute the workload between them.

NitinKumar

Director of Engineering at Sigmoid

Aug 1, 2022

Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark.

Apache Spark Cons review quotes

Vineeth Marar

Cloud solution architect at 0

Feb 20, 2024

The setup I worked on was really complex.

reviewer2150616

Lead Data Scientist at a transportation company with 51-200 employees

Aug 5, 2024

From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable.

reviewer879201

Technical Consultant at a tech services company with 1-10 employees

Dec 23, 2019

When you want to extract data from your HDFS and other sources, it is kind of tricky because you have to connect with those sources.

SurjitChoudhury

Data engineer at Cocos pt

Feb 20, 2024

There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance.

Kürşat Kurt

Software Architect at Akbank

Oct 30, 2020

Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing.

Miodrag Milojevic

Senior Data Architect at Yettel

Jul 25, 2023

If you have a Spark session in the background, sometimes it's very hard to kill these sessions because of D allocation.

Ilya Afanasyev

Senior Software Development Engineer at Yahoo!

Aug 3, 2022

I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it.

Rajendran Veerappan

Director at Nihil Solutions

Jul 23, 2020

The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate.

reviewer1185906

Manager - Data Science Competency at a tech services company with 201-500 employees

Feb 22, 2022

When you are working with large, complex tasks, the garbage collection process is slow and affects performance.

NitinKumar

Director of Engineering at Sigmoid

Aug 1, 2022

Its UI can be better. Maintaining the history server is a little cumbersome, and it should be improved. I had issues while looking at the historical tags, which sometimes created problems. You have to separately create a history server and run it. Such things can be made easier. Instead of separately installing the history server, it can be made a part of the whole setup so that whenever you set it up, it becomes available.
