
Apache NiFi vs Apache Spark comparison

 

Comparison Buyer's Guide

Executive Summary

 

Categories and Ranking

Apache NiFi
Ranking in Compute Service: 8th
Average Rating: 7.8
Number of Reviews: 11
Ranking in other categories: No ranking in other categories

Apache Spark
Ranking in Compute Service: 4th
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 64
Ranking in other categories: Hadoop (1st), Java Frameworks (2nd)
 

Mindshare comparison

As of January 2025, in the Compute Service category, the mindshare of Apache NiFi is 7.7%, up from 6.3% a year earlier. The mindshare of Apache Spark is 11.4%, up from 8.2% a year earlier. Mindshare is calculated based on PeerSpot user engagement data.
 

Featured Reviews

Arjun Pandey - PeerSpot reviewer
Good monitoring, metrics capabilities and provides ability to design processors with a single click
The good thing about Apache NiFi is that it has a concept called a flow file, along with flow file processors. The processor is the building block of your entire job, and there are close to 500 of them, each for a specific purpose. For example, for reading from Kafka, NiFi has a processor called "ConsumeKafka". To write to S3, it has a processor called "PutS3Object". If I read from Kafka in my own application, I'd need to ensure the library I'm using tracks my messages. I'd also need to handle any failures by rereading messages and ensuring acknowledgment. All this complexity is already handled by the NiFi processors, and the community invests significant effort in developing them.
I can design a processor with a single click, export the entire workflow, and import it. The format is actionable, so NiFi is immediately set up. It's also distributed in nature, so I can scale it across nodes based on the workload. These nodes share their state. If one node goes down during processing, that data might be lost, but any subsequent data is safe. Such occurrences are rare.
In essence, if you want a quick solution, Apache NiFi is a strong contender. There are other solutions, like Airflow and some paid pipeline options. Airflow is open-source but can be complicated. For ETL or ELT solutions, there are pricier options. But if I need a pipeline that I can monitor step by step, Apache NiFi is a good choice. It integrates with Prometheus metrics, allowing me to embed them in my workflow. There's also a processor for integration with Slack, so I can receive notifications when the workflow completes or fails.
Another feature I appreciate is "back pressure," which NiFi handles automatically. It maintains its own queue and addresses back-pressure issues. If, for instance, an upstream entity isn't fast enough, items get stored in a queue, managed internally by NiFi's back-pressure algorithm.
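The queueing behaviour this review calls back pressure can be illustrated with a bounded queue between two steps. What follows is a minimal Python sketch of the general idea, not NiFi's implementation: the `BoundedConnection` class, its `offer` and `poll` methods, and the threshold of 3 are all hypothetical names and values chosen for illustration.

```python
from collections import deque


class BoundedConnection:
    """Hypothetical stand-in for a queue sitting between two processing steps."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # illustrative limit on queued items
        self._items = deque()

    def offer(self, item) -> bool:
        """Queue the item, or return False to signal back pressure when full."""
        if len(self._items) >= self.threshold:
            return False
        self._items.append(item)
        return True

    def poll(self):
        """Hand the oldest queued item to the next step, or None if empty."""
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)


def run_demo():
    conn = BoundedConnection(threshold=3)
    accepted, deferred = [], []
    # The producing step offers five messages; only three fit in the queue.
    for msg in range(5):
        (accepted if conn.offer(msg) else deferred).append(msg)
    # The consuming step takes one item, which frees space for a retry.
    first = conn.poll()
    retried = conn.offer(deferred[0])
    return accepted, deferred, first, retried, len(conn)
```

In this sketch the producer is simply told to try again later; a real pipeline tool would typically pause scheduling of the producing step until the queue drains.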
Ilya Afanasyev - PeerSpot reviewer
Reliable, able to expand, and handle large amounts of data well
We use batch processing. It works well with our formats and file versions. There's a lot of functionality. In our pipeline, each hour we copy the changes from MongoDB to a specific file. Previously, every run of the pipeline copied all of the data from all of the tables, even when nothing had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000. The solution is scalable. It's a stable product.
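The saving described in this review comes from copying only the records that changed since the previous run, rather than every table in full. A minimal Python sketch of that idea follows, using an in-memory list in place of MongoDB. The `updated_at` field, the `incremental_copy` helper, and the watermark approach are illustrative assumptions, not the reviewer's actual pipeline.

```python
from typing import Dict, Iterable, List, Tuple


def incremental_copy(
    records: Iterable[Dict], last_watermark: int
) -> Tuple[List[Dict], int]:
    """Return the records changed after last_watermark, plus the new watermark."""
    changed = [r for r in records if r["updated_at"] > last_watermark]
    # If nothing changed, keep the previous watermark unchanged.
    new_watermark = max((r["updated_at"] for r in changed), default=last_watermark)
    return changed, new_watermark


# Hypothetical source data; "updated_at" is an assumed change timestamp.
source = [
    {"_id": 1, "updated_at": 100},
    {"_id": 2, "updated_at": 205},
    {"_id": 3, "updated_at": 310},
]

# First hourly run: only records changed after the stored watermark are copied.
batch, watermark = incremental_copy(source, last_watermark=200)

# Second run with no new changes: nothing is copied, the watermark stays put.
empty_batch, same_watermark = incremental_copy(source, last_watermark=watermark)
```

The full-copy approach would have moved all three records on every run; here the second run moves none.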

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

Apache NiFi
"We can integrate the tool with other applications easily."
"Apache NiFi is user-friendly. Its most valuable features for handling large volumes of data include its multitude of integrated endpoints and clients and the ability to create cron jobs to run tasks at regular intervals."
"It's an automated flow, where you can build a flow from source to destination, then do the transformation in between."
"Visually, this is a good product."
"The initial setup is very easy. I would rate my experience with the initial setup a ten out of ten, where one point is difficult, and ten points are easy."
"The user interface is good and makes it easy to design very popular workflows."
"The most valuable features of this solution are ease of use and implementation."
"The most valuable feature has been the range of clients and the range of connectors that we could use."
Apache Spark
"The most valuable feature of Apache Spark is its memory processing because it processes data over RAM rather than disk, which is much more efficient and fast."
"The solution has been very stable."
"ETL and streaming capabilities."
"Spark can handle small to huge data and is suitable for any size of company."
"We use Spark to process data from different data sources."
"The main feature that we find valuable is that it is very fast."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"Features include machine learning, real time streaming, and data processing."
 

Cons

Apache NiFi
"The tool should incorporate more tutorials for advanced use cases. It has tutorials for simple use cases."
"There is room for improvement in integration with SSO. For example, NiFi does not have any integration with SSO. And if I want to set up some kind of role-based access control across the organization, that is not possible."
"There are some claims that NiFi is cloud-native but we have tested it, and it's not."
"The overall stability of this solution could be improved. In a future release, we would like to have access to more features that could be used in a parallel way. This would provide more freedom with processing."
"I think the UI needs to be more user-friendly."
"There should be a better way to integrate a development environment with local tools."
"More features must be added to the product."
"We run many jobs, and there are already large tables. When we do not control NiFi on time, all reports fail for the day. So it's pretty slow to control, and it has to be improved."
Apache Spark
"Its UI can be better. Maintaining the history server is a little cumbersome, and it should be improved. I had issues while looking at the historical tags, which sometimes created problems. You have to separately create a history server and run it. Such things can be made easier. Instead of separately installing the history server, it can be made a part of the whole setup so that whenever you set it up, it becomes available."
"The initial setup was not easy."
"We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."
"It would be beneficial to enhance Spark's capabilities by incorporating models that utilize features not traditionally present in its framework."
"More ML based algorithms should be added to it, to make it algorithmic-rich for developers."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"It's not easy to install."
"The main concern is the overhead of Java when distributed processing is not necessary."
 

Pricing and Cost Advice

"We use the free version of Apache NiFi."
"I used the tool's free version."
"The solution is open-source."
"It's an open-source solution."
"The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks."
"They provide an open-source license for the on-premise version."
"It is an open-source platform. We do not pay for its subscription."
"Since we are using the Apache Spark version, not the Databricks version, it is an Apache-licensed version, so support and bug resolution are often late or delayed. The Apache license is free."
"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
"It is an open-source solution, it is free of charge."
"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
"Considering the product version used in my company, I feel that the tool is not costly since the product is available for free."
831,265 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews

Apache NiFi
Financial Services Firm: 18%
Computer Software Company: 14%
Manufacturing Company: 9%
Retailer: 7%

Apache Spark
Financial Services Firm: 27%
Computer Software Company: 13%
Manufacturing Company: 7%
University: 5%
 

Company Size

By reviewers (chart): Large Enterprise, Midsize Enterprise, Small Business
 

Questions from the Community

What needs improvement with Apache NiFi?
The tool should incorporate more tutorials for advanced use cases. It has tutorials for simple use cases.
What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The main concern is the overhead of Java when distributed processing is not necessary. In such cases, operations can often be done on one node, making Spark's distributed mode unnecessary. Conseque...
 

Sample Customers

Apache NiFi: Macquarie Telecom Group, Dovestech, Slovak Telekom, Looker, Hastings Group
Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Find out what your peers are saying about Apache NiFi vs. Apache Spark and other solutions. Updated: January 2025.