Azure Data Factory vs StreamSets comparison

 

Comparison Buyer's Guide

Categories and Ranking

Azure Data Factory
Ranking in Data Integration: 1st
Average Rating: 8.0
Reviews Sentiment: 6.7
Number of Reviews: 86
Ranking in other categories: Cloud Data Warehouse (3rd)

StreamSets
Ranking in Data Integration: 9th
Average Rating: 8.4
Reviews Sentiment: 7.5
Number of Reviews: 24
Ranking in other categories: none
 

Mindshare comparison

As of November 2024, in the Data Integration category, the mindshare of Azure Data Factory is 11.1%, down from 13.3% compared to the previous year. The mindshare of StreamSets is 1.7%, up from 1.3% compared to the previous year. It is calculated based on PeerSpot user engagement data.
 

Featured Reviews

Thulani David Mngadi - PeerSpot reviewer
Data flow feature is valuable for data transformation tasks
The workflow automation features in Azure Data Factory, particularly its low-code/no-code approach, are highly beneficial for accelerating development speed. This approach allows for quick creation of pipelines and offers customization options for integration needs, making it versatile for various use cases. Azure Data Factory supports a wide range of connectors, catering to a majority of integration needs. As for the visual interface and monitoring capabilities, the visual interface makes the product user-friendly and easy to teach, facilitating adoption within teams. While the monitoring capabilities are sufficient out of the box, they may not be as comprehensive as dedicated enterprise monitoring tools. The monitoring features are manageable for production use, with the option to integrate log analytics or create custom dashboards if needed. The data flow feature in Azure Data Factory is valuable for data transformation tasks, especially for those who may not have expertise in writing complex code. It simplifies the process of data manipulation and is particularly useful for individuals unfamiliar with Spark coding. While there could be improvements for more flexibility, overall, the data flow feature effectively accomplishes its purpose.
Reyansh Kumar - PeerSpot reviewer
We no longer need to hire highly skilled data engineers to create and monitor data pipelines
The things I like about StreamSets are its overall user interface, efficiency, and product features, which are all good. Also, the scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy. You just need to configure the data sources, the paths and their configurations, and you are ready to go. It is very efficient and very easy to use for ETL pipelines. It is a GUI-based interface in which you can easily create or design your own data pipelines with just a few clicks. As for moving data into modern analytics systems, we are using it with Microsoft Power BI, AWS, and some on-premises solutions, and it is very easy to get data from StreamSets into them. No hardcore coding or special technical expertise is required. It is also a no-code platform in which you can configure your data sources and data output for easy configuration of your data pipeline. This is a very important aspect because if a tool requires code development, we need to hire software developers to get the task done. By using StreamSets, it can be done with a few clicks.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"Data Flow and Databricks are going to be extremely valuable services, allowing data solutions to scale as the business grows and new data sources are added."
"UI is easy to navigate and I can retrieve VTL code without knowing in-depth coding languages."
"The most valuable features of the solution are its ease of use and the readily available adapters for connecting with various sources."
"When it comes to our business requirements, this solution has worked well for us. However, we have not stretched it to the limit."
"The overall performance is quite good."
"Powerful but easy-to-use and intuitive."
"I am one hundred percent happy with the stability."
"Data Factory's most valuable feature is Copy Activity."
"The most valuable features are the option of integration with a variety of protocols, languages, and origins."
"In StreamSets, everything is in one place."
"The UI is user-friendly, it doesn't require any technical know-how and we can navigate to social media or use it more easily."
"It is really easy to set up and the interface is easy to use."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
"StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
"The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
 

Cons

"Some of the optimization techniques are not scalable."
"Currently, smaller businesses face a disadvantage in terms of pricing, and reducing costs could address this issue."
"There aren't many third-party extensions or plugins available in the solution."
"Data Factory's monitorability could be better."
"The setup and configuration process could be simplified."
"My only problem is the seamless connectivity with various other databases, for example, SAP."
"The solution needs to integrate more with other providers and should have a closer integration with Oracle BI."
"Sometimes I need to do some coding, and I'd like to avoid that. I'd like no-code integrations."
"In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
"The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."
"One area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing."
"The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."
"The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base."
"I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions."
 

Pricing and Cost Advice

"Azure Data Factory gives better value for the price than other solutions such as Informatica."
"This is a cost-effective solution."
"The solution's fees are based on a pay-per-minute use plus the amount of data required to process."
"I rate the product price as six on a scale of one to ten, where one is low price and ten is high price."
"The solution's pricing is competitive."
"The price you pay is determined by how much you use it."
"The cost is based on the amount of data sets that we are ingesting."
"Our licensing fees are approximately 15,000 ($150 USD) per month."
"Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have the Seed B or Seed C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled."
"It's not expensive because you pay per month, and the tasks you can perform with it are huge. It's reliable and cost-effective."
"It's not so favorable for small companies."
"The licensing is expensive, and there are other costs involved too. I know from using the software that you have to buy new features whenever there are new updates, which I don't really like. But initially, it was very good."
"I believe the pricing is not equitable."
"There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
"The overall cost is very flexible so it is not a burden for our organization... However, the cost should be improved. For small and mid-size organizations it might be a challenge."
"It has a CPU core-based licensing, which works for us and is quite good."
816,406 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 13%
Computer Software Company: 12%
Manufacturing Company: 9%
Healthcare Company: 7%

Financial Services Firm: 17%
Computer Software Company: 13%
Manufacturing Company: 8%
Insurance Company: 6%
 

Company Size

By reviewers: Large Enterprise, Midsize Enterprise, Small Business
 

Questions from the Community

How do you select the right cloud ETL tool?
AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.
How does Azure Data Factory compare with Informatica PowerCenter?
Azure Data Factory is flexible, modular, and works well. In terms of cost, it is not too pricey. It offers the stability and reliability I am looking for, good scalability, and is easy to set up an...
How does Azure Data Factory compare with Informatica Cloud Data Integration?
Azure Data Factory is a solid product offering many transformation functions. It has pre-load and post-load transformations, allowing users to apply transformations either in code by using Power Q...
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which ...
What is your primary use case for StreamSets?
StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data...
 

Sample Customers

Azure Data Factory: Adobe, BMW, Coca-Cola, General Electric, Johnson & Johnson, LinkedIn, Mastercard, Nestle, Pfizer, Samsung, Siemens, Toyota, Unilever, Verizon, Walmart, Accenture, American Express, AT&T, Bank of America, Cisco, Deloitte, ExxonMobil, Ford, General Motors, IBM, JPMorgan Chase, Microsoft (Azure Data Factory is developed by Microsoft), Oracle, Procter & Gamble, Salesforce, Shell, Visa
StreamSets: Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about Azure Data Factory vs. StreamSets and other solutions. Updated: October 2024.