We use the whole Data Collector application.
Senior Network Administrator at an energy/utilities company with 201-500 employees
Helped us break down data silos and produce better, up-to-date reports, as well as save money
Pros and Cons
- "The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them."
- "The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the 'freeware version.' The user community, as well as the documentation they provide for the standard user, are difficult, at best."
What is our primary use case?
How has it helped my organization?
We now consume hundreds of terabytes more data than we did before we had StreamSets. It has definitely enabled us to do things a lot faster, and be a lot more agile, with a lot more data consumption and a lot more reporting.
Another benefit is that it has helped us to break down data silos. We now consume data across different silos and then we aggregate it together so that we can do reporting that is not just for that one silo of people but for a number of different people across the entire organization. That has had a positive effect, enabling us to save money, spend money more effectively, and have more up-to-date data in reports, as well as in auditing. Our safety processes are better too.
One way we have saved money is thanks to how the solution streamlines the data that we pull in, data that we weren't pulling in before.
StreamSets allows more people to know what's going on. It helps us with better allocation of resources, better allocation of staff, and right-sizing. We're in oil and gas and, in our case, it allows us to optimize what we're pulling out of the ground and then what we're selling.
It has helped to scale our data operations and as a result, in addition to saving money and right-sizing, it's helped our field operations and provided us with more management reporting.
Also, the data drift resilience reduces the time it takes to fix data drift breakages.
What is most valuable?
The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them.
We use StreamSets to connect to enterprise data stores, including OLTP databases and Hadoop. Connecting to them is pretty easy. It's the data manipulation and the data streaming that are the harder parts behind that, just because of the way the tool is written.
What needs improvement?
The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices.
We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best.
However, we have a couple of people in-house here who are experts in data analysis and they have figured out how to use this tool. We have to have people who are extremely skilled to go in and write the pipelines for this software because it's so complicated. The software works great for us, but there is an extremely steep learning curve because they don't provide a lot of information outside of paying their ridiculous support costs. Their support starts at $50,000 a year and up.
Also, the built-in data drift resilience for ETL operations requires a fair amount of custom code development before it handles drift properly. It's somewhat difficult because you have to customize it a fair amount.
I also would like a more user-friendly interface and better error-trap handling.
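To illustrate the kind of custom drift handling mentioned above, here is a minimal sketch in plain Python. It does not use any StreamSets API; the function name `detect_drift`, the schema, and the field names are all hypothetical and chosen only for illustration. It compares one incoming record against an expected schema and reports fields that are missing, newly added, or of an unexpected type.

```python
from typing import Any


def detect_drift(expected: dict[str, type], record: dict[str, Any]) -> dict[str, list[str]]:
    """Compare one record against an expected schema and report the differences."""
    # Fields the schema expects but the record does not carry.
    missing = sorted(f for f in expected if f not in record)
    # Fields the record carries but the schema does not know about.
    added = sorted(f for f in record if f not in expected)
    # Fields present in both whose value is not of the expected type (None is tolerated).
    retyped = sorted(
        f for f, t in expected.items()
        if f in record and record[f] is not None and not isinstance(record[f], t)
    )
    return {"missing": missing, "added": added, "retyped": retyped}


# Hypothetical schema and record, for illustration only.
expected_schema = {"well_id": str, "barrels": float, "reading_date": str}
record = {"well_id": "W-17", "barrels": "412.5", "site": "north"}
report = detect_drift(expected_schema, record)
print(report)
```

A check like this could run early in a pipeline so that drifted records are routed to an error path instead of breaking a downstream stage, though where it fits depends on the pipeline design.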
Buyer's Guide
StreamSets
March 2025

Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: March 2025.
842,767 professionals have used our research since 2012.
For how long have I used the solution?
We have been using StreamSets for about four years.
What do I think about the stability of the solution?
We just patched ourselves up to the latest release about a month ago, so it's actually pretty stable at this point. It used to be quite buggy, going back over the last little while, but it's pretty stable now.
What do I think about the scalability of the solution?
This software is very scalable.
Which solution did I use previously and why did I switch?
We did not have a previous solution.
How was the initial setup?
The initial setup was somewhere between straightforward and complex. It was pretty straightforward to start with, but then it started ramping up to be more difficult as we wanted to add more stuff in.
The difficulty depends upon your data sources. If you have just one data source and you want to consume a lot of different types of data from that one source, it's pretty straightforward. But when you have 20 or 25 different data sources, and you need to pipeline all that data into a couple of data warehouses so that you can use advanced data analytics software to do reporting, analysis, and notifications, it's a lot more complicated. With every data source, it becomes exponentially more complicated to manage.
We spent a significant amount of time doing it, but otherwise, it was seamless because it was our own staff. We didn't have to worry about trying to find money or resource time or do any of the prep work needed to get external resources.
Ours is a single deployment, but it is used across our entire staff base of 200-plus people. We need three people for deployment and maintenance, whose responsibilities include software management, application management, and data analysis and management.
What was our ROI?
The ROI we have seen is in savings of time and money.
What's my experience with pricing, setup cost, and licensing?
We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that.
We tried to go and get them to look at their licensing and support model and they said they were not interested in reevaluating that in any way.
Which other solutions did I evaluate?
We tried to use another freeware ETL tool. It's fairly well-known. We ran it for a couple of months but it was going to be even more difficult than StreamSets, so we chose StreamSets in the end.
What other advice do I have?
The ease of using StreamSets to move data into modern analytics platforms, on a scale of one to 10, is about a five.
The solution enables you to build data pipelines without knowing how to code if it's the latest, state-of-the-art cloud connecting stuff. If it's for anything structured for Oracle and SQL Server and other data sources, it's difficult. Without knowing how to write code, some of it's easy and some of it is not.
My advice to someone who is considering this software is to be very aware that their integrator and data analysis people will need a very specific skill set.
Which deployment model are you using for this solution?
On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.

Software Developer at Appnomu Business Services
Simplifies the way we perform tasks and engineer pipelines at all stages
Pros and Cons
- "StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
- "The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
What is our primary use case?
It is primarily being used by our IT department to configure things and see what is missing and what the issues are.
How has it helped my organization?
I'm using StreamSets to find issues with our software and it is helping us to do so, and to make sure that we are able to debug on time. It makes things much simpler. We can use the solution to know what issue is happening at the moment. We are able to easily identify a leak and resolve it on time.
It reduces our workload by about 30 percent. And it saves us a lot on having to hire expensive technical experts or software engineers. You purchase a package with a reasonable pricing model, and then you can use it with your team. It saves us from hiring a technical person to carry out the tasks. With StreamSets, you can do a task easily.
It also makes it easy to send data from one place to another.
StreamSets is doing a lot in our IT operations because it is simplifying the way we perform tasks and the way we engineer pipelines at all stages, including the sources, processes, and destination use. We can schedule data pipelines and that's easy.
And because it is low-code software, you don't need to develop the code and that really saves a lot of time. Using the canvas to create and engineer data pipelines is very easy. StreamSets saves me the three hours it would take to do a task manually.
What is most valuable?
StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall. It really helps you to come up with a solution more quickly. The Transformer logic is very easy, as long as you understand the concept of what you intend to develop. It doesn't require any technical skills.
The overall GUI and user interface are also good because you don't need to write complex programming for any implementation. You just drag and configure what you want to implement. It's very easy and you can use it without knowing any programming language.
The design experience is much easier when you want to integrate other systems and tools and make them work in a particular format. It helps you improve the topologies. You can view the status of all the pipelines you have developed and monitor them.
Connecting to enterprise data stores is also very easy, as is monitoring and managing things in one place.
What needs improvement?
The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date.
I would also like better, detailed logging of error information.
It also needs a fragment drill-down feature when monitoring a data flow. That needs a lot of improvement, especially when you are running a job.
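The per-date record count requested above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not read StreamSets metrics, and the function name `records_per_day`, the event tuple layout, and the source and destination names are hypothetical.

```python
from collections import Counter
from datetime import date


def records_per_day(events):
    """Sum record counts per (day, source, destination)."""
    totals = Counter()
    for day, source, destination, count in events:
        totals[(day, source, destination)] += count
    return totals


# Hypothetical batch events: (day, source, destination, records moved).
events = [
    (date(2024, 5, 1), "oracle", "warehouse", 1200),
    (date(2024, 5, 1), "oracle", "warehouse", 300),
    (date(2024, 5, 2), "kafka", "warehouse", 50),
]
totals = records_per_day(events)
for (day, source, destination), count in sorted(totals.items()):
    print(day.isoformat(), source, "->", destination, count)
```

Keying the totals by day, source, and destination keeps the summary small enough to chart or export, which is the kind of view the monitoring screen is described as lacking.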
For how long have I used the solution?
This is my second year using StreamSets.
What do I think about the stability of the solution?
It's stable.
What do I think about the scalability of the solution?
It is a scalable solution for any company that needs to know about its data processing.
How are customer service and support?
It is hard to get technical support from the company. To receive one-on-one communication requires a budget, which we don't really have. The way we get technical support is through the documentation and knowledge base.
It is missing a live instant chat on the dashboard.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We did not have a previous solution.
How was the initial setup?
Initially, the deployment can be very hard if you do not have a lot of technical skills, but as you get used to the software, within a day, the deployment becomes straightforward. It took two weeks to have everything configured in the right manner. I worked with one other colleague to set everything up.
It is hard, especially when you are a beginner, but when you read the documentation you can set things up quickly. The documentation helps out if you don't have good knowledge of the solution.
It doesn't require maintenance.
What was our ROI?
The solution is helping a lot because we are not spending a lot of money on a technical team. We just subscribe to the software and we're able to configure things. It has helped us save on resources by 30 percent.
What's my experience with pricing, setup cost, and licensing?
The pricing is too fixed. It should be based on how much data you need to process. Some businesses are not so big that they process a lot of data. They process a lot of debugging. The pricing is not so favorable for a small enterprise because it is too limited.
What other advice do I have?
I would recommend the software to any business that needs to do data engineering. If they design data pipelines, it is really a great idea to test out StreamSets. Unfortunately, you need a good budget for it. If a small business doesn't have the budget, I cannot recommend it. But if they have a good budget, I really recommend it because it has so many features that can really help data scientists and analysts generate patterns or insights for their businesses. And it will benefit their customers as well.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Senior Technical Manager at a financial services firm with 501-1,000 employees
The ease of configuration for pipes is amazing, and the GUI is very nice
Pros and Cons
- "The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too."
- "I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."
- "StreamSets works great for batch processing but we are looking for something that is more real-time. We need sub-millisecond latency."
What is our primary use case?
It performs very well. The main use is to extract information from some of our Kafka topics and deliver it to our internal systems and flat files, and to integrate with Java.
How has it helped my organization?
It facilitates the consumption of the data in batch mode to the system where it is required. We don't do a lot of transformations or joining or forking of the information. It's more point-to-point connectivity that we implement over StreamSets.
What is most valuable?
The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too. It's pretty nice.
What needs improvement?
I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks.
StreamSets works great for batch processing but we are looking for something that is more real-time. We need sub-millisecond latency.
For how long have I used the solution?
One to three years.
What do I think about the stability of the solution?
It's pretty stable. StreamSets has been up and running for months without any intervention from the operations team. It's great.
I don't know if they can implement some kind of high-availability. I really don't go deep into that kind of configuration because, with only one node and running as stably as it is, we have no problem with that. But for critical operations, I'd like to know if I can facilitate some kind of high-availability, in case one of the nodes goes down.
What do I think about the scalability of the solution?
It's pretty scalable.
How are customer service and technical support?
I don't use support. I mainly use the community or web searches; self-learning.
How was the initial setup?
The initial setup is pretty straightforward.
What other advice do I have?
If you are looking for something to do batch processing in Java, this is the right solution. We did the exploration when we were trying to implement a batch processing system and decided that StreamSets is the best for that. If you're looking for real-time, you may want to look at another system or the next version of this one.
Because of the kind of system that we need to implement with this kind of solution, the most important factors I look at when selecting a vendor are things like latency and real-time processing.
I would rate it at nine out of 10. What would make it a 10 would be, as I said, I'd like to have more integration with other kinds of languages or frameworks and also more real-time processing, not batch.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Engineer at an energy/utilities company with 10,001+ employees
Easy to set up and use, and the functionality for transforming data is good
Pros and Cons
- "It is really easy to set up and the interface is easy to use."
- "We've seen a couple of cases where it appears to have a memory leak or a similar problem."
What is our primary use case?
We typically use it to transport our Oracle raw datasets up to Microsoft Azure, and then into SQL databases there.
What is most valuable?
It is really easy to set up and the interface is easy to use.
We found it pretty easy to transform data.
The online documentation is pretty good.
What needs improvement?
We've seen a couple of cases where it appears to have a memory leak or a similar problem. Memory usage grows for a while and then we have to restart the container, maybe once a month, when it gets high.
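A simple way to decide when such a restart is due is to look at recent memory samples rather than a single reading. The sketch below is an assumption-laden illustration in plain Python: the function name `should_restart`, the 800 MB limit, and the sample values are hypothetical, and collecting the samples (for example from the container runtime) is left out.

```python
def should_restart(samples_mb, limit_mb, consecutive=3):
    """Return True when the last `consecutive` samples all exceed `limit_mb`."""
    if len(samples_mb) < consecutive:
        # Not enough history yet to call it sustained growth.
        return False
    return all(sample > limit_mb for sample in samples_mb[-consecutive:])


# Hypothetical memory readings in MB, oldest first.
steady_growth = [500, 900, 950, 980]
brief_spike = [500, 900, 700, 980]
print(should_restart(steady_growth, limit_mb=800))
print(should_restart(brief_spike, limit_mb=800))
```

Requiring several consecutive high samples avoids restarting the container on a short-lived spike.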
For how long have I used the solution?
We have been using StreamSets for about one year. We may have been experimenting with it slightly before that time.
What do I think about the stability of the solution?
Other than the memory issue that we occasionally see, the stability has been really good.
What do I think about the scalability of the solution?
We haven't seen a problem with scaling it.
How are customer service and technical support?
I haven't had to deal with technical support. We would first check the online documentation or web documentation, and usually found what we needed. We haven't had to call them.
Which solution did I use previously and why did I switch?
Prior to using StreamSets, we were using Microsoft CDC (Change Data Capture). It was a fairly old product and there were lots of workarounds and lots of issues that we had with it. We were looking for something more user-friendly. It was pretty stable, so that was not an issue.
How was the initial setup?
This product was a lot easier to use than the one we had before it. The first time, it took us half an hour to get it set up and running.
What's my experience with pricing, setup cost, and licensing?
We are running the community version right now, which can be used free of charge. We were debating whether to move it to the commercial version, but we haven't had the need to, just yet.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
