Senior Software Developer at a tech vendor with 10,001+ employees
Eradicated our data silos, integrating all data files into one central system
Pros and Cons
- "The ETL capabilities are very useful for us. We extract and transform data from multiple data sources into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems."
- "The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."
What is our primary use case?
Our main use case for StreamSets is data integration and data ingestion for DataOps and modern analytics. We also use it to integrate data files from multiple sources, and to build, monitor, and manage smart, continuous data pipelines.
How has it helped my organization?
Introducing StreamSets in our organization has significantly improved things. The efficiency of our entire process has increased a lot, and we derive high value from it. Its ability to integrate data files from multiple sources is what makes it great software for us.
The transfer of information between our teams is very smooth and efficient as well. It saves us time in transferring, collating, and integrating all of the data.
The integration part has been customized for our particular systems. Previously, we had different data silos. Now, with the introduction of StreamSets, the data silo approach has been eradicated. It has integrated all the data files into one software system, creating a central point for it.
It has also reduced our workload by 50 to 60 percent, which has definitely saved us money on human resources.
What is most valuable?
There are two features that are most valuable for us. One is the Control Hub and the other is the Data Collector. With Data Collector, data migration has become much easier for us.
Also, the ETL capabilities are very useful for us. We extract and transform data from multiple data sources into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems.
We use the platform for modern analytics as well. That is one of our main use cases, and it fits our requirements well. Moving data into these analytics platforms with StreamSets is quite easy because there are minimal coding requirements. The built-in applications and systems let us do it with ease; a first-time user could easily do it.
If there were coding requirements, it would take three or four extra people to get things done. That aspect is very important for us. It saves us money because we don't need dedicated coding manpower.
In addition, the system's data drift resilience is very effective and efficient. On our particular team, it has reduced the time it takes to fix data drift breakages by 10 to 12 man-hours per week.
What needs improvement?
The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information. Apart from that, I don't think much improvement is required, because the software and features are very good.
Buyer's Guide
StreamSets
February 2025

Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: February 2025.
838,713 professionals have used our research since 2012.
For how long have I used the solution?
I have been using StreamSets for the past year.
What do I think about the stability of the solution?
The software is very stable. The stability is a solid 10 out of 10.
What do I think about the scalability of the solution?
It's definitely scalable. We started with around 10 to 12 users, and now it has reached 35 to 40 users in our particular organization. We are now using it across four to five teams.
There are a lot of other teams in our company that are trying out the free version of the software. If it's suitable for them, they will obviously go for it as well.
How are customer service and support?
Through email, they have been very good at supporting us, and they're very knowledgeable as well. They go to great lengths to provide us with clear-cut answers.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We didn't use any other similar software.
What was our ROI?
It took three to four months to assess the efficiency improvements in our team. There's definitely a return on investment from the use of StreamSets. Our efficiency has been increased by 20 to 25 percent and it has helped increase revenue by 7 to 10 percent.
What's my experience with pricing, setup cost, and licensing?
I imagine the pricing is moderate because our company is renewing its license, but I'm not sure about the exact price. There are no hidden costs that I have come across.
What other advice do I have?
It's cloud-based software, so there are only minimal maintenance requirements. Our IT team takes care of the maintenance of the software, but I don't think much time is required for that. Only regular updates need to be done. It is a minimal task that can be done by one or two personnel.
Overall, it gives us a lot of efficiency and increases the effectiveness of our data set transformations. The value and the increase in revenue it has helped us achieve make it a very good software package.
Try the free version and, if the software meets your requirements, I would definitely say get the Enterprise version. It's pretty easy to understand, and it makes your business processes much smoother. It's a must-have for every business that wants to improve its efficiency and effectiveness.
The major takeaway for me has to be the improvement in the efficiency of our entire process. That stands out for us. StreamSets is a great platform. And the best thing about it is that there are minimal coding requirements. Any person, even someone with a non-technical background, can easily get accustomed to the software and start using it.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.

Senior Data Platform Manager at a manufacturing company with 10,001+ employees
Useful for data transformation and helps with column encryption
Pros and Cons
- "The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
- "We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drifting wasn't addressed, which made things worse. Licensing was another issue we encountered."
What is our primary use case?
We use StreamSets for data transformation rather than full ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data is loaded into Amazon Redshift or other data warehousing solutions.
What is most valuable?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up.
What needs improvement?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drifting wasn't addressed, which made things worse. Licensing was another issue we encountered.
For how long have I used the solution?
I have been working with the product for five years.
What do I think about the scalability of the solution?
The tool's flexibility and performance are good. It allows for task dependency management so others won't be affected if one task fails. It can handle large volumes of data and supports features like change data capture for tracking changes.
Around six months ago, many people in my company were using StreamSets. In the US team, about 42 people across different projects were using it. Similarly, in 2021, there were around 43 users. About 16-18 people in Mumbai used it in my previous company.
How are customer service and support?
The tool's support is good.
How was the initial setup?
Installing StreamSets can take time because it has two engines: Data Collector and Transformer. Data Collector is easier to install, but Transformer is more complicated and requires more steps, like setting up tasks and configurations.
It's best to make sure the environment is ready beforehand, including that it works well with your other servers. The process can be both easy and difficult, but if you follow the documentation, it should be manageable.
What was our ROI?
Whether the tool is worth the money depends on the situation. If you don't want to spend a lot on competing products like Databricks or AWS Glue, then StreamSets might be a better option. It's particularly valuable if you prefer not to invest heavily in training your team on new technologies. If your ETL developers or data engineers are comfortable with StreamSets, it can be worth the money.
What's my experience with pricing, setup cost, and licensing?
The licensing is expensive, and there are other costs involved too. I know from using the software that you have to buy new features whenever there are new updates, which I don't really like. But initially, it was very good.
What other advice do I have?
We use various tools and alerting systems to notify us of pipeline errors or failures. StreamSets supports data governance and compliance by allowing us to encrypt incoming data based on specified rules. We can easily encrypt columns by providing the column name and hash key.
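To make the "column name plus hash key" idea concrete, here is a minimal Python sketch of keyed column masking applied to a batch of records. It is not StreamSets' API or configuration; the function name `hash_columns`, the record layout (a list of dicts), and the choice of HMAC-SHA256 are assumptions for illustration only. Note that a keyed hash is one-way pseudonymization, not reversible encryption.

```python
import hashlib
import hmac


def hash_columns(records, columns, key):
    """Hypothetical helper: replace each named column with an HMAC-SHA256 digest.

    records: list of dicts (one per row); columns: names of columns to mask;
    key: secret key as bytes. None values are left as-is.
    """
    out = []
    for record in records:
        masked = dict(record)  # copy so the caller's records are not modified
        for col in columns:
            if col in masked and masked[col] is not None:
                value = str(masked[col]).encode("utf-8")
                # Same key + same value always yields the same digest,
                # so masked columns can still be joined or grouped on.
                masked[col] = hmac.new(key, value, hashlib.sha256).hexdigest()
        out.append(masked)
    return out
```

Because the digest is deterministic for a given key, downstream joins on the masked column still work, while anyone without the key cannot recompute the mapping from raw values.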
If you're considering using StreamSets for the first time, I would advise first understanding why you want to use it and how it will benefit you. If you're dealing with change tracking or handling large amounts of data, it could be cost-effective compared to services like Amazon. It's easy to schedule and manage tasks with the tool, and you can enhance your skills as an ETL developer. You can easily migrate traditional pipelines built on platforms like Informatica or Talend to StreamSets. I rate the overall solution an eight out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Product Marketing Manager at a tech vendor with 10,001+ employees
We are now able to run pipelines that scale horizontally, improving efficiency and significantly reducing workload
Pros and Cons
- "For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems."
- "Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now."
- "In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
What is our primary use case?
My primary use case with StreamSets is to integrate large data sets from multiple sources into a destination. We also use it as a platform to ingest data and deliver data for database analytics.
How has it helped my organization?
One major benefit that we have realized with StreamSets is that we are now able to run pipelines that scale horizontally, instead of using a static service to host the service. This has improved efficiency and reduced our workload by around 85 percent. Initially, we started out with around 40 users. Now, there are 100 users. We have definitely scaled up, in terms of usage, with StreamSets.
The fact that it is a single centralized platform saves us a lot of time. It's very intuitive and very effective, saving us a lot of resources with its built-in capabilities. No manual intervention is needed, and nobody needs to oversee it. It's an "all-in-one" deal for us. We are able to save 15 to 18 hours per week. Tasks that required three people can be done with StreamSets itself.
And with its ability to integrate large data sets, we are now able to pull thousands of records instantly, which reduces the need for complex coding. That has also been a very big plus for us.
We also use it to connect our Apache Kafka with data lakes and, as a result, this connection has gotten much more efficient and quicker for us. The overall efficiency has also drastically improved for us with this. Connecting these enterprise systems using StreamSets is pretty easy. The StreamSets platform is very straightforward. There is no major coding required, so any non-technical person can also do it.
Without the need for any complex coding at all, we are able to pull records. The records are vast and very large and pulling them usually requires coding, but the fact that there is literally no coding required is a very big plus for us. Once you start to code, there is a lot of time involved and a lot of QA involved, but all of that is eliminated here.
And it has definitely helped us break down data silos. With our large amount of data, we have different data formats, and as a result, there are data silos that are present by default. With StreamSets, we were able to completely eliminate that because StreamSets has become a centralized system for us to accommodate everything. We have been able to get a single, centralized view of all our data.
We have a lot of different data formats, and transforming them manually without any tool or system is a cumbersome and frustrating process. We use StreamSets to do that. It has made that process much more elegant and efficient for us.
What is most valuable?
For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems.
Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now.
Apart from that, the user interface of StreamSets is very good. It's very user-friendly and very appealing. Moving data into modern analytics platforms is a very straightforward procedure. There is no difficulty involved in it.
In addition, the ETL capabilities of StreamSets are also very useful for us. We are able to extract and transform data from multiple data sources into a single, consistent data store that is loaded into our target system.
What needs improvement?
In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time.
Some visual explanations or visually appealing knowledge base content would be very good. That is something I could have used when I started, because I found it very difficult.
For how long have I used the solution?
I have been using StreamSets for about a year.
What do I think about the stability of the solution?
It is definitely a stable product. In fact, it is one of the top products in the market in that particular category. We have not faced any stability issues so far, in terms of server speed, latency, or deployment.
What do I think about the scalability of the solution?
It's a scalable product. The platform is used across seven teams in our organization.
A couple more teams in our organization are evaluating StreamSets. They're running things and asking for feedback from our side as well. There are plans to expand our use of it.
How are customer service and support?
I have been in contact with their technical support and I would rate them very highly. They're very knowledgeable and patient. That is something that I like very much. For a very new user, it's not very easy to understand and we contact the support team over email.
We do have a relationship manager as well, who acts as the central point of contact for us. They're very prompt, knowledgeable, and friendly.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
This was one of the first products we used.
What was our ROI?
Within about three months we were able to see benefits from the system. We saw a lot of time being saved, and about a 30 percent increase in our overall efficiency.
Apart from reducing our workload and improving our efficiency, we saw a 12 percent increase in our revenue last year after we implemented StreamSets. I know people will definitely see a return on their investment from it.
What's my experience with pricing, setup cost, and licensing?
From what I hear from my team, I believe it's moderately priced because they're happy with the pricing.
What other advice do I have?
Server update maintenance is required, but that is minimal. Any product would require that type of maintenance. I don't think we are investing a lot of time and money in maintenance. The maintenance is just another cost for us. We have only two guys working on the maintenance part of the software.
It's a very intuitive product, modern, and very user-friendly in terms of the UI. Almost all our requirements have been met by StreamSets and we don't have any complaints so far.
I would recommend starting to use it as soon as possible. No tool is perfect. You have to choose the best of the lot. I certainly believe StreamSets is at the top of the ladder when it comes to similar software.
My biggest lesson from using StreamSets is that data integration can be done much more easily now. I only knew that after starting to use StreamSets. When it comes to data integration from multiple sources, and having multiple destinations, people always assume it's a time-consuming, cumbersome project. But once we started using StreamSets, all those assumptions were broken. It's very straightforward and elegant software.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Product Manager at a hospitality company with 51-200 employees
Provides a good bifurcation rate and accuracy, and saves time and money
Pros and Cons
- "The ability to have a good bifurcation rate and fewer mistakes is valuable."
- "One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing."
What is our primary use case?
We received data from hospitals and other healthcare service providers, predominantly in the US. When we received that data, we had to classify it into different repositories or datasets. This data was sent to different vendors, and for that, it needed to be processed in different ways. We needed to bifurcate the data at many steps with different kinds of filters. For that, we used StreamSets.
How has it helped my organization?
We could bifurcate the datasets that we received from different hospitals. We could bifurcate it on the basis of the medical requirements of the hospitals, and sometimes, on the basis of the schedule or purpose. We were obtaining data that we could then supply to some consulting firms or other sources.
StreamSets saved us time. The accuracy was pretty good, and it was definitely better than what we were using previously. Earlier, we had hired two people who were doing the job manually, and we were also using some other platform. We had to pay for them. Overall, we have saved a lot of time, and the accuracy has improved as well. We didn't calculate the time savings, but I believe we saved about three days in a week, so there were about 30% to 40% time savings.
StreamSets reduced the workload. There was a 10% to 15% reduction in the workload.
StreamSets helped us scale our data operations. The capacity limit we purchased was very generous; we never reached it, but it helped us increase or scale our operations. Especially in months when we received a higher number of entries, we were able to finish our work on time.
What is most valuable?
The ability to have a good bifurcation rate and fewer mistakes is valuable. In our scenario, when we had to bifurcate the data, we did not completely cut it. We made a different route for one set of data, which went into a different operating system. A complete set of data, along with the original data that got cut, once again went through the filtration process, and it kept on repeating this way. The other solutions we had in place did not offer this. With the solutions we were using earlier, we had to reuse the data again and again from the start. It was a time-consuming process.
Their support system was pretty good. When we were setting up the bifurcation protocols that we wanted to set up, we had a few support calls with them, and those were really helpful.
What needs improvement?
The design or the way they have set up the protocol is pretty good. One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing. It does not have that feature. None of the solutions provides this feature, but this is the feature that we are looking for. If we could bifurcate the data or do manual manipulation of data at any point in time, it would be a game changer.
Its initial setup could also be a bit easier.
For how long have I used the solution?
I used this solution for about a year.
What do I think about the stability of the solution?
It's a stable product. We used it for about a year, and we hardly had to shut it down.
What do I think about the scalability of the solution?
We are a medium enterprise. We only have three departments in our company, and only one of the departments is using it. Salespeople don't use it. The development people don't use it. We are the ones using it, and our job is to process the information, so only one department is using the solution. We have about 18 people in the department.
For enterprises up to medium size, it's a good choice. You can scale from one million to ten million data files. I don't believe they offer the service for a hundred million or one billion datasets. It isn't very scalable for large enterprises, but for small and medium enterprises, it's good.
How are customer service and support?
I'd rate them an eight out of ten. The only reason for not giving them a ten out of ten is that if you're doing very important work and you need to get the solution the same day, it's a bit tough to have the team support you in a very short period of time. They usually give you appointments about a day or two days later. Other than that, everything is good.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We were using another solution previously. The major reason for switching to StreamSets was that we needed to scale our operations. Our prior solution could have been scaled, but the cost of scaling was a bit higher. We would have had to hire one more person to be able to scale, but we did not want to hire more people, so we decided to use a completely automated solution for this part so that it could be handled by only one of our team members. That was the primary requirement. The cost-benefit analysis was done by one of our peers. His proposal was pretty good, and everyone agreed to it.
How was the initial setup?
Its initial setup is a bit tough. You need to have the technical expertise to do that. The support team is good. They help you around, but if they could make it a bit easier, it would be better.
I believe it operates only from the cloud. We also received the data from our associations on the cloud. We processed it on the cloud, and everything happened on the cloud.
The initial setup was complex because we were not able to directly link the data we were receiving with the StreamSets solution. Linking it required us to fill in or enter some information in StreamSets, but we were not able to figure out what to enter. For that part, we needed their help.
We spent about a week. For the first three days, our team members were trying their best to do it, but then we had to schedule a meeting with them. In terms of the number of people, only one person was working with our team, and there were three people working with the product. I was also involved in the product as a product manager, but I was not directly operating that system.
It didn't require any maintenance as such. Any maintenance activities were related to our side of things. There were mistakes on our end. When we were entering different data, we had to do different configurations in the system.
What was our ROI?
We did a cost-benefit analysis before buying the solution, and it performed even better than expected. We were able to replace two of our staff members who were doing this work. What we paid for this solution was much lower than their salaries, so on the cost-benefit side of things, it was a good deal. We saved about two people's manual wages, which is about $6,000 a month, and we also saved 15% of a week's time. These two were the biggest returns on the investment. The accuracy was also a bit higher.
What's my experience with pricing, setup cost, and licensing?
Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but once you have Series B or Series C funding, want to scale up your operations, and aren't much worried about funds, you need a solution that can scale. At the same time, it may not be a solution you want to use on a very long-term basis. This solution could not have been applied if we were operating with all the hospital chains in the US. We were operating with just one hospital. That's why it worked pretty well, so for medium enterprises, I believe it's very good.
What other advice do I have?
To those evaluating StreamSets, I'd advise doing a cost-benefit analysis because the way StreamSets is used differs from person to person. Someone else might have a very different use case, and they may not see a profit from the solution. For us, it was a good solution because we were hiring people for this work. People were doing the job manually. We saved both time and money, so doing a cost-benefit analysis would be the best thing.
If you are looking to expand your domain or range of operations, StreamSets is very helpful. If you are just looking for a better data analytics tool that can do bifurcation on data, I believe there are other tools or services available in the market that do not focus on the expansion of operations. They focus on doing better and more complex bifurcations.
StreamSets enables you to build data pipelines without knowing how to code. After generating a few responses, you have to enter some basic syntax or code, but generally, one can do a lot of no-code stuff, which was not an important aspect for us because we were operating in the IT space, and our entire team was capable of entering all the syntaxes that were required. It was not an issue for us at any point in time. In fact, in the operations that we were performing, we only used code. When we were testing out our initial datasets, we used some no-code features that were there, but at the later stage, we used only syntaxes.
We did not connect to the messaging systems, but we connected some enterprise databases. We were operating with a set of hospitals in the US, and we had to connect with them only the first time. Afterward, it was the data that was passing through the pipeline. Initially, for a completely new user, it's a bit tricky. Some technical expertise is required. It's a bit tough, but because the support team is there, one would be able to do it.
Overall, I would rate StreamSets an eight out of ten.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Product Marketer at a media company with 1,001-5,000 employees
We have been able to eliminate the vast majority of our break/fix costs and maintenance time
Pros and Cons
- "The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth."
- "One area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there."
What is our primary use case?
Our major use case with StreamSets is to build data pipelines from multiple sources to multiple destinations. We mainly use the StreamSets Data Collector Engine for seamless streaming from any source to any destination.
We also use it to deliver continuous data for database operations and modern analytics.
How has it helped my organization?
One great thing is that, since implementing StreamSets, we have been able to eliminate about 80 percent of our break/fix costs and maintenance time. It is also very easy to connect with streaming platforms and streaming services.
We can also integrate and stream databases by connecting with multiple streaming services. Before StreamSets, transferring data from source to destination took about three hours and was prone to errors. Now we primarily use the Data Collector, which lets us complete the same job in less than 30 minutes. We save that much time every day, or about 15 hours per week.
Another definite benefit is that it has helped us break down data silos within our organization, so we are able to work together through StreamSets. Previously, the data silos were a serious problem because data came from multiple, scattered sources. We were not able to consolidate it on time, and we could not pinpoint errors precisely. StreamSets has helped us streamline the use of multiple sources and destinations, completely eliminating the silos. That saves us a lot of time, and we have greatly reduced the number of errors.
What is most valuable?
The most valuable features of StreamSets, for me, are the Data Collector and the Control Hub platform. Both are very straightforward and user-friendly, and together they give us a canvas for designing all our pipelines, which is very intuitive and useful for us.
In fact, the entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth. A great thing about StreamSets is that it is a single, centralized platform. All our design-pattern requirements are met with a single design experience through StreamSets.
We can also easily build pipelines with minimal coding and minimal technical knowledge. It is very easy to start with and very easy to scale. That is very important to me personally, because I'm from a non-technical background, and one of my most important criteria was being able to use this platform efficiently.
Also, moving data to modern analytics platforms is very straightforward. That is why StreamSets is one of the top players in the market right now.
Another major advantage for us is the built-in functionality. StreamSets has a plethora of features that work well for ETL.
What needs improvement?
In terms of features, I don't have any complaints so far. But one area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there.
For how long have I used the solution?
I have been using StreamSets for about eight months.
What do I think about the stability of the solution?
It is stable. Because it's a cloud-based solution, there is a little latency and there are some server speed issues, but apart from that, there is no question about its stability.
What do I think about the scalability of the solution?
The platform is definitely scalable.
Maybe in the future we will increase our usage of StreamSets, but I don't see any immediate scalability requirements for us.
How are customer service and support?
I have not contacted their customer support myself, but my team does, and from what I understand they have a good working relationship with StreamSets customer support. All of our queries are sent via email, and the support team gets them sorted out. They also join Google Meet sessions or calls, if required, to resolve our queries. It has been a very smooth journey so far, and I don't have any complaints about their customer service.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
StreamSets is the first solution that we are using in this space.
How was the initial setup?
I was not fully involved in the initial implementation. We wanted to get it on board as soon as possible, so instead of doing a complete implementation at once, we did it in phases. It didn't take a lot of time, and this model let us get on with the work quickly.
The initial setup was simple. We didn't require any additional training or third-party vendors. We were able to do it along with the StreamSets team, so it was smooth for us.
We have 15 people using StreamSets, all at one location. They are developers and users.
Because it is a cloud platform, there isn't much maintenance required other than server updates, which is expected with any cloud platform. We have a team of two people who maintain it and handle updates and all the latest releases.
What was our ROI?
Tasks that took three hours can now be done in less than 30 minutes. This is one of the prime data points in terms of ROI for this product.
In terms of money saved, we still haven't seen any direct results from StreamSets. Its automation lets us focus on other tasks because StreamSets takes care of the operations side. In theory, it should save us some money, but it hasn't yet; we still have the same number of employees.
We are moving in a positive direction. Hopefully, this trend continues. We were able to see the time savings and reduced errors within three months of deployment.
What's my experience with pricing, setup cost, and licensing?
There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition, and it is competitively priced. I wouldn't say it's cheap or moderately priced, but it's not a high price either.
What other advice do I have?
We have been experimenting with Hadoop, but apart from that, we do not use it to establish a connection with other services. As an organization, we have not faced any issues with connectivity using StreamSets. The platform is very stable.
Overall, StreamSets is very efficient and effective. It has helped us save a lot of time and also reduced errors a lot. I would definitely rate it very highly. The major reason is that it gives us a single, centralized platform for all our design-pattern requirements and we are able to produce results efficiently. With StreamSets, we are able to transfer or stream data from any source to any destination. It has increased the overall efficiency of our organization.
Software AG is constantly improving and evolving the product, and that is something that I like: using a product that is ever-evolving and being upgraded.
After deploying StreamSets, I learned a lot about how data planning works and how easy it is to stream from multiple sources to multiple destinations. That is one of my major takeaways. I thought it would be a very complex task, but StreamSets dispelled that myth and made the complexity very simple for me.
My advice is to try the free edition. It's a very user-friendly and intuitive product as well. Try it to get a grasp of what's happening inside the product. Once you try the free edition, you'll definitely go for the Professional edition. I don't have any doubt about that. The product itself will lure you. That is the power of the product.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Software Developer at Appnomu Business Services
Simplifies the way we perform tasks and engineer pipelines at all stages
Pros and Cons
- "StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
- "The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
What is our primary use case?
It is primarily being used by our IT department to configure things and see what is missing and what the issues are.
How has it helped my organization?
I'm using StreamSets to find issues with our software, and it helps us do that and debug on time. It makes things much simpler: we can use the solution to see what issue is happening at the moment, easily identify a leak, and resolve it on time.
It reduces our workload by about 30 percent, and it saves us a lot because we don't have to hire expensive technical experts or software engineers. You purchase a package with a reasonable pricing model and then use it with your team, instead of hiring a technical person to carry out the tasks. With StreamSets, you can do a task easily.
It also makes it easy to send data from one place to another.
StreamSets is doing a lot for our IT operations because it simplifies the way we perform tasks and the way we engineer pipelines at all stages, including sources, processors, and destinations. Scheduling data pipelines is also easy.
And because it is low-code software, you don't need to write much code, which really saves a lot of time. Using the canvas to create and engineer data pipelines is very easy. StreamSets saves me the three hours it would take me to do a task manually.
What is most valuable?
StreamSets Transformer is a good feature because it helps you when you are developing applications and don't want to write a lot of code. That is the best feature overall. It really helps you come up with a solution more quickly. The Transformer logic is very easy, as long as you understand the concept of what you intend to develop; it doesn't require any technical skills.
The overall GUI and user interface are also good because you don't need to write complex programming for any implementation. You just drag and configure what you want to implement. It's very easy and you can use it without knowing any programming language.
The design experience is much easier when you want to integrate other systems and tools and make them work in a particular format. It helps you improve the topologies. You can view the status of all the pipelines you have developed and monitor them.
Connecting to enterprise data stores is also very easy, as is monitoring and managing things in one place.
What needs improvement?
The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date.
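To make this request concrete, here is a minimal Python sketch of the per-date summary the reviewer describes: total records moved from each source to each destination on each day. It does not use any StreamSets API; the `TransferEvent` record, its fields, and `records_per_day` are hypothetical stand-ins for whatever run history a monitoring view would draw on.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TransferEvent:
    """Hypothetical log entry: a batch of records moved from a source to a destination."""
    day: date
    source: str
    destination: str
    record_count: int


def records_per_day(events):
    """Sum record counts per (day, source, destination) triple."""
    totals = Counter()
    for event in events:
        totals[(event.day, event.source, event.destination)] += event.record_count
    return totals


# Illustrative sample data only (not real pipeline output).
events = [
    TransferEvent(date(2024, 3, 1), "kafka", "s3", 120),
    TransferEvent(date(2024, 3, 1), "kafka", "s3", 80),
    TransferEvent(date(2024, 3, 2), "mysql", "s3", 50),
]
summary = records_per_day(events)

# Print one line per day/source/destination, in date order.
for (day, src, dst), count in sorted(summary.items()):
    print(f"{day} {src} -> {dst}: {count} records")
```

A dashboard would chart these totals rather than print them; the grouping step is the part the reviewer says is missing.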
I would also like better, more detailed error logging.
It also needs a fragment drill-down feature for monitoring a data flow. That area needs a lot of improvement, especially when you are running a job.
For how long have I used the solution?
This is my second year using StreamSets.
What do I think about the stability of the solution?
It's stable.
What do I think about the scalability of the solution?
It is a scalable solution for any company that needs to know about its data processing.
How are customer service and support?
It is hard to get technical support from the company. To receive one-on-one communication requires a budget, which we don't really have. The way we get technical support is through the documentation and knowledge base.
It is missing a live instant chat on the dashboard.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We did not have a previous solution.
How was the initial setup?
Deployment can be very hard at first if you do not have a lot of technical skills, but as you get used to the software, within a day, it becomes straightforward and easy. It took two weeks to get everything configured properly. I worked with one other colleague to set everything up.
It is hard, especially when you are a beginner, but if you read the documentation, you can set things up quickly. The documentation helps if you don't have good knowledge of the solution.
It doesn't require maintenance.
What was our ROI?
The solution helps a lot because we are not spending a lot of money on a technical team. We just subscribe to the software and configure things ourselves. It has helped us reduce our resource spending by 30 percent.
What's my experience with pricing, setup cost, and licensing?
The pricing is too fixed; it should be based on how much data you need to process. Some businesses are not big enough to process a lot of data; they mostly do debugging. The pricing is not very favorable for small enterprises because the options are too limited.
What other advice do I have?
I would recommend the software to any business that needs to do data engineering. If they design data pipelines, it is a great idea to test out StreamSets. Unfortunately, you need a good budget for it. If a small business doesn't have the budget, I cannot recommend it. But if they have a good budget, I really recommend it, because it has so many features that can help data scientists and analysts generate patterns or insights for their businesses, which will benefit their customers as well.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
AI Engineer at Techvanguard
A no-code solution with a drag-and-drop UI, but the execution engine should be better
Pros and Cons
- "The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
- "The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that."
What is our primary use case?
I was working on an integration project using the StreamSets platform, looking at both its Data Collector and its Transformer. The idea was to integrate it with AWS SageMaker Canvas. Both are what they call no-code options. StreamSets is for data pipelining, managing your data flow, and transforming your data. SageMaker is AWS's machine learning service, and Canvas is basically its no-code option for machine learning.
I was trying to connect it to an object storage repository, which on AWS is the managed service called S3. I wasn't trying to run it with a data warehouse.
How has it helped my organization?
It's still in the trial stage. I don't get a 30-day trial period or anything like that. I just got to write up what's involved and then see whether it justifies purchasing the license.
It enables you to build data pipelines without knowing how to code. It abstracts away the need for Spark or anything like that. This ability is highly important because it reduces development time.
It saves time because you don't have to write code.
It saves money because you don't have to hire people with specialized skills; you don't need Spark or anything like that to do the same thing.
It helps to scale your data operations. You can get to the execution engine and provision bigger machines or bigger clusters. You can scale out to however much data you need to scale out to.
What is most valuable?
The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows.
What needs improvement?
The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that.
It can break down data silos within the organization. One person can do the whole thing with StreamSets and SageMaker Canvas. However, it hasn't yet had any effect on our operations or business, because you can either get a demo from them or go to one of their sessions, where they give you temporary credentials and try to work with your use case. Personally, I would change their model a bit and offer at least a two-week trial license for a cloud platform. You could then try to get something working, or call their technical department and say, "Look, I've been evaluating this thing for the last few days, and I don't know exactly how to resolve this issue."
For how long have I used the solution?
I started using it in June of this year.
What do I think about the stability of the solution?
The whole issue of the execution engine needs to be better resolved. If you pick a cloud, why isn't it working with this cloud? Or what do I need to do to get it to work with one specific cloud service if it can be deployed across multiple clouds?
What do I think about the scalability of the solution?
It seems pretty highly scalable to me. That's not going to be an issue. Just the administration of it could be an issue.
It's currently being used in a dev department for machine learning. It's being used by the business analyst team.
How are customer service and support?
I haven't contacted their support.
Which solution did I use previously and why did I switch?
AWS has native solutions, such as AWS Data Wrangler and others that come bundled with its services, like AWS Glue. We haven't switched to StreamSets yet; it's still in the evaluation stage, but the no-code, drag-and-drop option with a GUI is something that seems to resonate with people.
How was the initial setup?
I was involved in its setup. I was the one who basically had to try to get it to run with whatever process or custom processor I developed.
It was complex to set up. I had to go to the sessions. On a couple of occasions, I was doing it directly from the cloud platform, and apparently, that wasn't the way to do it. You have to go through their universal designer platform first.
In terms of maintenance, once you're deployed from the cloud, that's all handled for you. It's managed for you directly from the cloud service. So, you don't have to worry about that. They maintain their design platform.
What about the implementation team?
I didn't use any consultant.
What's my experience with pricing, setup cost, and licensing?
I didn't get into that with the StreamSets representative. It seems to be pay-as-you-go, but I don't know exactly how they do it.
Which other solutions did I evaluate?
Alteryx is another option. It's a similar tool, and it looks almost the same as StreamSets. Alteryx is available on any cloud; it doesn't matter which one. You can go to the various clouds and see what they offer.
What other advice do I have?
To those evaluating this solution, I would advise looking into how it integrates with the cloud service they're going to use it with, for example, whether it integrates more naturally with AWS or with Azure.
I used StreamSets' ability to move data into a modern analytics platform, which is what AWS SageMaker Canvas is: essentially predictive analytics. In terms of how easy it is to move data into this analytics platform, doing the design on the StreamSets platform is one thing, but getting the execution engine provisioned is a totally different ball game. That is where its limitation comes in.
Overall, I would rate it a seven out of ten. The issue that was never resolved for me was how the integration works when you run the compute or execution engine on AWS versus Azure versus GCP. That has nothing to do with StreamSets; it is outside of StreamSets. You're then dealing with the cloud service, and there's a good reason for that.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Director Data Engineering, Governance, Operation and Analytics Platform at a financial services firm with 10,001+ employees
Ease of configuring and managing pipelines centrally
Pros and Cons
- "I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
- "StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
What is our primary use case?
We are using StreamSets to migrate our on-premise data to the cloud.
What is most valuable?
I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally. It's like a plug-and-play setup.
What needs improvement?
StreamSets should provide a mechanism for performing data quality assessment while data is being moved from a source to a target, meaning the ability to validate the data against various data rules. Then, when the data quality assessment fails, it should be able to send alerts or information that help people understand the data validation issues.
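The request amounts to rule-based validation of records in flight, with an alert whenever a rule fails. The Python sketch below illustrates that general idea; it is not a StreamSets feature or API, and the `Rule` type, the `validate_batch` function, and the sample rules are all hypothetical. The `alert` callback stands in for whatever notification channel (email, chat, ticket) a real pipeline would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical named data rule: a predicate every record must satisfy."""
    name: str
    check: Callable[[dict], bool]


def validate_batch(records, rules, alert):
    """Split records into passing and failing lists; call alert() once per failed rule."""
    passed, failed = [], []
    failures_by_rule = {}
    for record in records:
        broken = [rule.name for rule in rules if not rule.check(record)]
        if broken:
            failed.append((record, broken))
            for name in broken:
                failures_by_rule[name] = failures_by_rule.get(name, 0) + 1
        else:
            passed.append(record)
    # One summary alert per rule, so people can see which validation failed and how often.
    for name, count in failures_by_rule.items():
        alert(f"Data quality rule '{name}' failed for {count} record(s)")
    return passed, failed


# Illustrative rules and records only.
rules = [
    Rule("id_present", lambda r: r.get("id") is not None),
    Rule("amount_non_negative", lambda r: r.get("amount", 0) >= 0),
]
records = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]
alerts = []
passed, failed = validate_batch(records, rules, alerts.append)
```

Keeping the failing records alongside the names of the rules they broke, rather than dropping them, is what would let people investigate the validation issues afterward.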
For how long have I used the solution?
I have been using StreamSets for a year and a half.
What do I think about the stability of the solution?
It's reasonably stable.
What do I think about the scalability of the solution?
It's reasonably easy to scale. Around 25 to 30 end users are using this solution in our organization.
How are customer service and support?
Customer service and support are good.
How would you rate customer service and support?
Positive
How was the initial setup?
It's reasonably easy to deploy. However, since it is used at an enterprise level, it requires maintenance, so we had a maintenance contract.
In the financial industry, we have very strict regulations around deploying anything in the cloud, so it requires a lot of permissions and other processes.
Just one person is enough for the maintenance.
What's my experience with pricing, setup cost, and licensing?
The pricing was reasonably economical and easy for us to afford when we engaged with StreamSets. It was not part of Software AG at that time.
What other advice do I have?
It's a very good tool. Overall, I would rate the solution an eight out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.

Buyer's Guide
Download our free StreamSets Report and get advice and tips from experienced pros sharing their opinions.
Updated: February 2025
Product Categories
Data Integration
Popular Comparisons
Informatica Intelligent Data Management Cloud (IDMC)
Azure Data Factory
Informatica PowerCenter
Oracle Data Integrator (ODI)
Talend Open Studio
IBM InfoSphere DataStage
Oracle GoldenGate
SAP Data Services
Qlik Replicate
Denodo
Alteryx Designer
Fivetran
SnapLogic
Spring Cloud Data Flow