We are using StreamSets to migrate our on-premises data to the cloud.
Director Data Engineering, Governance, Operation and Analytics Platform at a financial services firm with 10,001+ employees
Ease of configuring and managing pipelines centrally
Pros and Cons
- "I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
- "StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
What is our primary use case?
What is most valuable?
I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally. It's like a plug-and-play setup.
What needs improvement?
StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target. That would include the ability to validate the data against various data rules and then, when a data quality assessment fails, to send alerts or information that help people understand the data validation issues.
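As an illustration of the kind of in-flight check described here, the following is a minimal sketch in plain Python. It is not a StreamSets feature or API. The names `Rule`, `validate`, and `build_alerts`, and the fields `id`, `amount`, and `currency`, are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Record = dict


@dataclass(frozen=True)
class Rule:
    """A named data rule (hypothetical helper, not a StreamSets class)."""
    name: str
    check: Callable[[Record], bool]


def validate(records: Iterable[Record], rules: List[Rule]):
    """Split records into those that pass every rule and those that fail.

    Returns (passed, failed), where failed holds (record, broken_rule_names).
    """
    passed: List[Record] = []
    failed: List[Tuple[Record, List[str]]] = []
    for record in records:
        broken = [rule.name for rule in rules if not rule.check(record)]
        if broken:
            failed.append((record, broken))
        else:
            passed.append(record)
    return passed, failed


def build_alerts(failed) -> List[str]:
    """Turn each failed record into a readable alert message."""
    return [
        f"record {record.get('id')} failed: {', '.join(broken)}"
        for record, broken in failed
    ]


# Example rules; the field names are assumptions made for this sketch.
RULES = [
    Rule("amount_is_positive",
         lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] > 0),
    Rule("currency_present", lambda r: bool(r.get("currency"))),
]
```

In a real deployment the alert strings would be handed to whatever notification channel the team uses; that part is left out because the review does not describe one.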
For how long have I used the solution?
I have been using StreamSets for a year and a half.
Buyer's Guide
StreamSets
December 2024
Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
831,265 professionals have used our research since 2012.
What do I think about the stability of the solution?
It's reasonably stable.
What do I think about the scalability of the solution?
It's reasonably easy to scale. Around 25 to 30 end users are using this solution in our organization.
How are customer service and support?
Customer service and support are good.
How would you rate customer service and support?
Positive
How was the initial setup?
It's reasonably easy to deploy. However, since it is used at an enterprise level, it requires maintenance. So we had a maintenance contract.
In the financial industry, we have very strict regulations around deploying something in the cloud. So, it requires a lot of permission and other processes.
Just one person is enough for the maintenance.
What's my experience with pricing, setup cost, and licensing?
The pricing was reasonably economical and easy for us to afford when we engaged with StreamSets. It was not part of Software AG at that time.
What other advice do I have?
It's a very good tool. Overall, I would rate the solution an eight out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Engineer at a consultancy with 11-50 employees
Effective, and helps scale data operations, but sometimes the support's response is slow
Pros and Cons
- "In StreamSets, everything is in one place."
- "If you use JDBC Lookup, for example, it generally takes a long time to process data."
What is our primary use case?
The project which I work on is developed in StreamSets and I lead the team. I'm the team leader and the Solution Architect. I also train my juniors and my team.
I’ve been using this tool for the last year and a half, and it is very effective for data processing from source to destination. I have developed many integrations in it.
How has it helped my organization?
The solution is really effective.
What is most valuable?
It's very effective in project delivery. This month, at the end of June, I will deploy all the integrations which I developed in StreamSets to production remit. The business users and customers are happy with the data flow optimizer from the SoPlex cloud. It all looks good.
Not many challenges are there in terms of learning new technologies and using new tools. We will try and do an R&D analysis more often.
Everything is in place and it comes as a package. They install everything. The package includes Python, Ruby, and others. I just need to configure the correct details in the pipeline and go ahead with my work.
The ease of the design experience when implementing batch, streaming, and ETL pipelines is very good. The streaming is very good. Currently, I'm using Data Collector and it’s effective. If I were to do the streaming in something like core Java, I would need to follow different processes to deploy the code and connect the database. Here, there is not much code that I need to write.
In StreamSets, everything is in one place. If you want to connect a database and configure it, it is easy. If you want to connect to HTTP, it’s simple. If I'm going to do the same with my other tools, I don’t need many configurations or installations. StreamSets' ability to connect enterprise data stores such as OLTP databases and Hadoop, or messaging systems such as Kafka is good. I also send data to both the database and Kafka as well.
You will get all the drivers that you will need to install with the database. If you use other databases, you're going to need a JDBC driver, which is not difficult to use.
I'm sending data to different CDP URL databases, cloud areas, and Azure areas.
StreamSets' built-in data drift resilience plays a part in our ETL operations. We have some processors in StreamSets, and it will tell us what data has been changed and how data needs to be sent.
It's an easy tool. If you're going to use it as a customer, then it should take a long time to process data. I'm not sure if in the future, it will take some time to process the billions of records that I'm working on. We generally process billions of records on a daily basis. I will need to see when I work on this new project with Snowflake. We might need to process billions of records, and that will happen from the source. We’ll see how long it needs to take and how this system is handling it. At that point, I’ll be able to say how effectively StreamSets processes it.
The Data Collector saves time. However, there are some issues with the DPL.
StreamSets helped us break down data silos within our organizations.
One advantage is that everything happens in one place. If you want to develop or create something, you can get those details from StreamSets. The portal takes time, but they are focusing on this.
StreamSets' reusable assets have helped to reduce workload by 32% to 40%.
StreamSets helped us to scale our data operations.
If you get a request to process data for other processing tools, it might take a long time, like two to three hours. With this, I can do it within half an hour, 20 or 30 minutes. It’s more effective. I have everything in one place and I can configure everything. It saves me time as it is so central.
What needs improvement?
If you use JDBC Lookup, for example, it generally takes a long time to process data.
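The review does not say why the lookup is slow. One common cause with any per-record database lookup is that each record triggers its own query. The sketch below, in plain Python with the standard-library `sqlite3` module, shows the general idea of caching repeated lookups so that each distinct key is queried only once. It is not StreamSets code; the `customers` table, the `customer_id` field, and the helper names are assumptions made for the example.

```python
import sqlite3
from functools import lru_cache


def make_cached_lookup(conn: sqlite3.Connection, maxsize: int = 1024):
    """Return a lookup function that queries each distinct key only once.

    Also returns a counter dict so the number of real queries can be inspected.
    """
    query_count = {"n": 0}

    @lru_cache(maxsize=maxsize)
    def lookup(customer_id: int):
        # Only reached on a cache miss, so this counts real database queries.
        query_count["n"] += 1
        row = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None

    return lookup, query_count


def enrich(records, lookup):
    """Add a customer_name field to each record using the lookup function."""
    return [{**r, "customer_name": lookup(r["customer_id"])} for r in records]
```

Caching only helps when keys repeat and the looked-up data changes rarely; whether that applies here depends on the workload, which the review does not describe.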
StreamSets enables us to build data pipelines without knowing how to code. You can do it; however, you need to know data flow. Without knowing anything, it's a bit difficult for new people. You need some technical skills if you are to create a data pipeline. When procuring the data pipeline, for example, you need the origin, processor, and destination. If you don't know where you're going to read the data, where to send the data, and if you have to send the data, you have to configure it. If the destination you're looking for is some particular message permit or data permit, then you should write your own code there. You need some knowledge of coding as StreamSets does not provide any coding.
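For readers new to the data-flow idea mentioned here, a pipeline reads from a source, optionally transforms each record, and writes to a destination. The following is a minimal plain-Python sketch of that shape. It is not the StreamSets API; the function names `origin`, `uppercase_name`, `destination`, and `run_pipeline`, and the `name` field, are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
Processor = Callable[[Iterable[Record]], Iterator[Record]]


def origin(rows: Iterable[Record]) -> Iterator[Record]:
    """Where the pipeline reads from (here, any in-memory iterable)."""
    yield from rows


def uppercase_name(records: Iterable[Record]) -> Iterator[Record]:
    """An example processor: upper-case the 'name' field of each record."""
    for record in records:
        yield {**record, "name": record["name"].upper()}


def destination(records: Iterable[Record], sink: List[Record]) -> int:
    """Where the pipeline writes to (here, a list). Returns records written."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count


def run_pipeline(rows: Iterable[Record],
                 processors: List[Processor],
                 sink: List[Record]) -> int:
    """Chain the read stage -> processors -> write stage, return the count."""
    stream = origin(rows)
    for processor in processors:
        stream = processor(stream)
    return destination(stream, sink)
```

Even in a no-code tool, someone still has to decide what each of these three stages is, which is the technical knowledge the reviewer is pointing at.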
StreamSets data drift resilience has not exactly reduced the time it takes for us to fix data drift breakages. A lot of improvements are required from StreamSets. I'm not sure how they're planning to make it happen. There are some issues in the case of data processing, and other scenarios.
If the data processing in StreamSets takes a long time as compared to the previous solution, then we will reconsider why we use StreamSets.
For how long have I used the solution?
I've been using StreamSets for the last two years.
What do I think about the stability of the solution?
In terms of stability, there have been one or two issues. Good people work on the solutions when we have issues. However, sometimes we don't get a good solution.
As a user, I expect a lot more and that the solution will come quicker as compared to keeping projects on hold or keeping them for a long time. If they do not have any solution, then we can plan accordingly how to use the other processors. They just need to let us know quickly.
What do I think about the scalability of the solution?
The scalability is good.
We do plan to increase usage.
How are customer service and support?
In terms of technical support, they generally do a detailed analysis from their end. They always try to give a proper solution. However, sometimes, they won't get to any proper solution. They'll come back and look into it and sometimes it takes time. If they can speed up the process a little bit that would be ideal. We are always sitting on the edge. If we don't get a proper response from them, then it will be very difficult for us to answer to higher management.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
This is my first solution of this kind. Previously, I was working in open source systems, with scripting, et cetera. This is the first time I've worked in the data area. I've got full support. As a new data user, I'm still getting used to it.
How was the initial setup?
The setup is straightforward and simple; it's not complex.
We treat it like a pipeline. We are not writing code and putting things in. In the case of a pipeline, you can export it and import it, or you can make it a pipeline. It can be auto-deployed into a respective environment. That's what we did.
We have different destinations we need to send to. We aren't using a single destination. In that sense, we do have multiple computations. We set up, send the data and do the deployments.
There is occasional maintenance needed. Sometimes, if something goes wrong, we'll have to correct the data. We just check here and there for the most part.
What about the implementation team?
We did not need an integrator or consultant to assist with the setup.
As a team, we do the deployment. We won't send it to others; whatever we develop, we test and deploy ourselves. We already have the system in place and it is really helpful for the deployment of the solution.
What was our ROI?
I haven't seen an ROI.
It's not exactly saving us money as it's a new tool. If I'm going to hire someone new, I will not hire based on the StreamSets tool or some specific tools, and I might save money right away. However, I'm spending time on my side. StreamSets is not being used by many horizons. In some places in Europe, fewer companies are using StreamSets. People should get to know StreamSets and they should get some expertise in the area, the way AWS and Azure do. I’m spending a lot more time and therefore I’m not saving money. That said, I’m also not losing money.
What's my experience with pricing, setup cost, and licensing?
Higher management handled the licensing. However, I can't say how much it costs. I'm more on the user side.
Which other solutions did I evaluate?
I did not evaluate other options.
What other advice do I have?
I have not yet used StreamSets' Transformer for Snowflake functionality. I created one POC, though not with Snowflake. However, I'm going to use Snowflake in my next project.
I'd rate the solution seven out of ten. They are doing a good job. Using this solution I can feel the data and see the user flows.
If you are going to withdraw on-premise, and you're just copying the data to a table, you're not going to see how much data has been copied. With this, I'm seeing how much data has been transferred, and where the processor is. It gives a clear picture with metric details and notifications. That's the reason I used this tool for the last two years.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Senior Network Administrator at an energy/utilities company with 201-500 employees
Helped us break down data silos and produce better, up-to-date reports, as well as save money
Pros and Cons
- "The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them."
- "The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best."
What is our primary use case?
We use the whole Data Collector application.
How has it helped my organization?
We now consume many more hundreds of terabytes of data than we used to before we had StreamSets. It has definitely enabled us to do things a lot faster, and be a lot more agile, with a lot more data consumption and a lot more reporting.
Another benefit is that it has helped us to break down data silos. We now consume data across different silos and then we aggregate it together so that we can do reporting that is not just for that one silo of people but for a number of different people across the entire organization. That has had a positive effect, enabling us to save money, spend money more effectively, and have more up-to-date data in reports, as well as in auditing. Our safety processes are better too.
One way we have saved money is thanks to how the solution streamlines the data that we pull in, data that we weren't pulling in before.
StreamSets allows more people to know what's going on. It helps us with better allocation of resources, better allocation of staff, and right-sizing. We're in oil and gas and, in our case, it allows us to optimize what we're pulling out of the ground and then what we're selling.
It has helped to scale our data operations and as a result, in addition to saving money and right-sizing, it's helped our field operations and provided us with more management reporting.
Also, the data drift resilience reduces the time it takes to fix data drift breakages.
What is most valuable?
The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them.
We use StreamSets to connect to enterprise data stores, including OLTP databases and Hadoop. Connecting to them is pretty easy. It's the data manipulation and the data streaming that are the harder parts behind that, just because of the way the tool is written.
What needs improvement?
The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices.
We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best.
However, we have a couple of people in-house here who are experts in data analysis and they have figured out how to use this tool. We have to have people who are extremely skilled to go in and write the pipelines for this software because it's so complicated. The software works great for us, but there is an extremely steep learning curve because they don't provide a lot of information outside of paying their ridiculous support costs. Their support starts at $50,000 a year and up.
Also, the built-in data drift resilience for ETL operations requires a bunch of custom code development to be able to handle that. It's somewhat difficult because you have to customize it a fair amount.
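To give a sense of what such custom handling might involve, here is a minimal plain-Python sketch that detects when an incoming record's fields differ from an expected set and then conforms the record to that set. It is an assumption-laden illustration and not how StreamSets implements drift handling; `detect_drift`, `conform`, and the field names are hypothetical.

```python
from typing import Any, Dict, List, Set


def detect_drift(expected_fields: Set[str], record: Dict[str, Any]) -> Dict[str, List[str]]:
    """Report fields that appeared or disappeared relative to the expected set."""
    actual = set(record)
    return {
        "added": sorted(actual - expected_fields),
        "missing": sorted(expected_fields - actual),
    }


def conform(record: Dict[str, Any],
            expected_fields: Set[str],
            default: Any = None) -> Dict[str, Any]:
    """Keep only expected fields, filling any missing ones with a default."""
    return {field: record.get(field, default) for field in sorted(expected_fields)}
```

Whether to drop unexpected fields, as `conform` does here, or to pass them through to the destination is a design decision that depends on the target system.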
I also would like a more user-friendly interface and better error-trap handling.
For how long have I used the solution?
We have been using StreamSets for about four years.
What do I think about the stability of the solution?
We just patched ourselves up to the latest release about a month ago, so it's actually pretty stable at this point. It used to be quite buggy, going back over the last little while, but it's pretty stable now.
What do I think about the scalability of the solution?
This software is very scalable.
Which solution did I use previously and why did I switch?
We did not have a previous solution.
How was the initial setup?
The initial setup was somewhere between straightforward and complex. It was pretty straightforward to start with, but then it started ramping up to be more difficult as we wanted to add more stuff in.
The difficulty depends upon your data sources. If you have just one data source and you want to consume a lot of different types of data from that one source, it's pretty straightforward. But when you have 20 or 25 different data sources, and you need to pipeline all that data into a couple of data warehouses so that you can use advanced data analytics software to do reporting, analysis, and notifications, it's a lot more complicated. With every data source, it becomes exponentially more complicated to manage.
We spent a significant amount of time doing it, but otherwise, it was seamless because it was our own staff. We didn't have to worry about trying to find money or resource time or do any of the prep work needed to get external resources.
Ours is a single deployment, but it is used across our entire staff base of 200-plus people. We need three people for deployment and maintenance, whose responsibilities include software management, application management, and data analysis and management.
What was our ROI?
The ROI we have seen is in savings of time and money.
What's my experience with pricing, setup cost, and licensing?
We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that.
We tried to go and get them to look at their licensing and support model and they said they were not interested in reevaluating that in any way.
Which other solutions did I evaluate?
We tried to use another freeware ETL tool. It's fairly well-known. We ran it for a couple of months but it was going to be even more difficult than StreamSets, so we chose StreamSets in the end.
What other advice do I have?
The ease of using StreamSets to move data into modern analytics platforms, on a scale of one to 10, is about a five.
The solution enables you to build data pipelines without knowing how to code if it's the latest, state-of-the-art cloud connecting stuff. If it's for anything structured for Oracle and SQL Server and other data sources, it's difficult. Without knowing how to write code, some of it's easy and some of it is not.
My advice to someone who is considering this software is to be very aware that their integrator and data analysis people will need a very specific skill set.
Which deployment model are you using for this solution?
On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Senior Technical Manager at a financial services firm with 501-1,000 employees
The ease of configuration for pipes is amazing, and the GUI is very nice
Pros and Cons
- "The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too."
- "I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."
- "StreamSets works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds."
What is our primary use case?
It performs very well. The main use is to extract information from some of our Kafka topics and put it in our internal systems, flat files, and integration with Java.
How has it helped my organization?
It facilitates the consumption of the data in batch mode to the system where it is required. We don't do a lot of transformations or joining or forking of the information. It's more point-to-point connectivity that we implement over StreamSets.
What is most valuable?
The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too. It's pretty nice.
What needs improvement?
I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks.
StreamSets works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds.
For how long have I used the solution?
One to three years.
What do I think about the stability of the solution?
It's pretty stable. StreamSets has been up and running for months without any intervention in terms of the operations team. It's great.
I don't know if they can implement some kind of high-availability. I really don't go deep into that kind of configuration because, with only one node and running as stably as it is, we have no problem with that. But for critical operations, I'd like to know if I can facilitate some kind of high-availability, in case one of the nodes goes down.
What do I think about the scalability of the solution?
It's pretty scalable.
How are customer service and technical support?
I don't use support. I mainly use the community or web searches; self-learning.
How was the initial setup?
The initial setup is pretty straightforward.
What other advice do I have?
If you are looking for something to do batch processing in Java, this is the right solution. We did the exploration when we were trying to implement a batch processing system and decided that StreamSets is the best for that. If you're looking for real-time, you may want to look at another system or the next version of this one.
Because of the kind of system that we need to implement with this kind of solution, the most important factors I look at when selecting a vendor are things like latency and real-time processing.
I would rate it at nine out of 10. What would make it a 10 would be, as I said, I'd like to have more integration with other kinds of languages or frameworks and also more real-time processing, not batch.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Engineer at an energy/utilities company with 10,001+ employees
Easy to set up and use, and the functionality for transforming data is good
Pros and Cons
- "It is really easy to set up and the interface is easy to use."
- "We've seen a couple of cases where it appears to have a memory leak or a similar problem."
What is our primary use case?
We typically use it to transport our Oracle raw datasets up to Microsoft Azure, and then into SQL databases there.
What is most valuable?
It is really easy to set up and the interface is easy to use.
We found it pretty easy to transform data.
The online documentation is pretty good.
What needs improvement?
We've seen a couple of cases where it appears to have a memory leak or a similar problem. It grows for a bit and then we'd have to restart the container, maybe once a month when it gets high.
For how long have I used the solution?
We have been using StreamSets for about one year. We may have been experimenting with it slightly before that time.
What do I think about the stability of the solution?
Other than the memory issue that we occasionally see, the stability has been really good.
What do I think about the scalability of the solution?
We haven't seen a problem with scaling it.
How are customer service and technical support?
I haven't had to deal with technical support. We would first check the online documentation or web documentation, and usually found what we needed. We haven't had to call them.
Which solution did I use previously and why did I switch?
Prior to using StreamSets, we were using Microsoft CDC (Change Data Capture). It was a fairly old product and there were lots of workarounds and lots of issues that we had with it. We were looking for something more user-friendly. It was pretty stable, so that was not an issue.
How was the initial setup?
This product was a lot easier to use than the one we had before it. It took us half an hour and we were set up and running it, the first time.
What's my experience with pricing, setup cost, and licensing?
We are running the community version right now, which can be used free of charge. We were debating whether to move it to the commercial version, but we haven't had the need to, just yet.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.