We are using the StreamSets DataOps platform to ingest data to a data lake.
Senior Data Engineer at an energy/utilities company with 1,001-5,000 employees
Quite simple to use for anybody who has an ETL or BI background
Pros and Cons
- "StreamSets data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though, it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
- "Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
What is our primary use case?
How has it helped my organization?
Our time to value has improved because our development time has been considerably reduced. The major benefit that we are getting out of the solution is the ability to easily transition and upskill a person who already has an ETL or BI background. We don't need to specifically look for people who know programming or have worked with Python, DataOps, or DevOps-style functionality. In the market, it is easier to find people with ETL or BI skills than people with hardcore DevOps or programming skills. That is the major benefit that we are getting out of moving to a GUI-based tool like StreamSets. How quickly we deliver to our customers, as well as our ability to ingest to a data lake, have improved a lot by using this tool.
What is most valuable?
The types of the source systems that it can work with are quite varied. There are numerous source systems that it can work with, e.g., a SQL Server database, an Oracle Database, or REST API. That is an advantage we are getting.
The most important feature is the Control Hub that comes with the DataOps Platform and does load balancing. So, we do not worry about the infrastructure. That is a highlight of the DataOps platform: Control Hub manages the data load to various engines.
It is quite simple for anybody who has an ETL or BI background and worked on any ETL technologies, e.g., IBM DataStage, SAP BODS, Talend, or CloverETL. In terms of experience, the UI and concepts are very similar to how you develop your extraction pipeline. Therefore, it is very simple for anybody who has already worked on an ETL tool set, either for your data ingestion, ETL pipeline, or data lake requirements.
We use StreamSets to load into AWS S3 and Snowflake databases, which are then consumed by Power BI or Tableau. It is quite simple to move data into these platforms using StreamSets. There are a lot of tools and destination stages within StreamSets, such as Snowflake, Amazon S3, any database, or an HTTP endpoint. It is just a drag-and-drop feature, which saves a lot of time compared with writing custom code in Python. StreamSets enables us to build data pipelines without knowing how to code, which is a big advantage.
The data resilience feature is good enough for our ETL operations, even for our production pipelines at this stage. Therefore, we do not need to build our own custom framework for it since what is available out-of-the-box is good enough for a production pipeline.
StreamSets data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though, it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved.
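As a rough illustration of what a drift alert involves, the following Python sketch compares an incoming record against an expected schema and reports new fields, missing fields, and type changes. It is not StreamSets' implementation; the function name `detect_drift`, the field names, and the sample record are all hypothetical.

```python
from typing import Any


def detect_drift(expected: dict[str, type], record: dict[str, Any]) -> list[str]:
    """Return human-readable drift messages for a single record."""
    messages = []
    # Fields that appear in the record but not in the expected schema.
    for name in sorted(record.keys() - expected.keys()):
        messages.append(f"new field: {name}")
    # Fields the schema expects but the record no longer carries.
    for name in sorted(expected.keys() - record.keys()):
        messages.append(f"missing field: {name}")
    # Fields present in both whose value type has changed.
    for name in sorted(expected.keys() & record.keys()):
        if not isinstance(record[name], expected[name]):
            messages.append(
                f"type change: {name} expected {expected[name].__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return messages


# Hypothetical schema and record, for illustration only.
expected_schema = {"meter_id": int, "reading_kwh": float}
incoming = {"meter_id": "A-17", "reading_kwh": 12.5, "region": "north"}
drift_alerts = detect_drift(expected_schema, incoming)
```

In a setup like the one described, the record would still land in the data lake, and messages such as these would go to the team that owns the downstream models.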
What needs improvement?
One area for improvement is probably the GUI. It is pretty basic and a lot of improvement is required there.
In terms of security, from an architecture perspective, when we want to implement something, and because our organization is very strict when it comes to cybersecurity, we have been struggling a bit because the platform has a few gaps. Those gaps are really gaps based on our organization's requirements. These are not gaps on StreamSets' side. The solution could improve a lot in terms of having more features added to the security model, which would help us.
There are quite a few features that we wanted. One is SAP HANA. Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful.
Buyer's Guide
StreamSets
October 2024
Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: October 2024.
814,649 professionals have used our research since 2012.
For how long have I used the solution?
I have been using it for the past 12 months.
What do I think about the stability of the solution?
I have no concerns in terms of the application's core stability. We haven't had any major outages as such, and even if we had one, those were internal and related to our network, proxy, or firewall. As someone who implemented it and has been working on it day in, day out, sometimes 24/7, I am quite confident with the stability of the solution.
As with any application, it requires periodic maintenance, at least to do an upgrade. That maintenance is simply upgrading the product, and nothing more than that.
What do I think about the scalability of the solution?
A core feature of the DataOps Platform is you can easily scale through engines when you have more pipelines running and data to process. So, if you would need to purchase more engines or cores, it is quite scalable. That is a major advantage that we are getting.
In the Control Hub Platform, the orchestration and load balancing are quite scalable. You don't need to fiddle with the existing solution. Everything is run on another engine that gets hooked up automatically to Control Hub, which makes it seamless.
There is sort of a developed template out of StreamSets, where you just have one template and can point it to any source system. You can just start ingesting, which has reduced a lot of time in building our new pipelines.
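The one-template, many-sources idea can be sketched as plain parameter substitution. This is a minimal Python illustration using the standard library's `string.Template`, not the StreamSets job template feature itself; the placeholder names, the bucket, and the path layout are assumptions made up for the example.

```python
from string import Template

# Hypothetical ingestion template; the ${...} placeholders are illustrative only.
INGEST_TEMPLATE = {
    "source_table": Template("${schema}.${table}"),
    "target_path": Template("s3://${bucket}/raw/${system}/${schema}/${table}/"),
}


def render_job(params: dict[str, str]) -> dict[str, str]:
    """Fill every placeholder; raises KeyError if a parameter is missing."""
    return {key: tmpl.substitute(params) for key, tmpl in INGEST_TEMPLATE.items()}


# Pointing the same template at a different source only changes the parameters.
job = render_job({
    "system": "sqlserver",
    "schema": "dbo",
    "table": "invoices",
    "bucket": "example-data-lake",
})
```

Because the template is written once, onboarding a new source only requires a new parameter set and no new pipeline logic, which is where the time saving comes from.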
How are customer service and support?
They are quite good and responsive. We have a dedicated support portal for StreamSets. We have authorized members who can raise support tickets using the portal, including myself. They have a quick turnaround with good responses, so we are quite happy as of now. I would rate the technical support between 7.5 and 8 out of 10.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We previously developed our own custom platform. We switched because maintaining a custom platform is difficult. We are not a product team. We are an energy company that serves business customers. Another thing was that the custom platform was written programmatically, so you need a lot of people who have programming knowledge, both to maintain and to use it.
The time to value is quite a critical KPI. Before, when our business needed data quickly on the platform, our previous solutions struggled to get it. Thus, our time to value has improved a lot and our customers are happy because they are able to get the data quickly.
How was the initial setup?
I was there right from the start when they adopted an open-source version. Late last year, we moved to an enterprise version, i.e., the DataOps platform. So, I worked on the 3.2.2 version, and now I am working on the 5.0 version, which is the enterprise license version.
The implementation is straightforward, except for a few hiccups with known network, proxy, and firewall issues. Other than that, it was a very simple, lean implementation.
Because we had a lot of firewall issues and issues with our optimization, it took probably four weeks for us to get things running. However, if you exclude the issues, it took probably a week to a week and a half to get things up and running.
We are working, as a separate piece of the project, to migrate whatever is running in our existing custom platform to StreamSets. From a certain date, we started to work purely on StreamSets. For any future ingestion requirements, we are using the StreamSets DataOps platform. The previous platform is no longer being extended at the moment. We are only using it for existing pipelines, and the plan is to migrate them to the DataOps platform very soon, within this year.
What about the implementation team?
Two people were needed for the deployment of this solution: a cloud engineer and a senior data engineer.
What was our ROI?
First, it has saved us a lot of time because we do not need to come up with our own custom platform, and building and maintaining one is a huge expenditure. Second, even if we went for other products in the market, there are lots of gaps in them. Even if we picked up another product, we would have to customize it. An off-the-shelf product is not enough to meet our needs. Therefore, StreamSets has definitely helped us in getting the information into our data lake very quickly, in terms of ingestion.
The most important thing is it has helped us from a resourcing point of view. You can easily upskill a BI or ETL resource without any programming knowledge to work with this. That is a major advantage that we are getting since we have a lot of ETL people who do not have programming knowledge. They have vast ETL experience working with GUI-based tools, and StreamSets is really useful for them.
It has drastically reduced the time that we are spending on workloads by 60% to 70% as well as reducing the time spent on ingestion by 30%.
What's my experience with pricing, setup cost, and licensing?
It has a CPU core-based licensing, which works for us and is quite good.
Which other solutions did I evaluate?
We did evaluate other solutions. It was not a quick decision for us to take this product. We evaluated other products in the market, but they were not close to StreamSets or not in the data integration space. One thing that caught our attention with StreamSets was the processes that it could work with. Secondly, the Control Hub DataOps platform manages the load balancing, etc. We were quite interested in that since we would not need to maintain it ourselves. The third most important thing was that you can create job templates in StreamSets. This means you create a template for a particular type of ingestion. Going forward, you just change the parameters, then you can point it to any source. This means there is less pipeline development and we can quickly ingest data into the data lake. Those are the features that we were interested in and why we switched to StreamSets.
There is actually a gap in the entire data integration market at the moment, and StreamSets Data Collector is trying to fill that gap. The reason is because most data ingestion has to be done through programming languages, like Python or Java. We currently do not have a GUI-based tool set that is as robust as StreamSets. That is what I found out in the lab over the last 12 months. There are new products coming up, but it will still be a few more years until they are stabilized. Whereas, StreamSets is already there to solve your immediate data ingestion requirements.
What other advice do I have?
Every tool in the market at the moment has some major gaps, especially for large enterprises. It could be the way that the data or pipeline is secured. At present, StreamSets looks like the market leader and is trying to fill that gap. For anyone going through a proof of concept for various tools, StreamSets is almost at the top. I don't think that they need to look any further.
We are working only with API, a relational database management system, and our enterprise warehouses at the moment. We are not using any streaming sort of ingestion at the moment.
We are not using Snowflake Transformer yet. It just got released. We are using a traditional Snowflake destination stage because our enterprise is huge and we have our own Snowflake architecture. We load the data securely into our own databases using the destination stage, not Transformer yet.
I would rate the solution as 7.5 out of 10.
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Director Data Engineering, Governance, Operation and Analytics Platform at a financial services firm with 10,001+ employees
Ease of configuring and managing pipelines centrally
Pros and Cons
- "I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
- "StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
What is our primary use case?
We are using StreamSets to migrate our on-premise data to the cloud.
What is most valuable?
I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally. It's like a plug-and-play setup.
What needs improvement?
StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target. So the ability to validate the data against various data rules. Then, based on the failure of data quality assessment, be able to send alerts or information to help people understand the data validation issues.
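The kind of in-flight check being requested can be sketched in a few lines of Python. This is an illustration of the idea only, not an existing StreamSets feature; the rule names, the `assess` function, and the sample rows are hypothetical.

```python
from typing import Any, Callable

Rule = Callable[[dict[str, Any]], bool]

# Hypothetical data rules; each returns True when the record is acceptable.
RULES: dict[str, Rule] = {
    "customer_id is present": lambda r: r.get("customer_id") is not None,
    "amount is non-negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def assess(records: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[str]]:
    """Split records into those that pass every rule and alerts for the rest."""
    passed, alerts = [], []
    for index, record in enumerate(records):
        failed = [name for name, rule in RULES.items() if not rule(record)]
        if failed:
            alerts.append(f"record {index} failed: " + "; ".join(failed))
        else:
            passed.append(record)
    return passed, alerts


rows = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": None, "amount": -5},
]
good_rows, quality_alerts = assess(rows)
```

Only the passing rows would continue to the target, while the alert strings could be sent to whoever owns the data so they can see which rule each record broke.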
For how long have I used the solution?
I have been using StreamSets for a year and a half.
What do I think about the stability of the solution?
It's reasonably stable.
What do I think about the scalability of the solution?
It's reasonably easy to scale. Around 25 to 30 end users are using this solution in our organization.
How are customer service and support?
Customer service and support are good.
How would you rate customer service and support?
Positive
How was the initial setup?
It's reasonably easy to deploy. However, since it is used at an enterprise level, it requires maintenance. So we had a maintenance contract.
In the financial industry, we have very strict regulations around deploying something in the cloud. So, it requires a lot of permission and other processes.
Just one person is enough for the maintenance.
What's my experience with pricing, setup cost, and licensing?
The pricing was reasonably economical and easy for us to afford when we engaged with StreamSets. It was not part of Software AG at that time.
What other advice do I have?
It's a very good tool. Overall, I would rate the solution an eight out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Product Marketer at a media company with 1,001-5,000 employees
We have been able to eliminate the vast majority of our break/fix costs and maintenance time
Pros and Cons
- "The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth."
- "One area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there."
What is our primary use case?
Our major use case with StreamSets is to build data pipelines from multiple sources to multiple destinations. We mainly use the StreamSets Data Collector Engine for seamless streaming from any source to any destination.
We also use it to deliver continuous data for database operations and modern analytics.
How has it helped my organization?
One great thing is that now, with the implementation of StreamSets, we have been able to eliminate about 80 percent of our break/fix costs and maintenance time. It is very easy to connect with streaming platforms and streaming services.
Also, we can integrate and stream databases by connecting with multiple streaming services. Before StreamSets, data transfer from source to destination took about three hours and was prone to errors. Now, with the introduction of StreamSets, we primarily use the Data Collector, and this has enabled us to complete the same job in less than 30 minutes. That saves us roughly two and a half hours per day, or about 15 hours per week.
Another definite benefit is that it has helped us to break down data silos within our organization. We are able to work together, with the interaction of StreamSets. Previously, the data silos were extremely perilous because data would come from multiple, scattered sources. We were not able to consolidate it on time and we were not able to exactly pinpoint errors. But StreamSets has helped us streamline the use of multiple sources and destinations, completely eliminating the silos. That saves us a lot of time and we have reduced the number of errors by a lot.
What is most valuable?
The most valuable features of StreamSets, for me, are the Data Collector and the Control Hub platform. They are both very straightforward to use and user-friendly. And with the Data Collector and Control Hub, we get canvas selection for designing all our pipelines, which is very intuitive and useful for us.
In fact, the entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth. A great thing about StreamSets is that it is a single, centralized platform. All our design-pattern requirements are met with a single design experience through StreamSets.
We can also easily build pipelines with minimal coding and minimal technical knowledge. It is very easy to start and very easy to scale as well. That is very important to me, personally, because I'm from a non-technical background. One of the most important criteria was for me to be able to use this platform efficiently.
Also, moving data to modern analytics platforms is very straightforward. That is why StreamSets is one of the top players in the market right now.
And one of the major advantages for us is the built-in functionality. StreamSets has a plethora of features that combine well with ETL.
What needs improvement?
In terms of features, I don't have any complaints so far. But one area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there.
For how long have I used the solution?
I have been using StreamSets for about eight months.
What do I think about the stability of the solution?
It is stable. It's a cloud-based solution, so there is a little bit of latency, some server speed issues, but apart from that, there is no question about the stability of the solution.
What do I think about the scalability of the solution?
The platform is definitely scalable.
Maybe in the future we will increase our usage of StreamSets, but I don't see any immediate scalability requirements for us.
How are customer service and support?
I have not contacted their customer support, but my team contacts them. From what I understand they have a pretty healthy conversation with the StreamSets customer support. All of our queries are sent via email and they get them sorted out. They also join Google Meet sessions or calls, if required, to sort out our queries. It has been a very smooth journey so far. I don't have any complaints with regard to their customer service.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
StreamSets is the first solution that we are using in this space.
How was the initial setup?
I was not fully involved in the initial implementation, but we did the implementation in phases. We wanted to get it on board as soon as possible, so instead of doing a complete implementation, we did it in phases and it didn't take a lot of time. We were able to get on with the work as soon as possible with this model.
The initial setup was simple. We didn't require any additional training or third-party vendors. We were able to do it along with the StreamSets team, so it was smooth for us.
We have 15 people using StreamSets, all at one location. They are developers and users.
Because it is a cloud platform there isn't much maintenance required other than server updates, but that is expected with any cloud platform. No extensive maintenance is required. We have a team of two people who maintain it and handle updates and all the latest releases.
What was our ROI?
Tasks that took three hours can now be done in less than 30 minutes. This is one of the prime data points in terms of ROI for this product.
In terms of money saved, we still haven't seen any direct results from StreamSets. With its automation, we are able to focus on other tasks because StreamSets is taking care of the operations side. Theoretically, it should save us some money but it hasn't until now. We still have the same number of employees.
We are moving in a positive direction. Hopefully, this trend continues. We were able to see the time savings and reduced errors within three months of deployment.
What's my experience with pricing, setup cost, and licensing?
There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced. I wouldn't say it's cheap or moderate, but it's also not a high price.
What other advice do I have?
We have been experimenting with Hadoop, but apart from that, we do not use it to establish a connection with other services. As an organization, we have not faced any issues with connectivity using StreamSets. The platform is very stable.
Overall, StreamSets is very efficient and effective. It has helped us save a lot of time and also reduced errors a lot. I would definitely rate it very highly. The major reason is that it gives us a single, centralized platform for all our design-pattern requirements and we are able to produce results efficiently. With StreamSets, we are able to transfer or stream data from any source to any destination. It has increased the overall efficiency of our organization.
Software AG is constantly improving and evolving the product, and that is something that I like: using a product that is ever-evolving and being upgraded.
After deploying StreamSets, I learned a lot about how data planning works and how easy it is to stream from multiple sources to multiple destinations. That is one of my major takeaways. I thought it would be a very complex task, but that myth was broken by StreamSets. The complexity was made very simple for me.
My advice is to try the free edition. It's a very user-friendly and intuitive product as well. Try it to get a grasp of what's happening inside the product. Once you try the free edition, you'll definitely go for the Professional edition. I don't have any doubt about that. The product itself will lure you. That is the power of the product.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Senior Network Administrator at an energy/utilities company with 201-500 employees
Helped us break down data silos and produce better, up-to-date reports, as well as save money
Pros and Cons
- "The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them."
- "The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best."
What is our primary use case?
We use the whole Data Collector application.
How has it helped my organization?
We now consume many more hundreds of terabytes of data than we used to before we had StreamSets. It has definitely enabled us to do things a lot faster, and be a lot more agile, with a lot more data consumption and a lot more reporting.
Another benefit is that it has helped us to break down data silos. We now consume data across different silos and then we aggregate it together so that we can do reporting that is not just for that one silo of people but for a number of different people across the entire organization. That has had a positive effect, enabling us to save money, spend money more effectively, and have more up-to-date data in reports, as well as in auditing. Our safety processes are better too.
One way we have saved money is thanks to how the solution streamlines the data that we pull in, data that we weren't pulling in before.
StreamSets allows more people to know what's going on. It helps us with better allocation of resources, better allocation of staff, and right-sizing. We're in oil and gas and, in our case, it allows us to optimize what we're pulling out of the ground and then what we're selling.
It has helped to scale our data operations and as a result, in addition to saving money and right-sizing, it's helped our field operations and provided us with more management reporting.
Also, the data drift resilience reduces the time it takes to fix data drift breakages.
What is most valuable?
The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them.
We use StreamSets to connect to enterprise data stores, including OLTP databases and Hadoop. Connecting to them is pretty easy. It's the data manipulation and the data streaming that are the harder parts behind that, just because of the way the tool is written.
What needs improvement?
The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices.
We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best.
However, we have a couple of people in-house here who are experts in data analysis and they have figured out how to use this tool. We have to have people who are extremely skilled to go in and write the pipelines for this software because it's so complicated. The software works great for us, but there is an extremely steep learning curve because they don't provide a lot of information outside of paying their ridiculous support costs. Their support starts at $50,000 a year and up.
Also, the built-in data drift resilience for ETL operations requires a bunch of custom code development to be able to handle that. It's somewhat difficult because you have to customize it a fair amount.
I also would like a more user-friendly interface and better error-trap handling.
For how long have I used the solution?
We have been using StreamSets for about four years.
What do I think about the stability of the solution?
We just patched ourselves up to the latest release about a month ago, so it's actually pretty stable at this point. It used to be quite buggy, going back over the last little while, but it's pretty stable now.
What do I think about the scalability of the solution?
This software is very scalable.
Which solution did I use previously and why did I switch?
We did not have a previous solution.
How was the initial setup?
The initial setup was somewhere between straightforward and complex. It was pretty straightforward to start with, but then it started ramping up to be more difficult as we wanted to add more stuff in.
The difficulty depends upon your data sources. If you have just one data source and you want to consume a lot of different types of data from that one source, it's pretty straightforward. But when you have 20 or 25 different data sources, and you need to pipeline all that data into a couple of data warehouses so that you can use advanced data analytics software to do reporting, analysis, and notifications, it's a lot more complicated. With every data source, it becomes exponentially more complicated to manage.
We spent a significant amount of time doing it, but otherwise, it was seamless because it was our own staff. We didn't have to worry about trying to find money or resource time or do any of the prep work needed to get external resources.
Ours is a single deployment, but it is used across our entire staff base of 200-plus people. We need three people for deployment and maintenance, whose responsibilities include software management, application management, and data analysis and management.
What was our ROI?
The ROI we have seen is in savings of time and money.
What's my experience with pricing, setup cost, and licensing?
We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that.
We tried to go and get them to look at their licensing and support model and they said they were not interested in reevaluating that in any way.
Which other solutions did I evaluate?
We tried to use another freeware ETL tool. It's fairly well-known. We ran it for a couple of months, but it was going to be even more difficult than StreamSets, so we chose StreamSets in the end.
What other advice do I have?
The ease of using StreamSets to move data into modern analytics platforms, on a scale of one to 10, is about a five.
The solution enables you to build data pipelines without knowing how to code if it's the latest, state-of-the-art cloud connecting stuff. If it's for anything structured for Oracle and SQL Server and other data sources, it's difficult. Without knowing how to write code, some of it's easy and some of it is not.
My advice to someone who is considering this software is to be very aware that their integrator and data analysis people will need a very specific skill set.
Which deployment model are you using for this solution?
On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Software Developer at Appnomu Business Services
Simplifies the way we perform tasks and engineer pipelines at all stages
Pros and Cons
- "StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
- "The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
What is our primary use case?
It is primarily being used by our IT department to configure things and see what is missing and what the issues are.
How has it helped my organization?
I'm using StreamSets to find issues with our software and it is helping us to do so, and to make sure that we are able to debug on time. It makes things much simpler. We can use the solution to know what issue is happening at the moment. We are able to easily identify a leak and resolve it on time.
It reduces our workload by about 30 percent. And it saves us a lot on having to hire expensive technical experts or software engineers. You purchase a package with a reasonable pricing model, and then you can use it with your team. It saves us from hiring a technical person to carry out the tasks. With StreamSets, you can do a task easily.
It also makes it easy to send data from one place to another.
StreamSets is doing a lot in our IT operations because it is simplifying the way we perform tasks and the way we engineer pipelines at all stages, including the sources, processes, and destination use. We can schedule data pipelines and that's easy.
And because it is low-code software, you don't need to develop the code and that really saves a lot of time. Using the canvas to create and engineer data pipelines is very easy. StreamSets saves me three hours that it would take me to manually do a task.
What is most valuable?
StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall. It really helps you to come up with a solution more quickly. The Transformer logic is very easy, as long as you understand the concept of what you intend to develop. It doesn't require any technical skills.
The overall GUI and user interface are also good because you don't need to write complex programming for any implementation. You just drag and configure what you want to implement. It's very easy and you can use it without knowing any programming language.
The design experience is much easier when you want to integrate other systems and tools and make them work in a particular format. It helps you improve the topologies. You can view the status of all the pipelines you have developed and monitor them.
Connecting to enterprise data stores is also very easy, as is monitoring and managing things in one place.
What needs improvement?
The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date.
I would also like better, detailed logging of error information.
It also needs a fragment drill-down feature when monitoring a data flow. That needs a lot of improvement, especially when you are running a job.
For how long have I used the solution?
This is my second year using StreamSets.
What do I think about the stability of the solution?
It's stable.
What do I think about the scalability of the solution?
It is a scalable solution for any company that needs to know about its data processing.
How are customer service and support?
It is hard to get technical support from the company. To receive one-on-one communication requires a budget, which we don't really have. The way we get technical support is through the documentation and knowledge base.
It is missing a live instant chat on the dashboard.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We did not have a previous solution.
How was the initial setup?
Initially, the deployment can be very hard if you do not have a lot of technical skills, but as you get used to the software, within a day, the deployment becomes straightforward. It took two weeks to have everything configured in the right manner. I worked with one other colleague to set everything up.
It is hard, especially when you are a beginner, but when you read the documentation you can set things up quickly. The documentation helps out if you don't have good knowledge of the solution.
It doesn't require maintenance.
What was our ROI?
The solution is helping a lot because we are not spending a lot of money on a technical team. We just subscribe to the software and we're able to configure things. It has helped us save on resources by 30 percent.
What's my experience with pricing, setup cost, and licensing?
The pricing is too fixed. It should be based on how much data you need to process. Some businesses are not so big that they process a lot of data. They process a lot of debugging. The pricing is not so favorable for a small enterprise because it is too limited.
What other advice do I have?
I would recommend the software to any business that needs to do data engineering. If they design data pipelines, it is really a great idea to test out StreamSets. Unfortunately, you need a good budget for it. If a small business doesn't have the budget, I cannot recommend it. But if they have a good budget, I really recommend it because it has so many features that can really help data scientists and analysts generate patterns or insights for their businesses. And it will benefit their customers as well.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Data Engineer at a consultancy with 11-50 employees
Effective, and helps scale data operations, but sometimes the support's response is slow
Pros and Cons
- "In StreamSets, everything is in one place."
- "If you use JDBC Lookup, for example, it generally takes a long time to process data."
What is our primary use case?
The project which I work on is developed in StreamSets and I lead the team. I'm the team leader and the Solution Architect. I also train my juniors and my team.
I've been using this tool for the last year and a half, and it is very effective for data processing from source to destination. I have developed many integrations in it.
How has it helped my organization?
The solution is really effective.
What is most valuable?
It's very effective in project delivery. This month, at the end of June, I will deploy all the integrations which I developed in StreamSets to production remit. The business users and customers are happy with the data flow optimizer from the SoPlex cloud. It all looks good.
Not many challenges are there in terms of learning new technologies and using new tools. We will try and do an R&D analysis more often.
Everything is in place and it comes as a package. They install everything. The package includes Python, Ruby, and others. I just need to configure the correct details in the pipeline and go ahead with my work.
The ease of the design experience when implementing batch streaming and ETL pipelines is very good. The streaming is very good. Currently, I'm using Data Collector and it’s effective. If I'm going to use less streaming, like in Java core, I need to follow up on different processes to deploy the code and connect the database. There is not so much code that I need to write.
In StreamSets, everything is in one place. If you want to connect a database and configure it, it is easy. If you want to connect to HTTP, it’s simple. If I'm going to do the same with my other tools, I don’t need many configurations or installations. StreamSets' ability to connect enterprise data stores such as OLTP databases and Hadoop, or messaging systems such as Kafka is good. I also send data to both the database and Kafka as well.
You will get all the drivers that you will need to install with the database. If you use other databases, you're going to need a JDBC driver, which is not difficult to use.
I'm sending data to different CDP URL databases, cloud areas, and Azure areas.
StreamSets' built-in data drift resilience plays a part in our ETL operations. We have some processors in StreamSets, and it will tell us what data has been changed and how data needs to be sent.
It's an easy tool. If you're going to use it as a customer, then it should take a long time to process data. I'm not sure if in the future, it will take some time to process the billions of records that I'm working on. We generally process billions of records on a daily basis. I will need to see when I work on this new project with Snowflake. We might need to process billions of records, and that will happen from the source. We’ll see how long it needs to take and how this system is handling it. At that point, I’ll be able to say how effectively StreamSets processes it.
The Data Collector saves time. However, there are some issues with the DPL.
StreamSets helped us break down data silos within our organizations.
One advantage is that everything happens in one place. If you want to develop or create something, you can get those details from StreamSets. The portal, however, takes time, although they are focusing on this.
StreamSets' reusable assets have helped to reduce workload by 32% to 40%.
StreamSets helped us to scale our data operations.
If you get a request to process data for other processing tools, it might take a long time, like two to three hours. With this, I can do it within half an hour, 20 or 30 minutes. It’s more effective. I have everything in one place and I can configure everything. It saves me time as it is so central.
What needs improvement?
If you use JDBC Lookup, for example, it generally takes a long time to process data.
StreamSets enables us to build data pipelines without knowing how to code. You can do it; however, you need to know data flow. Without knowing anything, it's a bit difficult for new people. You need some technical skills if you are to create a data pipeline. When procuring the data pipeline, for example, you need the origin, processor, and destination. If you don't know where you're going to read the data, where to send the data, and if you have to send the data, you have to configure it. If the destination you're looking for is some particular message permit or data permit, then you should write your own code there. You need some knowledge of coding as StreamSets does not provide any coding.
StreamSets data drift resilience has not exactly reduced the time it takes for us to fix data drift breakages. A lot of improvements are required from StreamSets. I'm not sure how they're planning to make it happen. There are some issues in the case of data processing, and other scenarios.
If the data processing in StreamSets takes a long time as compared to the previous solution, then we will reconsider why we use StreamSets.
For how long have I used the solution?
I've been using StreamSets for the last two years.
What do I think about the stability of the solution?
In terms of stability, there have been one or two issues. Good people work on the solutions when we have issues. However, sometimes we don't get a good solution.
As a user, I expect a lot more and that the solution will come quicker as compared to keeping projects on hold or keeping them for a long time. If they do not have any solution, then we can plan accordingly how to use the other processors. They just need to let us know quickly.
What do I think about the scalability of the solution?
The scalability is good.
We do plan to increase usage.
How are customer service and support?
In terms of technical support, they generally do a detailed analysis from their end. They always try to give a proper solution. However, sometimes, they won't get to any proper solution. They'll come back and look into it and sometimes it takes time. If they can speed up the process a little bit that would be ideal. We are always sitting on the edge. If we don't get a proper response from them, then it will be very difficult for us to answer to higher management.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
This is my first solution of this kind. Previously, I was working in open source systems, with scripting, et cetera. This is the first time I've worked in the data area. I've got full support. As a new data user, I'm still getting used to it.
How was the initial setup?
The setup is straightforward and simple; it's not complex.
We treat it like a pipeline. We are not writing code and putting things in. In the case of a pipeline, you can export it and input it, or you can make it a pipeline. It can be auto-deployed into a respective environment. That's what we did.
We have different destinations we need to send to. We aren't using a single destination. In that sense, we do have multiple computations. We set up, send the data and do the deployments.
There is occasional maintenance needed. Sometimes, if something goes wrong, we'll have to correct the data. We just check here and there for the most part.
What about the implementation team?
We did not need an integrator or consultant to assist with the setup.
As a team, we do the deployment. We don't send it to others; whatever we develop, we test and deploy ourselves. We already have the system in place and it is really helpful for the deployment of the solution.
What was our ROI?
I haven't seen an ROI.
It's not exactly saving us money as it's a new tool. If I'm going to hire someone new, I will not hire based on the StreamSets tool or some specific tools, and I might save money right away. However, I'm spending time on my side. StreamSets is not being used by many horizons. In some places in Europe, fewer companies are using StreamSets. People should get to know StreamSets and they should get some expertise in the area, the way AWS and Azure do. I’m spending a lot more time and therefore I’m not saving money. That said, I’m also not losing money.
What's my experience with pricing, setup cost, and licensing?
Higher management handled the licensing. However, I can't say how much it costs. I'm more on the user side.
Which other solutions did I evaluate?
I did not evaluate other options.
What other advice do I have?
I have not yet used StreamSets' Transformer for Snowflake functionality. I created one POC, though not with Snowflake. However, I'm going to use Snowflake in my next project.
I'd rate the solution seven out of ten. They are doing a good job. Using this solution I can feel the data and see the user flows.
If you are going to withdraw on-premise, and you're just copying the data to a table, you're not going to see how much data has been copied. With this, I'm seeing how much data has been transferred, and where the processor is. It gives a clear picture with metric details and notifications. That's the reason I used this tool for the last two years.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Senior Technical Manager at a financial services firm with 501-1,000 employees
The ease of configuration for pipes is amazing, and the GUI is very nice
Pros and Cons
- "The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too."
- "I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."
- "StreamSets works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds."
What is our primary use case?
It performs very well. The main use is to extract information from some of our Kafka topics and put it in our internal systems, flat files, and integration with Java.
How has it helped my organization?
It facilitates the consumption of the data in batch mode to the system where it is required. We don't do a lot of transformations or joining or forking of the information. It's more point-to-point connectivity that we implement over StreamSets.
What is most valuable?
The ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too. It's pretty nice.
What needs improvement?
I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks.
StreamSets works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds.
For how long have I used the solution?
One to three years.
What do I think about the stability of the solution?
It's pretty stable. StreamSets has been up and running for months without any intervention from the operations team. It's great.
I don't know if they can implement some kind of high availability. I really don't go deep into that kind of configuration because, with only one node and running as stably as it is, we have no problem with that. But for critical operations, I'd like to know if I can facilitate some kind of high availability, in case one of the nodes goes down.
What do I think about the scalability of the solution?
It's pretty scalable.
How is customer service and technical support?
I don't use support. I mainly use the community or web searches; self-learning.
How was the initial setup?
The initial setup is pretty straightforward.
What other advice do I have?
If you are looking for something to do batch processing in Java, this is the right solution. We did the exploration when we were trying to implement a batch processing system and decided that StreamSets is the best for that. If you're looking for real-time, you may want to look at another system or the next version of this one.
Because of the kind of system that we need to implement with this kind of solution, the most important factors I look at when selecting a vendor are things like latency and real-time processing.
I would rate it at nine out of 10. What would make it a 10 would be, as I said, I'd like to have more integration with other kinds of languages or frameworks and also more real-time processing, not batch.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Engineer at a energy/utilities company with 10,001+ employees
Easy to set up and use, and the functionality for transforming data is good
Pros and Cons
- "It is really easy to set up and the interface is easy to use."
- "We've seen a couple of cases where it appears to have a memory leak or a similar problem."
What is our primary use case?
We typically use it to transport our Oracle raw datasets up to Microsoft Azure, and then into SQL databases there.
What is most valuable?
It is really easy to set up and the interface is easy to use.
We found it pretty easy to transform data.
The online documentation is pretty good.
What needs improvement?
We've seen a couple of cases where it appears to have a memory leak or a similar problem. It grows for a bit and then we'd have to restart the container, maybe once a month when it gets high.
For how long have I used the solution?
We have been using StreamSets for about one year. We may have been experimenting with it slightly before that time.
What do I think about the stability of the solution?
Other than the memory issue that we occasionally see, the stability has been really good.
What do I think about the scalability of the solution?
We haven't seen a problem with scaling it.
How are customer service and technical support?
I haven't had to deal with technical support. We would first check the online documentation or web documentation, and usually found what we needed. We haven't had to call them.
Which solution did I use previously and why did I switch?
Prior to using StreamSets, we were using Microsoft CDC (Change Data Capture). It was a fairly old product and there were lots of workarounds and lots of issues that we had with it. We were looking for something more user-friendly. It was pretty stable, so that was not an issue.
How was the initial setup?
This product was a lot easier to use than the one we had before it. It took us half an hour to get set up and running the first time.
What's my experience with pricing, setup cost, and licensing?
We are running the community version right now, which can be used free of charge. We were debating whether to move it to the commercial version, but we haven't had the need to, just yet.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Updated: October 2024