What is our primary use case?
The project I work on is developed in StreamSets. I lead the team as team leader and Solution Architect, and I also train my junior colleagues.
I've been using this tool for the last year and a half, and it is very effective for processing data from source to destination. I have developed many integrations in it.
How has it helped my organization?
The solution is really effective.
What is most valuable?
It's very effective in project delivery. This month, at the end of June, I will deploy all the integrations that I developed in StreamSets to production. The business users and customers are happy with the data flow optimizer from the SoPlex cloud. It all looks good.
There are not many challenges in terms of learning new technologies and using new tools. We will try to do R&D analysis more often.
Everything is in place and it comes as a package. They install everything. The package includes Python, Ruby, and others. I just need to configure the correct details in the pipeline and go ahead with my work.
The ease of the design experience when implementing batch, streaming, and ETL pipelines is very good. The streaming is very good. Currently, I'm using Data Collector and it's effective. If I were to build the same streaming in something like core Java, I would need to follow different processes to deploy the code and connect the database. Here, there is not much code that I need to write.
In StreamSets, everything is in one place. If you want to connect a database and configure it, it is easy. If you want to connect to HTTP, it's simple. Compared with doing the same in my other tools, I don't need many configurations or installations. StreamSets' ability to connect enterprise data stores such as OLTP databases and Hadoop, or messaging systems such as Kafka, is good. I send data to both the database and Kafka.
You get all the drivers that you need to install with the database. If you use other databases, you're going to need a JDBC driver, which is not difficult to use.
I'm sending data to different CDP URL databases, cloud areas, and Azure areas.
StreamSets' built-in data drift resilience plays a part in our ETL operations. We have some processors in StreamSets that tell us what data has changed and how the data needs to be sent.
It's an easy tool. If you're going to use it as a customer, then it should take a long time to process data. I'm not sure whether it will take some time in the future to process the billions of records that I'm working on. We generally process billions of records on a daily basis. I will need to see when I work on the new project with Snowflake, where we might need to process billions of records from the source. We'll see how long it takes and how the system handles it. At that point, I'll be able to say how effectively StreamSets processes it.
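To make the idea of detecting data drift concrete, here is a minimal Python sketch of comparing an incoming record against the fields a pipeline expects. It is only an illustration under stated assumptions: it is not StreamSets code or its API, and the `detect_drift` function, the schema, and the sample record are all hypothetical.

```python
from typing import Any, Dict, List


def detect_drift(expected: Dict[str, type], record: Dict[str, Any]) -> List[str]:
    """Describe how `record` differs from the `expected` field names and types."""
    changes = []
    # Fields present in the record but unknown to the expected schema.
    for name in sorted(record.keys() - expected.keys()):
        changes.append(f"added field: {name}")
    # Fields the schema expects but the record does not carry.
    for name in sorted(expected.keys() - record.keys()):
        changes.append(f"missing field: {name}")
    # Fields present on both sides whose value type no longer matches.
    for name in sorted(expected.keys() & record.keys()):
        if not isinstance(record[name], expected[name]):
            changes.append(
                f"type changed: {name} "
                f"({expected[name].__name__} -> {type(record[name]).__name__})"
            )
    return changes


# Hypothetical schema and record, for illustration only.
expected_schema = {"id": int, "name": str}
incoming = {"id": "42", "name": "Ada", "country": "NL"}
drift = detect_drift(expected_schema, incoming)
```

In this sketch, `drift` lists the new `country` field and the `id` field that arrived as a string instead of an integer, which is the kind of change a drift-aware processor would need to report.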
The Data Collector saves time. However, there are some issues with the DPL.
StreamSets helped us break down data silos within our organization.
One advantage is that everything happens in one place. If you want to develop or create something, you can get those details from StreamSets. The portal, however, takes time, although they are focusing on this.
StreamSets' reusable assets have helped to reduce workload by 32% to 40%.
StreamSets helped us to scale our data operations.
With other processing tools, a request to process data might take a long time, like two to three hours. With this, I can do it within 20 or 30 minutes. It's more effective. I have everything in one place and I can configure everything. It saves me time as it is so central.
What needs improvement?
If you use JDBC Lookup, for example, it generally takes a long time to process data.
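One general reason a per-record database lookup can be slow is that each record triggers its own query. The Python sketch below illustrates that effect and how caching repeated keys reduces the number of queries. It is an assumption-laden illustration, not StreamSets code: it uses an in-memory SQLite database in place of a JDBC source, and the `customers` table, the `lookup_country` function, and the sample records are hypothetical.

```python
import sqlite3
from functools import lru_cache

# In-memory SQLite database standing in for the lookup source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "NL"), (2, "IN")])

query_count = 0  # counts how many queries actually reach the database


@lru_cache(maxsize=10_000)
def lookup_country(customer_id: int):
    """Look up a customer's country, caching results for repeated keys."""
    global query_count
    query_count += 1
    row = conn.execute(
        "SELECT country FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


# Four records, but only two distinct keys, so only two queries are issued.
records = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 1}, {"customer_id": 1}]
enriched = [{**r, "country": lookup_country(r["customer_id"])} for r in records]
```

Without the cache, the four records would issue four queries. With it, `query_count` stays at the number of distinct keys. Whether this helps in practice depends on how often keys repeat and how fresh the looked-up data must be.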
StreamSets enables us to build data pipelines without knowing how to code. You can do it; however, you need to understand data flow. Without knowing anything, it's a bit difficult for new people. You need some technical skills to create a data pipeline. When building the data pipeline, for example, you need the origin, processor, and destination. You have to know where you are going to read the data from and where you are going to send it, and then configure that. If the destination you're looking for needs some particular message format or data format, then you have to write your own code there. You need some knowledge of coding as StreamSets does not provide any coding.
StreamSets' data drift resilience has not exactly reduced the time it takes for us to fix data drift breakages. A lot of improvements are required from StreamSets. I'm not sure how they're planning to make it happen. There are some issues in the case of data processing, and other scenarios.
If the data processing in StreamSets takes a long time as compared to the previous solution, then we will reconsider why we use StreamSets.
For how long have I used the solution?
I've been using StreamSets for the last two years.
What do I think about the stability of the solution?
In terms of stability, there have been one or two issues. Good people work on the solutions when we have issues. However, sometimes we don't get a good solution.
As a user, I expect more, and I expect a solution to come quickly rather than having projects kept on hold for a long time. If they do not have a solution, they just need to let us know quickly so that we can plan how to use other processors instead.
What do I think about the scalability of the solution?
The scalability is good.
We do plan to increase usage.
How are customer service and support?
In terms of technical support, they generally do a detailed analysis from their end. They always try to give a proper solution. However, sometimes, they won't get to any proper solution. They'll come back and look into it and sometimes it takes time. If they can speed up the process a little bit that would be ideal. We are always sitting on the edge. If we don't get a proper response from them, then it will be very difficult for us to answer to higher management.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
This is my first solution of this kind. Previously, I was working in open source systems, with scripting, et cetera. This is the first time I've worked in the data area. I've got full support. As a new data user, I'm still getting used to it.
How was the initial setup?
The setup is straightforward and simple.
We treat it like a pipeline. We are not writing code and putting things in. In the case of a pipeline, you can export it and import it, or you can make it a pipeline. It can be auto-deployed into the respective environment. That's what we did.
We have different destinations we need to send to. We aren't using a single destination. In that sense, we do have multiple computations. We set up, send the data and do the deployments.
There is occasional maintenance needed. Sometimes, if something goes wrong, we'll have to correct the data. We just check here and there for the most part.
What about the implementation team?
We did not need an integrator or consultant to assist with the setup.
As a team, we do the deployment. We don't hand it to others; whatever we develop, we test and deploy ourselves. We already have the system in place and it is really helpful for the deployment of the solution.
What was our ROI?
I haven't seen an ROI.
It's not exactly saving us money as it's a new tool. If I'm going to hire someone new, I will not hire based on StreamSets or any other specific tool, and I might save money right away. However, I'm spending time on my side. StreamSets is not being used by many horizons. In some places in Europe, fewer companies are using StreamSets. People should get to know StreamSets and build expertise in it, the way they do with AWS and Azure. I'm spending a lot more time and therefore I'm not saving money. That said, I'm also not losing money.
What's my experience with pricing, setup cost, and licensing?
Higher management handled the licensing, so I can't say how much it costs. I'm more on the user side.
Which other solutions did I evaluate?
I did not evaluate other options.
What other advice do I have?
I have not yet used StreamSets' Transformer for Snowflake functionality. I created one POC, though not with Snowflake. I'm going to use Snowflake in my next project.
I'd rate the solution seven out of ten. They are doing a good job. Using this solution I can feel the data and see the user flows.
If you are going to withdraw on-premise, and you're just copying the data to a table, you're not going to see how much data has been copied. With this, I'm seeing how much data has been transferred, and where the processor is. It gives a clear picture with metric details and notifications. That's the reason I used this tool for the last two years.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.