Senior Data Platform Manager at a manufacturing company with 10,001+ employees
Real User
Top 5
Apr 10, 2024
StreamSets is used for data transformation rather than for full ETL. It focuses on transforming data directly from sources, without handling the extraction part of the process; the transformed data is then loaded into Amazon Redshift or other data warehousing solutions.
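As a rough illustration of the transform-then-load flow this reviewer describes (not the actual StreamSets pipeline), the pattern reduces to transforming records, staging them in S3, and bulk-loading into Redshift with COPY. In this Python sketch the bucket, role ARN, cluster endpoint, table, and field names are all hypothetical:

```python
import csv
import io

import boto3      # AWS SDK, used to stage the file in S3
import psycopg2   # Redshift speaks the PostgreSQL wire protocol

# Hypothetical names, for illustration only.
S3_BUCKET = "example-staging-bucket"
S3_KEY = "staging/orders_transformed.csv"
IAM_ROLE = "arn:aws:iam::123456789012:role/example-redshift-copy-role"

def transform(record: dict) -> dict:
    """Example in-flight transformation: normalize and derive fields."""
    return {
        "order_id": record["order_id"],
        "amount_usd": round(float(record["amount"]), 2),
        "region": record["region"].strip().upper(),
    }

def load_to_redshift(records: list[dict]) -> None:
    # Write the transformed records to an in-memory CSV and stage it in S3.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["order_id", "amount_usd", "region"])
    writer.writeheader()
    for rec in records:
        writer.writerow(transform(rec))
    boto3.client("s3").put_object(Bucket=S3_BUCKET, Key=S3_KEY, Body=buf.getvalue())

    # COPY from S3 is the idiomatic bulk-load path into Redshift.
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="loader", password="...",
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            f"COPY orders_transformed FROM 's3://{S3_BUCKET}/{S3_KEY}' "
            f"IAM_ROLE '{IAM_ROLE}' CSV IGNOREHEADER 1"
        )
```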
We were receiving data from hospitals and other healthcare service providers in the country; we operated predominantly in the US. When we received that data, we had to classify it into different repositories or datasets. The data was sent to different vendors, and for that, it needed to be processed in different ways. We needed to bifurcate the data at many steps with different kinds of filters. For that, we used StreamSets.
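The multi-step bifurcation described here is what StreamSets models with routing stages such as the Stream Selector. A minimal Python sketch of the idea follows; the record fields and vendor rules are invented for illustration:

```python
from typing import Callable

# Hypothetical routing rules: each vendor dataset is defined by a predicate.
ROUTES: dict[str, Callable[[dict], bool]] = {
    "claims_vendor":  lambda r: r.get("record_type") == "claim",
    "labs_vendor":    lambda r: r.get("record_type") == "lab_result",
    "billing_vendor": lambda r: r.get("record_type") == "invoice"
                                and r.get("country") == "US",
}

def route(records: list[dict]) -> dict[str, list[dict]]:
    """Bifurcate incoming records into per-vendor datasets; the first
    matching filter wins, and unmatched records land in a review queue."""
    buckets: dict[str, list[dict]] = {name: [] for name in ROUTES}
    buckets["unmatched"] = []
    for rec in records:
        for name, predicate in ROUTES.items():
            if predicate(rec):
                buckets[name].append(rec)
                break
        else:
            buckets["unmatched"].append(rec)
    return buckets
```

Chaining several such stages, each with its own filters, reproduces the "bifurcate at many steps" behavior the reviewer mentions.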
We are sharing data between platforms. It helps me stay independent of ETL tools and work with the data formats without using any programming language.
We use it for building a data lake. In our context, we have sales multiple times during the day, and a sale is the trigger; the sales data uses the lake as a landing zone. We also use it for various types of data transformation.
I use StreamSets to develop data feeds for different value streams, to control scheduling options for my data pipelines, and for internal version control.
Chief software engineer at Appnomu Business Services
Real User
Top 10
Mar 24, 2023
In our department, we use StreamSets to design data pipelines that load data from various RDBMS sources to the cloud, such as Azure. We also use it so that data analysts can generate dashboards for our organization, as well as for real-time use cases such as monitoring and consuming streaming data. Additionally, we are able to customize StreamSets to suit our needs and budget.
The main use case of StreamSets is to work on data integration and ingesting data for DataOps and modern analytics. We also use it for integrating data files from multiple sources. We use it to build, monitor, and manage smart, continuous data pipelines.
My primary use case with StreamSets is to integrate large data sets from multiple sources into a destination. We also use it as a platform to ingest data and deliver data for database analytics.
Product Marketer at a media company with 1,001-5,000 employees
Real User
Top 5
Jan 6, 2023
Our major use case with StreamSets is to build data pipelines from multiple sources to multiple destinations. We mainly use the StreamSets Data Collector Engine for seamless streaming from any source to any destination. We also use it to deliver continuous data for database operations and modern analytics.
We are working on a very large data analytics project in which we are integrating large data sets into a platform from multiple sources. We need to create data pipelines, and we are using StreamSets for all the data integration activities: creating the pipelines, monitoring them, and running all the data processes smoothly.
I worked mostly on data ingestion use cases when I was using Data Collector. Later on, I got involved with some Spark-based transformations using Transformer. Currently, we are not using CI/CD or automated deployments; we deploy to production manually, but going forward, we plan to use CI/CD for automated deployments. I have worked on both on-prem and cloud deployments. The current implementation is on-prem, but in my previous project we worked on an AWS-based implementation, and we did a small PoC with GCP as well.
StreamSets is a wonderful data engineering and DataOps tool with which we can design and create data pipelines, loading on-prem data to the cloud. One of our major projects was to move data from on-premises to Azure and GCP. Once the data is loaded, the data scientist and data analyst teams use it to generate patterns and insights. For a US healthcare service provider company, we designed a StreamSets pipeline to connect to relational database sources; we generated the schema from the source data and loaded it into Azure Data Lake Storage (ADLS) or another cloud store, like S3 or GCP. This was one of our batch use cases. With StreamSets, we have also solved real-time streaming use cases, where we streamed data from a source Kafka topic to Azure Event Hubs. This was a trigger-based streaming pipeline, which moved data when it appeared in the Kafka topic; since it was a streaming pipeline, it continuously streamed data from Kafka to Azure for further analysis.
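The Kafka-to-Event-Hubs leg of that streaming use case can be sketched in plain Python with the kafka-python and azure-eventhub libraries. This is an illustration of the forwarding loop, not the reviewer's actual pipeline; the topic, broker, and connection string are placeholders:

```python
from kafka import KafkaConsumer  # pip install kafka-python
from azure.eventhub import EventHubProducerClient, EventData  # pip install azure-eventhub

# Placeholder endpoints, for illustration.
consumer = KafkaConsumer(
    "source-topic",
    bootstrap_servers=["kafka-broker:9092"],
    group_id="eventhub-forwarder",
    auto_offset_reset="earliest",
)
producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://example.servicebus.windows.net/;...",
    eventhub_name="target-hub",
)

# Continuously forward: each Kafka message becomes an Event Hubs event,
# mirroring the trigger-based pipeline described above. Production code
# would accumulate several messages per batch before sending.
for message in consumer:
    batch = producer.create_batch()
    batch.add(EventData(message.value))
    producer.send_batch(batch)
```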
Senior Technical Manager at a financial services firm with 501-1,000 employees
Real User
Aug 8, 2018
It performs very well. The main use is to extract information from some of our Kafka topics and put it into our internal systems and flat files, with integration via Java.
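This reviewer's setup is Java-based; purely as an illustration of the extract-to-flat-file pattern, here is the equivalent loop in Python with kafka-python. The topic, broker, and output path are hypothetical:

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Placeholder topic and broker.
consumer = KafkaConsumer("orders", bootstrap_servers=["kafka-broker:9092"])

# Append each Kafka record to a flat file as one JSON line, the kind of
# extraction-to-file step described above.
with open("/data/exports/orders.jsonl", "a", encoding="utf-8") as out:
    for message in consumer:
        record = json.loads(message.value)
        out.write(json.dumps(record) + "\n")
```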
StreamSets is a data integration platform that enables organizations to efficiently move and process data across various systems. It offers a user-friendly interface for designing, deploying, and managing data pipelines, allowing users to easily connect to various data sources and destinations. StreamSets also provides real-time monitoring and alerting capabilities, ensuring that data is flowing smoothly and any issues are quickly addressed.
We are using StreamSets to migrate our on-premises data to the cloud.
Our company builds products mainly for healthcare divisions and we use StreamSets for all our data engineering tasks.
We use the whole Data Collector application.
We are using the StreamSets DataOps Platform to ingest data into a data lake.
We typically use it to transport our Oracle raw datasets up to Microsoft Azure, and then into SQL databases there.
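That Oracle-to-Azure-SQL transport reduces to a chunked read-and-insert loop. A minimal Python sketch, assuming the python-oracledb and pyodbc drivers; the connection details and table names are invented:

```python
import oracledb  # pip install oracledb (python-oracledb)
import pyodbc    # pip install pyodbc; requires an ODBC driver for SQL Server

# Placeholder connection details, for illustration.
src = oracledb.connect(user="etl", password="...", dsn="onprem-db:1521/ORCLPDB1")
dst = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=example.database.windows.net;DATABASE=staging;UID=etl;PWD=..."
)
dst_cur = dst.cursor()
dst_cur.fast_executemany = True  # bulk parameter binding for the inserts

with src.cursor() as src_cur:
    src_cur.execute("SELECT id, name, amount FROM raw_orders")
    while True:
        rows = src_cur.fetchmany(5000)  # stream in chunks, not all at once
        if not rows:
            break
        dst_cur.executemany(
            "INSERT INTO dbo.raw_orders (id, name, amount) VALUES (?, ?, ?)",
            rows,
        )
        dst.commit()
```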