Collibra Catalog vs StreamSets comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

Collibra Catalog
Average Rating: 7.8
Number of Reviews: 5
Ranking in other categories: Metadata Management (4th)

StreamSets
Average Rating: 8.4
Number of Reviews: 24
Ranking in other categories: Data Integration (9th)
 

Mindshare comparison

Collibra Catalog and StreamSets aren’t in the same category and serve different purposes. Collibra Catalog is designed for Metadata Management and holds a mindshare of 11.2%, up 8.6% compared to last year.
StreamSets, on the other hand, focuses on Data Integration and holds a mindshare of 1.7%, up 1.3% compared to last year.
 

Featured Reviews

Aditya Pawar - PeerSpot reviewer
Mar 1, 2024
Effective for data discovery and supports multiple authentication methods, not just username and password
For data discovery, we create datasets. The most frequently used datasets are featured on the dashboard. Users can create their own if their requirements aren't met by the most frequently used datasets. We also create data requests, and owners can approve these requests, which adds a process layer to accessing particular datasets. So, it has been effective for data discovery in our company. I use different functionalities within the tool. I use Collibra Catalog for metadata management or Collibra Lineage for data governance. Once configured, data in the Catalog will be automatically updated, reducing the need for manual maintenance. So, the automation feature has positively impacted our data management tasks.
Reyansh Kumar - PeerSpot reviewer
Mar 10, 2023
We no longer need to hire highly skilled data engineers to create and monitor data pipelines
The things I like about StreamSets are its overall user interface, efficiency, and product features, which are all good. Also, the scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy. You just need to configure the data sources, the paths and their configurations, and you are ready to go. It is very efficient and very easy to use for ETL pipelines. It is a GUI-based interface in which you can easily create or design your own data pipelines with just a few clicks. As for moving data into modern analytics systems, we are using it with Microsoft Power BI, AWS, and some on-premises solutions, and it is very easy to get data from StreamSets into them. No hardcore coding or special technical expertise is required. It is also a no-code platform in which you can configure your data sources and data output for easy configuration of your data pipeline. This is a very important aspect because if a tool requires code development, we need to hire software developers to get the task done. By using StreamSets, it can be done with a few clicks.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"Collibra Catalog's best feature is the data quality checker."
"Collibra Catalog has significantly enhanced data governance and compliance for our team, primarily through its valuable feature of endpoint lineage enabling visual representation of the data."
"The data lineage capability is valuable as it shows how different sources are connected and how data flows, which is crucial for projects like migrations. Moreover, data lineage visualization in Collibra Catalog aids our data governance initiatives."
"We have had no complaints about the stability."
"Collibra Catalog is simple to use and user-friendly for those who are not technically inclined since it is easy to find while also easy to see data lineage diagrams."
"The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
"Important features include that it comprises lots of functionality to connect data from various sources through connector availability, scheduling pipelines at any time, and integration with third-party and security solutions for encryption."
"What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes."
"The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them."
"The scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now."
"It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one DataOps solution."
 

Cons

"I'd like to see more integration with other reporting sources."
"The tool's overall functionalities need to improve since, nowadays, many tools, from a business perspective, are easy to use."
"A key area for improvement in Collibra Catalog lies in its integration capabilities, particularly with a broader range of sources."
"Collibra Catalog could improve its automation to increase the efficiency of the software."
"StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
"The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
"Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful."
"The data collector in StreamSets has to be designed properly. For example, a simple database configuration with MySQL DB requires the MySQL Connector to be installed."
"The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."
"StreamSets works great for batch processing but we are looking for something that is more real-time. We need latency in numbers below milliseconds."
 

Pricing and Cost Advice

"The product is highly priced compared to other vendors."
"Collibra Catalog is fairly priced - I would rate their pricing seven out of ten."
"I think they can bring a few more features and align better with other quality products."
"Collibra offers a per-user licensing model."
"It's not so favorable for small companies."
"We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
"StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
"I believe the pricing is not equitable."
"The pricing is good, but not the best. They have some customized plans you can opt for."
"Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have the Seed B or Seed C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled."
"StreamSets is an expensive solution."
"We are running the community version right now, which can be used free of charge."
814,649 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 29%
Computer Software Company: 15%
Energy/Utilities Company: 6%
Manufacturing Company: 6%

Financial Services Firm: 17%
Computer Software Company: 13%
Manufacturing Company: 9%
Government: 6%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
No data available
 

Questions from the Community

What do you like most about Collibra Catalog?
The data lineage capability is valuable as it shows how different sources are connected and how data flows, which is crucial for projects like migrations. Moreover, data lineage visualization in C...
What needs improvement with Collibra Catalog?
I'd like to see more integration with other reporting sources like Qlik Sense, beyond the currently supported ones like Tableau and Power BI.
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which ...
What is your primary use case for StreamSets?
StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data...
 

Sample Customers

Collibra Catalog: AXA XL, DNB, Adobe, PMI, Holland America Line, UC Davis Health, Cox Automotive
StreamSets: Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about Informatica, Alation, SAP and others in Metadata Management. Updated: October 2024.