
Collibra Catalog vs Pentaho Data Integration and Analytics comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

Collibra Catalog
Average Rating: 7.8
Number of Reviews: 5
Ranking in other categories: Metadata Management (4th)

Pentaho Data Integration an...
Average Rating: 8.0
Number of Reviews: 51
Ranking in other categories: Data Integration (30th)
 

Mindshare comparison

Collibra Catalog and Pentaho Data Integration and Analytics aren’t in the same category and serve different purposes. Collibra Catalog is designed for Metadata Management and holds a mindshare of 11.2%, up 8.6% compared to last year.
Pentaho Data Integration and Analytics, on the other hand, focuses on Data Integration, holds 1.4% mindshare, up 0.5% since last year.
 

Featured Reviews

Aditya Pawar - PeerSpot reviewer
Mar 1, 2024
Effective for data discovery and supports multiple authentication methods, not just username and password
For data discovery, we create datasets. The most frequently used datasets are featured on the dashboard. Users can create their own if their requirements aren't met by the most frequently used datasets. We also create data requests, and owners can approve these requests, which adds a process layer to accessing particular datasets. So, it has been effective for data discovery in our company. I use different functionalities within the tool. I use Collibra Catalog for metadata management or Collibra Lineage for data governance. Once configured, data in the Catalog will be automatically updated, reducing the need for manual maintenance. So, the automation feature has positively impacted our data management tasks.
Jacopo Zaccariotto - PeerSpot reviewer
Apr 5, 2022
The drag-and-drop interface makes it easier to use than some competing products
It's difficult to use custom code. Implementing a pipeline with pre-built blocks is straightforward, but it's harder to insert custom code inside the pre-built blocks. The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode. Repository management is also a shortcoming, but I'm not sure if that's just a limitation of the free version. I'm not sure if Pentaho can use an external repository. It's a flat-file repository inside a virtual machine. Back in the day, we would want to deploy this repository on a database. Pentaho's data management covers ingestion and insights but I'm not sure if it's end-to-end management—at least not in the free version we are using—because some of the intermediate steps are missing, like data cataloging and data governance features. This is the weak spot of our Pentaho version.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"Collibra Catalog's best feature is the data quality checker."
"We have had no complaints about the stability."
"The data lineage capability is valuable as it shows how different sources are connected and how data flows, which is crucial for projects like migrations. Moreover, data lineage visualization in Collibra Catalog aids our data governance initiatives."
"Collibra Catalog has significantly enhanced data governance and compliance for our team, primarily through its valuable feature of endpoint lineage enabling visual representation of the data."
"Collibra Catalog is simple to use and user-friendly for those who are not technically inclined since it is easy to find while also easy to see data lineage diagrams."
"The solution has a free to use community version."
"The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs."
"The fact that it's a low-code solution is valuable. It's good for more junior people who may not be as experienced with programming."
"It has improved our data integration capabilities​."
"Flexible deployment, in any environment, is very important to us. That is the key reason why we ended up with these tools. Because we have a very highly secure environment, we must be able to install it in multiple environments on multiple different servers. The fact that we could use the same tool in all our environments, on-prem and in the cloud, was very important to us."
"The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country. We can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data."
"I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source."
"Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side."
 

Cons

"A key area for improvement in Collibra Catalog lies in its integration capabilities, particularly with a broader range of sources."
"The tool's overall functionalities need to improve since, nowadays, many tools, from a business perspective, are easy to use."
"Collibra Catalog could improve its automation to increase the efficiency of the software."
"I'd like to see more integration with other reporting sources."
"It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now and I think perhaps it is because of that it is not very stable, it causes the system to sometimes hang. I'm not sure if this is the case for pair tiers."
"I would like to see support for some additional cloud sources. It doesn't support Azure, for example. I was trying to do a PoC with Azure the other day but it seems they don't support it."
"I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector."
"In the Community edition, it would be nice to have more modules that allow you to code directly within the application. It could have R or Python completely integrated into it, but this could also be because I'm using an older version."
"Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step."
"The performance could be improved. If they could have analytics perform well on large volumes, that would be a big deal for our products."
"The testing and quality could really improve. Every time that there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that, as even the latest one was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and do some effort, it is with the quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi."
"I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse."
 

Pricing and Cost Advice

"The product is highly priced compared to other vendors."
"I think they can bring a few more features and align better with other quality products."
"Collibra offers a per-user licensing model."
"Collibra Catalog is fairly priced - I would rate their pricing seven out of ten."
"It does seem a bit expensive compared to the serverless product offering. Tools, such as Server Integration Services, are "almost" free with a database engine. It is comparable to products like Alteryx, which is also very expensive."
"The price of the regular version is not reasonable and it should be lower."
"I use it because it is free. I download from their page for free. I don't have to pay for a license. With other tools, I have to pay for the licenses. That is why I use Pentaho."
"I believe the pricing of the solution is more affordable than the competitors"
"You don't need the Enterprise Edition, you can go with the Community Edition. That way you can use it for free and, for free, it's a pretty good tool to use."
"I mostly used the open-source version. I didn't work with a license."
"The cost of these types of solutions are expensive. So, we really appreciate what we get for our money. Though, we don't think of the solution as a top-of-the-line solution or anything like that."
"We are using the Community Edition. We have been trying to use and sell the Enterprise version, but that hasn't been possible due to the budget required for it."
814,649 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Financial Services Firm: 29%
Computer Software Company: 15%
Energy/Utilities Company: 6%
Manufacturing Company: 6%

Financial Services Firm: 23%
Computer Software Company: 14%
Government: 7%
Retailer: 5%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
No data available
 

Questions from the Community

What do you like most about Collibra Catalog?
The data lineage capability is valuable as it shows how different sources are connected and how data flows, which is crucial for projects like migrations. Moreover, data lineage visualization in C...
What needs improvement with Collibra Catalog?
I'd like to see more integration with other reporting sources like Qlik Sense, beyond the currently supported ones like Tableau and Power BI.
Which ETL tool would you recommend to populate data from OLTP to OLAP?
Hi Rajneesh, yes here is the feature comparison between the community and enterprise edition : https://www.hitachivantara.com/en-us/pdf/brochure/leverage-open-source-benefits-with-assurance-of-hita...
What do you think can be improved with Hitachi Lumada Data Integrations?
In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, ...
What do you use Hitachi Lumada Data Integrations for most frequently?
My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could us...
 

Also Known As

No data available
Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
 

Overview

 

Sample Customers

AXA XL, DNB, Adobe, PMI, Holland America Line, UC Davis Health, Cox Automotive
66Controls, Providential Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
Find out what your peers are saying about Informatica, Alation, SAP and others in Metadata Management. Updated: October 2024.