Pentaho Data Integration and Analytics Reviews and Pricing

reviewer1855218

Data Architect at a consumer goods company with 1,001-5,000 employees

May 12, 2022

Download

I can extend and customize existing pipeline templates for changing requirements, saving time

Pros and Cons

"I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source."

"I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse."

What is our primary use case?

We use it for orchestration and as an ETL tool to move data from one environment to another, including moving data from on-premises to the cloud and moving operational data from different source systems into the data warehouse.

How has it helped my organization?

People are now able to get access to the data when they need it. That is what is most important. All the reports go out on time.

The solution enables us to use one tool that gives a single, end-to-end data management experience from ingestion to insights. From the reporting point of view, we are able to make our customers happy. Are they able to get their reports in time? Are they able to get access to the data that they need on time? Yes. They're happy, we're happy, that's it.

With the automation of everything, if I start breaking it into numbers, we don't have to hire three or four people to do one simple task. We've been able to develop some generic IT processes so that we don't have to reinvent the wheel. I just have to extend the existing pipeline and customize it to whatever requirements I have at that point in time. Otherwise, whenever we would get a project, we would actually have to reinvent the wheel from scratch. Now, the generic pipeline templates that we can reuse save us so much time and money.

It has also reduced our ETL development time by 40 percent, and that translates into cost savings.

Before we used Pentaho, we used to do some of this stuff manually, and some of the ETL jobs would run for hours, but most of the ETL jobs, like the monthly reports, now run within 45 minutes, which is pretty awesome. Everything that we used to do manually is now orchestrated.

And now, with everything in the cloud, any concerns about hardware are taken care of for us. That helps with maintenance costs.

What is most valuable?

I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source. With open-source on the table, I am in a position to transform the data where it's actually being moved from one environment to another.

Whether we are working with structured or unstructured data, the tool has been helpful. We are actually able to extend it to read JSON data by creating some Java components.

The solution gives us the flexibility to deploy it in any environment, including on-premises or in the cloud. That is another very important feature.

What needs improvement?

I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse.

By using some of the Python scripts that we have, we are able to extract all this text data into JSON. Then, from JSON, we are able to create external tables in the cloud whereby, at any one time, somebody has access to this data on the S3 drive.

Buyer's Guide

Pentaho Data Integration and Analytics

October 2025

Free Report: Pentaho Data Integration and Analytics Reviews and More

Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: October 2025.

DOWNLOAD NOW

872,706 professionals have used our research since 2012.

For how long have I used the solution?

I've been using Hitachi Lumada Data Integration since 2014.

What do I think about the stability of the solution?

It's been stable.

What do I think about the scalability of the solution?

We are able to scale our environment. For example, if I had that many workloads, I could scale the tool to run on three instances, and all the workloads would be distributed equally.

How are customer service and support?

Their tech support is awesome. They always answer and attend to any incidents that we raise.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Everything was done manually in Excel. The main reason we went with Pentaho is that it's open-source.

How was the initial setup?

The deployment was like any other deployment. All the steps are written down in a document and you just have to follow those steps. It was simple for us.

What other advice do I have?

The performance of Pentaho, like any other ETL tool, starts from the database side, once you write good, optimized scripts. The optimization of Pentaho depends on the hardware it's sitting on. Once you have enough RAM on your VM, you are in a position to run any workloads.

Overall it is an awesome tool. We are satisfied with our decision to go with Hitachi's product. It's like any other ETL tool. It's like SQL Server Integration Services, Informatica, or DataStage. On a scale of one to 10, where 10 is best, I would give it a nine in terms of recommending it to a colleague.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.

Renan Guedert

Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees

Apr 20, 2022

Download

Creates a good, visual pipeline that is easy to understand, but doesn't handle big data well

Pros and Cons

"Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side."

"A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git."