Ricardo Díaz - PeerSpot reviewer
COO / CTO at a tech services company with 11-50 employees
Real User
We can create pipelines with minimal manual or custom coding, and we can quickly implement what we need with its drag-and-drop interface
Pros and Cons
  • "Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that."
  • "In terms of the flexibility to deploy in any environment, such as on-premises or in the cloud, we can do cloud deployment only through virtual machines. We may also be able to run it in other environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. Being limited to virtual machines is a problem, but we can manage it. By contrast, we also work with Databricks because it works with Spark: we can work with clustered servers and easily deploy in the cloud. With a right-click, we can deploy Databricks through the app on the AWS or Azure cloud."

What is our primary use case?

We are a service delivery enterprise, and we have different use cases. We deliver solutions to other enterprises, such as banks. One of the use cases is real-time analytics of the data we work with. We take CDC data from Oracle Database and, in real time, generate a product offer covering all of a client's products. The client could be at an ATM or at a branch, and they can access the product offer.

We also use Pentaho within our organization to integrate all the documents and Excel spreadsheets from our consultants and to have a dashboard of the hours spent on different projects.

In terms of version, Pentaho Data Integration is currently on version 9, but we are using version 8.2. We have all the versions, but we work with the most stable one.

In terms of deployment, we have two different types of deployments. We have on-prem and private cloud deployments.

How has it helped my organization?

I work with a lot of data. We have about 50 terabytes of information, and working with Pentaho Data Integration along with other databases is very fast.

Previously, I had three people to collect all the data and integrate all Excel spreadsheets. To give me a dashboard with the information that I need, it took them a day or two. Now, I can do this work in about 15 minutes.
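As a rough illustration of the kind of consolidation described above (with a hypothetical file layout and column names, not the reviewer's actual data), a few lines of Python can merge per-consultant hour logs into one summary:

```python
import csv
import io
from collections import defaultdict

def aggregate_hours(csv_texts):
    """Combine per-consultant exports into total hours per project."""
    totals = defaultdict(float)
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            totals[row["project"]] += float(row["hours"])
    return dict(totals)

# Two hypothetical consultant exports sharing the same layout.
sheet_a = "project,hours\nAlpha,8\nBeta,4\n"
sheet_b = "project,hours\nAlpha,2\nGamma,6\n"
print(aggregate_hours([sheet_a, sheet_b]))  # → {'Alpha': 10.0, 'Beta': 4.0, 'Gamma': 6.0}
```

In practice the surrounding work, such as Excel parsing, scheduling, and delivery to a dashboard, is what the drag-and-drop steps replace.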

It enables us to create pipelines with minimal manual coding or custom coding efforts, which is one of its best features. Pentaho is one of the few tools with which you can do anything you can imagine. Our business is changing all the time, and it is best for our business if I can use less time to develop new pipelines.

It provides the ability to develop and deploy data pipeline templates once and reuse them. I use them at least once a day. It makes my daily life easier when it comes to data pipelines.

Previously, I have used other tools such as Microsoft Integration Services, SAP Data Services, and Informatica. Pentaho reduces the ETL implementation time by 5% to 50%.

What is most valuable?

Pentaho from Hitachi is a suite of different tools. Pentaho Data Integration is a part of the suite, and I love the drag-and-drop functionality. It is the best. 

Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that.

What needs improvement?

Their client support is very bad and should be improved. There is also not much information on Hitachi forums or Hitachi web pages, so finding help is very complicated.

In terms of the flexibility to deploy in any environment, such as on-premises or in the cloud, we can do cloud deployment only through virtual machines. We may also be able to run it in other environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. Being limited to virtual machines is a problem, but we can manage it. By contrast, we also work with Databricks because it works with Spark: we can work with clustered servers and easily deploy in the cloud. With a right-click, we can deploy Databricks through the app on the AWS or Azure cloud.

Buyer's Guide
Pentaho Data Integration and Analytics
November 2024
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: November 2024.
816,406 professionals have used our research since 2012.

For how long have I used the solution?

I have been using Pentaho Data Integration for 12 years. The first version that I tested and used was 3.2 in 2010.

How are customer service and support?

Their technical support is not good. I would rate them 2 out of 10 because they don't have good technical skills to solve problems.

How would you rate customer service and support?

Negative

How was the initial setup?

It is very quick and simple. It takes about five minutes.

What other advice do I have?

I have a good knowledge of this solution, and I would highly recommend it to a friend or colleague. 

It provides a single, end-to-end data management experience from ingestion to insights, but we have to create different pipelines to generate the metadata management. It's a little bit laborious to work with Pentaho, but we can do that.

I've heard a lot of people say it's complicated to use, but Pentaho is one of the few tools where you can do anything you can imagine. It is very good and quite simple, but you need to have the right knowledge and the right people to handle the tool. The skills needed to create a business intelligence or data integration solution with Pentaho are problem-solving logic and maybe database knowledge. You can develop new steps and new functionality in Pentaho Lumada, but you must have advanced Java programming knowledge. Our experience, in general, has been very good.

Overall, I am satisfied with our decision to purchase Hitachi's products, services, and solutions. My satisfaction level is an eight out of ten.

I am not much aware of the roadmap of Hitachi Vantara. I don't read much about that.

I would rate this solution an eight out of ten. 

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Solution Integration Consultant II at a tech vendor with 201-500 employees
Consultant
Reduces the effort required to build sophisticated ETLs
Pros and Cons
  • "We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues in respect to deployment of code or with code working in one environment but not working in another environment. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines."
  • "It could be better integrated with programming languages, like Python and R. Right now, if I want to run a Python code on one of my ETLs, it is a bit difficult to do. It would be great if we have some modules where we could code directly in a Python language. We don't really have a way to run Python code natively."

What is our primary use case?

My work primarily revolves around data migration and data integration for different products. I have used it at different companies, but in most of our use cases, we use it to integrate all the data that needs to flow into our product, as well as outbound data that our product needs to send to various integration points. We use this product extensively to build ETLs for those use cases.

We are developing ETLs for the inbound data into the product as well as outbound to various integration points. Also, we have a number of core ETLs written on this platform to enhance our product.

We offer two different modes: one is on-premises and the other is on the cloud. On the cloud, we have an EC2 instance on AWS; we have installed the product on that EC2 instance, which we call the ETL server. We also have another server where the application itself is installed.

We use version 8.3 in the production environment, but in the dev environment, we use version 9 onwards.

How has it helped my organization?

We have been able to reduce the effort required to build sophisticated ETLs. We are also now in the migration phase from an on-prem product to a cloud-native application.

We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues in respect to deployment of code or with code working in one environment but not working in another environment. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines.

What is most valuable?

The metadata injection feature is the most valuable because we have used it extensively to build frameworks, where we have used it to dynamically generate code based on different configurations. If you want to make a change at all, you do not need to touch the actual code. You just need to make some configuration changes and the framework will dynamically generate code for that as per your configuration. 
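The configuration-driven idea can be sketched in plain Python (a loose analogue with hypothetical step names, not PDI's actual metadata injection mechanism): the generic runner stays fixed, and only the injected configuration changes.

```python
# A small library of reusable transformation steps.
STEP_LIBRARY = {
    "uppercase": lambda rows, field: [dict(r, **{field: r[field].upper()}) for r in rows],
    "keep_min": lambda rows, field, minimum: [r for r in rows if r[field] >= minimum],
}

def run_template(rows, config):
    """Apply each configured step, in order, to the row set."""
    for step in config["steps"]:
        rows = STEP_LIBRARY[step["name"]](rows, **step["args"])
    return rows

# Changing behavior means editing this config, not the runner code.
config = {"steps": [
    {"name": "keep_min", "args": {"field": "amount", "minimum": 10}},
    {"name": "uppercase", "args": {"field": "client"}},
]}
data = [{"client": "acme", "amount": 12}, {"client": "beta", "amount": 5}]
print(run_template(data, config))  # → [{'client': 'ACME', 'amount': 12}]
```

The design choice mirrors what the reviewer describes: code review and regression testing concentrate on the small fixed runner, while day-to-day changes stay in configuration.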

We have a UI where we can create our ETL pipelines as needed, which is a key advantage for us. This is very important because it reduces the time to develop for a given project. When you need to build the whole thing using code, you need to do multiple rounds of testing. Therefore, it helps us to save some effort on the QA side.

Hitachi Vantara's roadmap has a pretty good list of features that they have been releasing with every new version. For instance, in version 9, they have included metadata injection for some of the steps. The most important elements of this roadmap to our organization’s strategy are the data-driven approach that this product is taking and the fact that we have a very low-code platform. Combining these two is what gives us the flexibility to utilize this software to enhance our product.

What needs improvement?

It could be better integrated with programming languages, like Python and R. Right now, if I want to run a Python code on one of my ETLs, it is a bit difficult to do. It would be great if we have some modules where we could code directly in a Python language. We don't really have a way to run Python code natively. 
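Until such a module exists, one common workaround (sketched below with a hypothetical helper; this is not a built-in Pentaho feature) is to shell out to an external Python interpreter from the ETL, exchanging rows as JSON on stdin/stdout:

```python
import json
import subprocess
import sys

def run_python_step(script, payload):
    """Run a Python snippet as an external process, passing rows as JSON
    on stdin and reading transformed rows back from stdout."""
    result = subprocess.run(
        [sys.executable, "-c", script],
        input=json.dumps(payload), capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

# The "step": read rows from stdin, add a computed field, write them back.
script = (
    "import sys, json;"
    "rows = json.load(sys.stdin);"
    "print(json.dumps([{**r, 'doubled': r['value'] * 2} for r in rows]))"
)
print(run_python_step(script, [{"value": 3}]))  # → [{'value': 3, 'doubled': 6}]
```

Serializing rows through a subprocess adds overhead, which is why a native step, as the reviewer requests, would be preferable.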

For how long have I used the solution?

I have been working with this tool for five to six years.

What do I think about the stability of the solution?

They have made it a lot more stable. Stability used to be an issue before it was with Hitachi. Now, we don't see those kinds of issues or bugs within the platform because it has become far more stable. We also see a lot of new big data features, such as connecting to the cloud.

What do I think about the scalability of the solution?

Lumada is flexible to deploy in any environment, whether on-premises or the cloud, which is very important. When we are processing data in batches on certain days, e.g., at the end of the week or month, we might have more data and need more processing power or RAM. However, most times, there might be very minimal usage of that CPU power. In that way, the solution has helped us to dynamically scale up, then scale down when we see that we have more data that we need to process.

The scalability is another key advantage of this product versus some of the others in the market since we can tweak and modify a number of parameters. We are really impressed with the scalability.

We have close to 80 people who are using this product actively. Their roles go all the way from junior developers to support engineers. We also have people who have very little coding knowledge and are more into the management side of things utilizing this tool.

How are customer service and support?

I haven't been part of any technical support discussions with Hitachi.

Which solution did I use previously and why did I switch?

We are very satisfied with our decision to purchase Hitachi's product. Previously, we were using another ETL service that had a number of limitations. It was not a modern ETL service at all. For anything, we had to rely on another third-party software. Then, with Hitachi Lumada, we don't have to do that. In that way, we are really satisfied with the orchestration or cloud-native steps that they offer. We are really happy on those fronts.

We were using something called Actian Services, which had fewer features and ended up costing more than the enterprise edition of Pentaho.

We could not do a number of things in Actian. For instance, we were unable to call other APIs or connect to an S3 bucket. It was not a very modern solution. With Pentaho, we could do all these things, and it has a great marketplace where we could find various modules and third-party plugins. Those features were simply not there in the other tool.

How was the initial setup?

The initial setup was pretty straightforward. 

What about the implementation team?

We did not have any issues configuring it, even on my local machine. For the enterprise edition, we have a separate infrastructure team doing that. However, at least for the community edition, the deployment is pretty straightforward.

What was our ROI?

We have seen at least 30% savings in terms of effort. That has helped us to price our service and products more aggressively in the market, helping us to win more clients.

It has reduced our ETL development time. Per project, it has reduced by around 30% to 35%.

We can price more aggressively. We were actually able to win projects because we had great reusability of ETLs. A code that was used for one client can be reused with very minimal changes. We didn't have any upfront cost for kick-starting projects using the Community edition. It is only the Enterprise edition that has a cost. 

What's my experience with pricing, setup cost, and licensing?

For most development tasks, the Enterprise edition should be sufficient. It depends on the type of support that you require for your production environment.

Which other solutions did I evaluate?

We did evaluate SSIS since our database is based on Microsoft SQL Server. SSIS comes with any purchase of a SQL Server license. However, even with SSIS, there were some limitations. For example, if you want to build a package and reuse it, SSIS doesn't provide the same kinds of abilities that Pentaho does. The amount of reusability is reduced when we try to build the same thing using SSIS, whereas in Pentaho, we could literally reuse the same code by using some of its features.

SSIS comes with SQL Server and is easier to maintain, given that far more people have knowledge of SSIS. However, if I want to do PGP encryption or make an API connection, it is difficult, and creating a reusable package is not that easy. Those would be the cons for SSIS.

What other advice do I have?

The query performance depends on the database. It is more likely to be good if you have a good database server with all the indexes and bells and whistles of a database. However, from a data integration tool perspective, I am not seeing any issues with respect to query performance.
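The point about the database doing the heavy lifting can be seen in a small sqlite3 experiment (illustrative only; the reviewer's databases are full server products): the same query switches from a full table scan to an index search once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, f"c{i % 100}") for i in range(1000)])

def plan(sql):
    """Ask the query planner how it would execute sql."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

query = "SELECT * FROM orders WHERE customer = 'c7'"
print(plan(query))  # full scan of the orders table
conn.execute("CREATE INDEX idx_customer ON orders (customer)")
print(plan(query))  # now a search using idx_customer
```

An ETL tool issuing that query sees the speedup for free, which is the reviewer's point: tune the database, not the integration layer.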

We do not build visualization features that much with Hitachi. For reporting purposes, we have been using one of the tools from the product and preparing the data accordingly.

We use this for all the projects that we are currently running. Going forward, we will be sticking only to using this ETL tool.

We haven't had any roadblocks using Lumada Data Integration.

On a scale of one to 10, I would recommend Hitachi Vantara to a friend or colleague as a nine.

If you need to build ETLs quickly in a low-code environment, where you don't want to spend a lot of time on the development side of things, then even though it is a little difficult to find resources, train them in this product. It is always worth the effort because it ends up saving a lot of time and resources on the development side of projects.

Overall, I would rate the product as a nine out of 10.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Assistant General Manager at DTDC Express Limited
Real User
Scales well with data and processes, but the cost should be lower and real-time processing capabilities improved
Pros and Cons
  • "The amount of data that it loads and processes is good."
  • "I would like to see improvements made for real-time data processing."

What is our primary use case?

We are using just the simple features of this product.

We're using it as a data warehouse and then for building dimensions.

What needs improvement?

The shortcoming in version 7 is that we are unable to connect to Google Cloud Storage (GCS) to write results from Pentaho. I'm able to connect to S3 using Pentaho 8, but I'm unable to connect to GCS. With people moving from on-premises deployments to the cloud, be it S3, Azure, or Google, we need plugins to interact with these cloud vendors.

I would like to see improvements made for real-time data processing. It is something that I will be looking out for.

For how long have I used the solution?

We have been using Pentaho Data Integration for three years.

What do I think about the stability of the solution?

For all of the features that we have been using, it is a stable product.

What do I think about the scalability of the solution?

In terms of data loading and processes, the scalability is good.

We have a team of four people who are using it for analytics.

How are customer service and technical support?

As we are using the Community Version, we have not been in contact with technical support. Instead, we rely on forums and websites when we need to resolve a problem.

Which solution did I use previously and why did I switch?

In the past, I have worked with Talend, as well as SAP BO Data Services (BODS). However, that was with another company. This organization started with Pentaho and we are still using it.

How was the initial setup?

It is a straightforward setup process. It took between three and four hours to complete.

What's my experience with pricing, setup cost, and licensing?

We are using the Community Version, which is available free of charge.

The price of the regular version is not reasonable and it should be lower.

What other advice do I have?

My advice for anybody who is researching this product is that if they want to do batch processing, then this is a good choice. The amount of data that it loads and processes is good.

Based on the features that I have used and my experience, I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Nirmal Kosuru - PeerSpot reviewer
Data Architect at Cognizio
Top 20
Real User

Yes, the integration tool should be made available as Professional, Community/Standard, and Enterprise editions, with pricing set accordingly on an industry-by-industry or case-by-case basis. There should also be transparency in the pricing and availability of the community edition, as was the case when Pentaho management first released it to the market.

Manager, Systems Development at a manufacturing company with 5,001-10,000 employees
Real User
An affordable solution that makes it simple to do some fairly complicated things, but it could be improved in terms of consistency of different transformation steps
Pros and Cons
  • "It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there."
  • "Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step."

What is our primary use case?

Our primary use case is to populate a data warehouse and data marts, but we also use it for all kinds of data integration scenarios and file movement. It is almost like middleware between different enterprise solutions. We take files from our legacy app system, do some work on them, and then call SAP BAPIs, for example.

It is deployed on-premises. It gives you the flexibility to deploy in any environment, whether on-premises or in the cloud, but this flexibility is not that important to us. We could deploy it in the cloud by spinning up a new server in AWS or Azure, but as a manufacturing facility, our preference is primarily to deploy things on-premises.

We usually stay one version behind the latest one. We're a manufacturing facility. So, we're very sensitive to any bugs or issues. We don't do automatic upgrades. They're a fairly manual process.

How has it helped my organization?

We've had it for a long time. So, we've realized a lot of the improvements that anybody would realize from almost any data integration product.

The speed of developing solutions has been the biggest improvement. It has reduced development time and improved the speed of getting solutions deployed. How much ETL development time is reduced varies with the size and complexity of the project, but we probably spend days or weeks less than we would if we were using a different tool.

It is tremendously flexible in terms of adding custom code using a variety of different languages if you want to, but we have had relatively few scenarios where we needed it. We do very little custom coding; because of the tool we're using, it is not critical. We have developed thousands of transformations and jobs in the tool.

What is most valuable?

It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there.

Its performance is a pretty close second. It is a pretty highly performant system. Its query performance on large data sets is very good.

What needs improvement?

Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step.

For how long have I used the solution?

We have been using this solution for more than 10 years.

What do I think about the stability of the solution?

Its stability is very good.

What do I think about the scalability of the solution?

Its scalability is very good. We've been running it for a long time, and we've got dozens, if not hundreds, of jobs running a day.

We probably have 200 or 300 people using it across all areas of the business. We have people in production control, finance, and what we call materials management. We have people in manufacturing, procurement, and of course, IT. It is very widely and extensively used. We're increasing its usage all the time.

How are customer service and support?

They are very good at quickly and effectively solving the issues we have brought up. Their support is well structured. They're very responsive.

Because we're very experienced in it, when we come to them with a problem, it is usually something very obscure and not necessarily easy to solve. We've had cases where when we were troubleshooting issues, they applied just a remarkable amount of time and effort to troubleshoot them.

Support seems to have very good access to development and product management as a tier-two. So, it is pretty good. I would give their technical support an eight out of ten.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We didn't have another data integration product before Pentaho.

How was the initial setup?

I installed it. It was straightforward. It took about a day and a half to get the production environment up and running. That was probably because I was e-learning as I was going. With a services engagement, I bet you would have everything up in a day.

What about the implementation team?

We used Pentaho services for two days. Our experience was very good. We worked with Andy Grohe. I don't know if he is still there or not, but he was excellent.

What was our ROI?

We have absolutely seen an ROI, but I don't have the metrics. There are analytic cases that we just weren't able to do before. Due to the relatively low cost compared to some of the other solutions out there, it has been a no-brainer.

What's my experience with pricing, setup cost, and licensing?

We did a two- or three-year deal the last time. Compared to other solutions, at least so far in our experience, it has been very affordable. Licensing is by component, so you need to make sure you only license the components that you really intend to use.

I am not sure if we have relicensed after the Hitachi acquisition, but previously, multi-year renewals resulted in a good discount. I'm not sure if this is still the case.

We've had the full suite for a lot of years, and there is just the initial cost. I am not aware of any additional costs.

What other advice do I have?

If you haven't used it before, it is worth engaging services with Pentaho for initial implementation. They'll just point out a number of small foibles related to perhaps case sensitivity. They'll just save you a lot of runs through the documentation to identify different configuration points that might be relevant to you.

I would highly recommend the Data Integration product, particularly for anyone with a Java background. Most of our BI developers at this point do not have a Java background, which isn't really that important. Particularly, if you're a Java business and you're looking for extensibility, the whole solution is built in Java, which just makes certain aspects of it a little more intuitive at first.

On the data integration side, it is really a good tool. A lot of investment dollars go into big data and new tech, and often, those are not very compelling for us. We're in an environment where we have medium data, not big data.

It provides a single end-to-end data management experience from ingestion to insights, but at this point, that's not critical to us. We mostly do the data integration work in Pentaho, and then we do the visualization in another tool. The single data management experience hasn't enabled us to discontinue the use of other data management, analysis, and delivery tools, simply because we didn't really have any.

We take an existing job or transformation and use that as a test. It is certainly easy enough to copy one object to another. I am not aware of a specific templating capability, but we are not really missing anything there. It is very easy for us to clone a job or transformation just by doing a Save As, and we do that extensively.

Vantara's roadmap is a little fuzzy for me. There has been quite a bit of turnover in the customer-facing roles over the last five years. We understand that there is a roadmap to move to a pure web-based solution, but it hasn't been well communicated to us.

In terms of our decision to purchase Hitachi's products, services, or solutions, our satisfaction level is average, on balance.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Renan Guedert - PeerSpot reviewer
Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees
Real User
Creates a good, visual pipeline that is easy to understand, but doesn't handle big data well
Pros and Cons
  • "Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side."
  • "A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git."

What is our primary use case?

We principally used it to build the whole ETL and data warehousing layer for our projects. We created steps for collecting all the raw data from APIs, other databases, and flat files, such as Excel, CSV, and JSON files, then did the whole transformation and data preparation, modeled the data, and put it into SQL Server and Integration Services.

For business intelligence projects, it is sometimes pretty good, when you are extracting something from an API, to have a step that transforms the JSON file from the API into an SQL table.
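That JSON-to-table step can be approximated in a few lines of Python (shown with an in-memory SQLite table and a literal payload standing in for the API response; a live call would fetch the JSON over HTTP first):

```python
import json
import sqlite3

# Stand-in for an API response body.
api_payload = '[{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bruno"}]'

rows = json.loads(api_payload)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
# Named placeholders pick fields out of each JSON object.
conn.executemany("INSERT INTO people (id, name) VALUES (:id, :name)", rows)
print(conn.execute("SELECT COUNT(*) FROM people").fetchone()[0])  # → 2
```

The visual step the reviewer describes packages exactly this flatten-and-insert pattern behind a dialog instead of code.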

We run it heavily on a virtual machine running Windows. We have also installed the open-source version on the desktop.

How has it helped my organization?

Lumada provides us with a single, end-to-end data management experience from ingestion to insights. This single data management experience is pretty good because then you don't have every analyst doing their own stuff. When you have one unique tool to do that, you can keep improving as well as have good practices and a solid process to do the projects.

What is most valuable?

It is very resourceful, with a wide variety of things you can do. It is also pretty open, since you can use a Python script or JavaScript for everything. If the application doesn't have a native step for something, you can build your own using scripts. You can build your own steps and jobs in the application. The liberty of the application has been pretty good.

Lumada enables us to create pipelines with minimal manual coding effort, which is the most important thing. When creating a pipeline, you can see which steps are failing in the process, so you can follow the process and debug if you have problems. It creates a good, visual pipeline that makes it easy to understand what is happening during the entire process.
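That per-step visibility can be sketched in plain Python (hypothetical step names; just an analogue of what the visual pipeline surfaces): a failure report names the step that broke.

```python
def run_pipeline(data, steps):
    """Run named steps in order; if one fails, report which step and why,
    mimicking the per-step visibility a visual pipeline gives you."""
    for name, fn in steps:
        try:
            data = fn(data)
        except Exception as exc:
            raise RuntimeError(f"step '{name}' failed: {exc}") from exc
    return data

steps = [
    ("parse", lambda rows: [int(r) for r in rows]),
    ("double", lambda rows: [r * 2 for r in rows]),
]
print(run_pipeline(["1", "2"], steps))  # → [2, 4]
try:
    run_pipeline(["1", "oops"], steps)
except RuntimeError as err:
    print(err)  # names the failing step ('parse') and the cause
```

Knowing *which* step failed, rather than getting one opaque traceback for the whole job, is what makes debugging a long pipeline tractable.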

What needs improvement?

There is no straightforward explanation of the bugs and errors that happen in the software. I must search heavily on the Internet, YouTube videos, and other forums to know what is happening. Hitachi's and Lumada's own sites don't have the best explanations of bugs, errors, and functions, so I must use other sources to understand what is happening. Usually, it is some guy in India or Russia who knows the answer.

A big problem after deploying something that we do in Lumada is with Git. What you get for a code review is a binary file, so if you need to do a review, you have to take screenshots to show each step. That is the biggest issue if you are using Git.

After you create a data pipeline, if the platform could generate a JSON file, or even a simple flat text file, describing the steps, people could look at it and see what is happening. You shouldn't need to download the whole project into your own Pentaho; I would like to just look at the code and see if there is something wrong.
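
To illustrate the kind of reviewable text export being asked for: a pipeline described as plain data can be dumped to JSON and diffed in Git like any other source file (this is a hypothetical format, not something Pentaho produces):

```python
import json

# Hypothetical pipeline definition as plain data, so it can be written to
# a JSON file, committed to Git, and diffed line by line in code review.
pipeline = {
    "name": "load_sales",
    "steps": [
        {"type": "csv_input", "file": "sales.csv"},
        {"type": "filter", "condition": "amount > 0"},
        {"type": "table_output", "table": "sales"},
    ],
}

# sort_keys keeps the output deterministic, so diffs stay small.
text = json.dumps(pipeline, indent=2, sort_keys=True)
print(text)
```

A reviewer could then read or diff `text` without opening a GUI or taking screenshots.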

The open-source version doesn't handle big data too well. Therefore, we have to use other kinds of technologies to manage that.

I would like it to be more accessible for Macs. Previously, I always used Linux, but some companies I have worked for used MacBooks, and I needed other tools or a virtual machine to run Pentaho there. So, it would be pretty good if the solution had a friendly version for Macs, like it has for Linux-based systems such as Ubuntu.

For how long have I used the solution?

I have been using it for six years, but more heavily over the last two years.

How are customer service and support?

I don't bring issues to Hitachi since Lumada is, in a way, open source.

Once, when I had a problem with connections because of the software, I saw the issue in the forums on the Internet because there was some type of bug happening.

Which solution did I use previously and why did I switch?

At my first company, we used just Lumada. At my second company, we used a lot of QlikView, SQL, Python, and Lumada. At my third company, we used Python and SQL much more. I used Lumada just once at that company. At my new company, I don't use it at all. I just use Azure Data Factory and SQL.

With Pentaho, we finally have data pipelines. We didn't have solid data pipelines before. After the data pipelines became very solid, the team who created them became very popular in the company.

How was the initial setup?

To set things up, we used a virtual machine. It was a version we could download and drop onto a machine. You can practically copy and paste Pentaho, because all you need is a recent version of Java. So, the setup was pretty smooth. It took an hour at most to deploy.

What was our ROI?

Sometimes, it took a whole team about two weeks to gather all the data, prepare it, and present it. After optimizing the process, the whole thing took about one to two hours. Therefore, it has helped a lot when you talk about money, because it no longer takes a whole team; one person can handle a project at a time and run it whenever they want.

The solution reduced our ETL development time by a lot: a whole project used to take about a month to get done, and after adopting Lumada, it took just a week. For a big company in Brazil, that saves a team at least $10,000 a month.

Which other solutions did I evaluate?

I just use the ETL tool. For data visualization, we are using Power BI. For data storage, we use SQL Server, Azure, or Google BigQuery.

We are just using the open-source application for ETL. We have never looked into other tools of Hitachi because they are paid.

I know other companies that are using Alteryx, which has a friendlier user interface, but it has fewer tools and is more difficult to utilize. My wife uses Alteryx, and having used Lumada, I find Alteryx is not as good, because Lumada has more solutions and is open source. Though, Alteryx has more security and better support.

What other advice do I have?

For someone who wants simple solutions and isn't a programmer or knowledgeable about technology, this open-source tool is close to perfect. In one week, you can get to grips with the solution and do your first project. In my opinion, it is the best tool for people starting out.

Lumada is a great tool. I would rate it a straight seven out of 10. It gets the work done. The open-source version doesn't work well with big data sources, but there is a lot of flexibility and liberty to do everything you want and need. If the open-source version worked better with big data, I would give it a straight eight, since there is always room for improvement. Sometimes, when debugging, some errors can be pretty difficult. Above all, it is a tool for understanding everything that is going on when you are starting out in business intelligence and data engineering.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Michel Philippenko - PeerSpot reviewer
Project Manager at a computer software company with 51-200 employees
Real User
Forums are helpful, and creating ETL jobs is simpler than in other solutions
Pros and Cons
    • "I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You have to search all these different places using a mouse, clicking everywhere... each report is coded in a binary file... You cannot search with a text search tool..."

    What is our primary use case?

    I was working with Pentaho for a client. I had to implement complicated data flows and extraction. I had to take data from several sources in a PostgreSQL database by reading many tables in several databases, as well as from Excel files. I created some complex jobs. I also had to implement business reports with the Pentaho Report Designer.
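
A toy version of that kind of multi-source job, with SQLite standing in for the PostgreSQL databases and an in-memory CSV standing in for an Excel sheet (all table and column names here are invented):

```python
import csv
import io
import sqlite3

# Stand-in for one of the PostgreSQL databases.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "acme"), (2, "globex")])

# Stand-in for an Excel sheet: amounts keyed by order_id.
sheet = io.StringIO("order_id,amount\n1,100\n2,250\n")
amounts = {int(r["order_id"]): int(r["amount"]) for r in csv.DictReader(sheet)}

# Join the two sources, as a PDI job would do with a lookup step.
merged = [
    (order_id, customer, amounts.get(order_id))
    for order_id, customer in db.execute("SELECT order_id, customer FROM orders")
]
print(merged)  # [(1, 'acme', 100), (2, 'globex', 250)]
```

In PDI the same join is a table-input step, a file-input step, and a lookup step wired together on the canvas.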

    The client I was working for had Pentaho on virtual machines.

    What is most valuable?

    The ETL feature was the most valuable to me. I like it very much. It was very good.

    What needs improvement?

    I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You had to search all these different places using a mouse, clicking everywhere. The interface does not enable you to find things and manage all that. I don't know if other tools are better for end-users when it comes to the graphical interface, but this was a bit complicated. In the end, we were able to do everything with Pentaho.

    And when you want to improve the appearance of your report, Pentaho Report Designer has complicated menus. It is not very user-friendly. The result is beautiful, but it takes time.

    Also, each report is coded in a binary file, so you cannot read it. Maybe that's what the community or the developers want, but it is inconvenient because when you want to search for information, you need to open the graphical interface and click everywhere. You cannot search with a text search tool because the reports are coded in binary. When you have a lot of reports and you want to find where a precise part of one of your reports is, you cannot do it easily.

    The way you specify parameters in Pentaho Report Designer is a little bit complex. There are two interfaces. The job creators use PDI, which provides the ETL interface, and that is okay; creating the extract/transform/load jobs is simpler than in other solutions. But there is another interface for the end-users of Pentaho, and you have to understand how the two relate to each other, so it's a little bit complex. You have to go into XML files, which is not so simple.

    Also, using the solution overall is a little bit difficult. You need to be an engineer and somebody with a technical background. It's not absolutely easy, it's a technical tool. I didn't immediately understand it and had to search for information and to think about it.

    For how long have I used the solution?

    I used Hitachi Lumada Data Integration, Pentaho, for approximately two years.

    What do I think about the stability of the solution?

    The stability was perfect.

    What do I think about the scalability of the solution?

    I didn't scale the solution. I had to migrate from an old Pentaho to a new Pentaho. I had quite a big set of data, but I didn't add new data. I worked with the same volume of data all the time so I didn't test the scaling.

    In the company I consulted for, there were about 15 people who input the data and worked with the technical part of Pentaho. There were a lot of end-users, who were the people interested in the reports; on the order of several thousand end-users. 

    How are customer service and support?

    The technical support was okay. I used the open-source version of Pentaho and I used the forum. I found what I needed. And, the one or two times when I didn't find something, I asked a question in the forum and I received an answer very quickly. I appreciated that a lot. I had an answer one or two hours later. It's very good that somebody from Pentaho Enterprise responds so rapidly.

    How was the initial setup?

    The initial setup was complex, but I'm an engineer and it's my job to deal with complex systems. It's not the most complex that I have dealt with, but it was still somewhat complex. The procedure was explained on the Pentaho website in the documentation. You had to understand which module does what. It was quite complex.

    It took quite a long time because I had to troubleshoot, to understand what was wrong, and I had to do it several times before it worked.

    What's my experience with pricing, setup cost, and licensing?

    I didn't purchase Pentaho. There is a business version but I used only the open source. I was fully satisfied and very happy with it. It's a very good open-source solution. The communication channels, the updates, the patches, et cetera are all good.

    What other advice do I have?

    I would fully recommend Pentaho. I have already recommended it to some colleagues. It's a good product with good performance.

    Overall, I was very happy with it. It was complicated, but that is part of my job. I was happy with the result and the stability. The Data Integration product is simpler than the Report Designer. I would rate the Data Integration at 10 out of 10 and the Report Designer at nine, because of the graphical interface.

    Disclosure: My company has a business relationship with this vendor other than being a customer: System integrator
    PeerSpot user
    Anton Abrarov - PeerSpot reviewer
    Project Leader at a mining and metals company with 10,001+ employees
    Real User
    Speeds up data flow processes and has a user-friendly interface
    Pros and Cons
    • "It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
    • "As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."

    What is our primary use case?

    The company where I was working previously used this product. We were using it for ETL process management; it was a form of data flow automation.

    In terms of deployment, we were using an on-premise model because we had sensitive data, and there were some restrictions related to information security.

    How has it helped my organization?

    Our data flow processes became faster with this solution.

    What is most valuable?

    It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient.

    What needs improvement?

    As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows.

    The last time I saw this product, the onboarding instructions were not clear. If the process of onboarding this product is made more clear, it will take the product to the next level. There is a possibility that the onboarding process has already improved, and I haven't seen it. 

    For how long have I used the solution?

    I have used this solution for two or three years.

    What do I think about the stability of the solution?

    I would rate it an eight out of ten in terms of stability.

    What do I think about the scalability of the solution?

    We didn't have to scale too much. So, I can't evaluate it properly in terms of scalability.

    In terms of its users, only our team was using it. There were approximately 20 users. It was not for the whole company.

    How are customer service and support?

    We didn't use too much customer support. We were using the open-source resources through Google Search. So, we were just using text search. There were some helpful forums where we were able to find the answers to our questions.

    Which solution did I use previously and why did I switch?

    I didn't use any other solution previously. This was the only one.

    How was the initial setup?

    I wasn't a part of its deployment. In terms of maintenance, as far as I know, it didn't require much maintenance.

    What was our ROI?

    We absolutely saw an ROI. It was hard to calculate, but we felt it in terms of the speed of our processes. After using this product, we could do some of the things much faster than before.

    What's my experience with pricing, setup cost, and licensing?

    I mostly used the open-source version. I didn't work with a license.

    Which other solutions did I evaluate?

    I did not evaluate other options.

    What other advice do I have?

    I would recommend using this product for data engineering and Extract, Transform, and Load (ETL) processes.

    I would rate it an eight out of ten.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    CDE & BI Delivery Manager at a tech services company with 501-1,000 employees
    Consultant
    Connects to different databases, origins of data, files, and SFTP
    Pros and Cons
    • "I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a station area where I can work with all the information that I have in my production databases, then I can work with the data that I created."
    • "I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector."

    What is our primary use case?

    I just use it as an ETL tool. It helps me work with data so I can solve any of my production problems. I work with a lot of databases; therefore, I use this tool to keep information organized.

    I work with a virtual private cloud (VPC) and VPN. If I work in the cloud, I use VPC. If I work on-premises, I work with VPNs.

    How has it helped my organization?

    I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a station area where I can work with all the information that I have in my production databases, then I can work with the data that I created.
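
As a toy illustration of that ELT pattern with a staging area (SQLite here; the table and column names are made up):

```python
import sqlite3

# Extract/Load: copy raw rows into a staging table first, untouched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging_sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO staging_sales VALUES (?, ?)",
    [("north", 100), ("north", 50), ("south", 70)],
)

# Transform: run SQL against the staged copy, not the production tables.
conn.execute(
    """CREATE TABLE sales_by_region AS
       SELECT region, SUM(amount) AS total
       FROM staging_sales GROUP BY region"""
)
totals = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(totals)  # [('north', 150), ('south', 70)]
```

The staging table is what makes the background control possible: the raw load and the transformation can be checked independently.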

    Right now, I am working in the business intelligence area. However, we use BI in all our companies. So, it is not only in one area. So, I create different data parts for different business units, e.g., HR, IT, sales, and marketing.

    What is most valuable?

    A valuable feature is the number of connectors that I have. So, I can connect to different databases, origins of data, files, and SFTP. With SQL and NoSQL databases, I can connect, put it in my instructions, send it to my staging area, and create the format. Thus, I can format all my data in just one process.

    What needs improvement?

    I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector.

    Hitachi can make a lot of improvements in the tool, e.g., in performance or latency or putting more emphasis on cloud solutions or NoSQL databases. 

    For how long have I used the solution?

    I have more than 15 years of experience working with it.

    What do I think about the stability of the solution?

    The stability depends on the version. At the beginning, the product was more focused on stability. Since then, some things have been deprecated, and I really don't know why. However, I have been pretty happy with the tool. It is a very good tool. Obviously, there are better tools, but Pentaho is fast and pretty easy to use. 

    What do I think about the scalability of the solution?

    It is scalable. 

    How are customer service and support?

    Their support team will receive a ticket on any failures that you might have. We have a log file that lets us review our errors, both in Windows and Unix. So, we are able to check both operating systems.

    If you don't pay any license, you are not allowed to use their support at all. While I have used it a couple of times, that was more than 10 years ago. Now, I just go to their community and any Pentaho forums. I don't use the support.

    Which solution did I use previously and why did I switch?

    I have used a lot of ETL data integrators, such as DataStage, Informatica, Talend, Matillion, Python, and even SQL. MicroStrategy, Qlik, and Tableau have instructional features, and I try to use a lot of tools to do instructions. 

    How was the initial setup?

    I have built the solution. It does not change for cloud or on-premise developments. 

    You create in your development environments, then you move to test. After that, you do the volume and integrity testing, then you go to UAT. Finally, you move to production. It does depend on the customer. You can thoroughly create the entire product structure as well as all the files that you need. Once you put it in production, it should work. You should have the same structure in development, test, and production.

    What was our ROI?

    It is free. I don't spend money on it.

    It will reduce a lot of the time that you work with data.

    What's my experience with pricing, setup cost, and licensing?

    I use it because it is free. I download from their page for free. I don't have to pay for a license. With other tools, I have to pay for the licenses. That is why I use Pentaho.

    I used to work with the complete suite of Pentaho, not only Data Integration. I used to build some solutions from scratch. I used to work with the Community version and Enterprise versions. With the Enterprise version, it is more than building cubes. I am building a BI solution that I can explore. Every time that I use Pentaho Data Integration, I never spend any money because it comes free with the tool. If you pay for the Enterprise license, Pentaho Data Integration is included. If you don't pay for it and use the Community version, Data Integration is included for free. 

    Which other solutions did I evaluate?

    I used to work with a reseller of Pentaho. That is why I started working with it. Also, I did some training for Pentaho at the company that I used to work for in Argentina, where we were a Platinum reseller. 

    Pentaho is easy to use. You don't need to install anything. You can just open the script and start working on it. That is why I chose it. With Informatica, you need to do a server installation, but some companies might not allow some installation in their production or normal environment.

    I feel pretty comfortable using the solution. I have tried to use other tools, but I always come back to Pentaho because it is easier. 

    Pentaho is open source. While Informatica is a very good tool, it is pretty expensive. That is one of the biggest cons for a data team, because you don't want to pay money for tools that only help you do your work.

    What other advice do I have?

    I would rate this solution as eight out of 10. One of the best things about the solution is that it is free.

    I used to sell Pentaho. It has a lot of pros and cons. From my side, there are more pros than cons. There isn't one tool that can do everything that you need, but this tool is one of those tools that helps you to complete your tasks and it is pretty integrable with other tools. So, you can switch Pentaho on and off from different tools and operating systems. You can use it in Unix, Linux, Windows, and Mac.

    If you know how to develop different things and are very good at Java, you can create your own connectors. You can create a lot of things. 

    It is a very good tool if you need to work with data. There isn't a database that you can't manage with this tool. You can work with it and manage all the data that you want to manage.

    Which deployment model are you using for this solution?

    Hybrid Cloud
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    PeerSpot user
    Buyer's Guide
    Download our free Pentaho Data Integration and Analytics Report and get advice and tips from experienced pros sharing their opinions.
    Updated: November 2024