Pentaho may be less efficient for large volumes of data compared to other solutions like Talend or Informatica. Larger data jobs take more time to execute. Pentaho is more appropriate for jobs with smaller volumes of data.
Senior Product Manager at a retailer with 10,001+ employees
Real User
Top 20
Jul 24, 2024
It is difficult to process huge amounts of data. We need to test it end-to-end to determine how much data it can actually process. With the Enterprise Edition, we can process the data.
BI developer at Jubilee Life Insurance Company Ltd
Real User
Top 20
May 27, 2024
The solution should provide additional control over the data warehouse and reduce its size, as our organization's clients have expressed concerns about it. The vendor could focus on reducing the capacity it requires and compensate by improving the product's efficiency.
System Engineer at a tech services company with 11-50 employees
Real User
Sep 4, 2022
I would like to see better support from one version to the next, all the more so if you are using third-party elements. That's one of the differences between the Community Edition and the Enterprise Edition. In addition to better integration with third-party tools, what we have seen is that some of the tools simply break from one version to the next and aren't supported anymore in the Community Edition. What is behind that is not really clear to us, but the result is that we can't migrate, or we have to migrate to other components. That's the most inconvenient part of the tool. We need to test whether all our third-party plugins are still available in a new version. That's one of the reasons we decided to move from the tool to a completely open-source version for the ETL part; it's a result of the migration hassle we have had every time. Support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition; that's our impression. The Enterprise Edition is okay, and there is a clear path for it. You will not use a lot of external plugins with it because, with every new version, a lot of the most popular plugins are transferred to the Enterprise Edition. But the Community Edition is almost not supported anymore. You shouldn't start with the Community Edition because, very early on, you will have to move to the Enterprise Edition. Before, you could live with the Community Edition and use it for a longer time.
Data Engineering Associate Manager at Zalora Group
Real User
Top 20
Jun 26, 2022
Five years ago, I was confident that I would use this product more than Airflow, as it would be easier for me, with the abstraction being quite intuitive. Five years ago, I would have chosen the product over other tools that use pure scripting, as it would have reduced most of my ETL development time. This isn't the case anymore. When I first joined my organization, I was still using Windows, and developing the ETL system on it was quite straightforward. However, when I changed my laptop to a MacBook, it was quite a hassle. When we tried to open the application, we had to open the terminal first, go to the solution's directory, and then run the executable file. Therefore, if you develop on a MacBook, it'll be quite a hassle. However, when you develop on Windows, it's not really different from other ETL tools on the market, like SQL Server Integration Services, Informatica, et cetera.
Solution Integration Consultant II at a tech vendor with 201-500 employees
Consultant
May 25, 2022
It could be better integrated with programming languages, like Python and R. Right now, if I want to run Python code in one of my ETLs, it is a bit difficult to do. It would be great if we had modules where we could code directly in Python. We don't really have a way to run Python code natively.
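For illustration only, a common workaround (not a native Pentaho feature) is to keep the Python logic in a standalone script and call it from a Shell job entry, exchanging data through files. The script name, file paths, and the "amount" field below are hypothetical; this is a minimal sketch assuming the previous step writes a CSV.

```python
# transform_rows.py -- hypothetical script called from a PDI Shell job entry.
# Reads the CSV produced by the previous step, adds a derived column, and
# writes the result for a later CSV input step to pick up.
import csv
import sys

def main(in_path: str, out_path: str) -> None:
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        fields = list(reader.fieldnames or []) + ["amount_with_tax"]
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for row in reader:
            # Example derivation only; replace with the real business rule.
            row["amount_with_tax"] = f"{float(row['amount']) * 1.2:.2f}"
            writer.writerow(row)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

The job would then run something like `python transform_rows.py /tmp/in.csv /tmp/out.csv` and continue with a CSV input step, which is workable but far less convenient than an embedded Python module would be.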
COO / CTO at a tech services company with 11-50 employees
Real User
May 19, 2022
Their client support is very bad. It should be improved. There is also not much information on Hitachi forums or Hitachi web pages. It is very complicated. In terms of the flexibility to deploy in any environment, such as on-premise or in the cloud, we can do the cloud deployment only through virtual machines. We might also be able to work on different environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. We can only do it through virtual machines, which is a problem, but we can manage it. We also work with Databricks because it works with Spark. We can work with clustered servers, and we can easily do the deployment in the cloud. With a right-click, we can deploy Databricks through the app on AWS or Azure cloud.
Project Leader at a mining and metals company with 10,001+ employees
Real User
May 11, 2022
As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows. The last time I saw this product, the onboarding instructions were not clear. If the process of onboarding this product is made more clear, it will take the product to the next level. There is a possibility that the onboarding process has already improved, and I haven't seen it.
Data Architect at a consumer goods company with 1,001-5,000 employees
Real User
May 10, 2022
I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse. By using some of the Python scripts that we have, we are able to extract all this text data into JSON. Then, from JSON, we are able to create external tables in the cloud whereby, at any one time, somebody has access to this data on the S3 drive.
I work with different databases. I would like to see more connectors to new databases, e.g., DynamoDB and MariaDB, and to new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, you can build the integrations more efficiently and in a more natural way; you don't have to write any scripts to use that connector. Hitachi can make a lot of improvements in the tool, e.g., in performance or latency, or by putting more emphasis on cloud solutions and NoSQL databases.
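As a rough illustration of the text-to-JSON-to-S3 workflow described two paragraphs above, a script along these lines could convert raw text files to JSON objects and push them to S3; the bucket name, prefix, and record shape are hypothetical, and AWS credentials are assumed to be configured in the environment.

```python
# text_to_s3_json.py -- hypothetical sketch of the text-to-JSON-to-S3 flow.
import json
from pathlib import Path

import boto3  # assumes AWS credentials are configured in the environment

BUCKET = "example-warehouse-staging"   # hypothetical bucket
PREFIX = "landing/notes/"              # hypothetical prefix an external table points at

def convert_and_upload(source_dir: str) -> None:
    s3 = boto3.client("s3")
    for path in Path(source_dir).glob("*.txt"):
        record = {"file_name": path.name, "body": path.read_text(errors="replace")}
        key = f"{PREFIX}{path.stem}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(record).encode("utf-8"))

if __name__ == "__main__":
    convert_and_upload("./raw_text")
```

An engine such as Athena or Redshift Spectrum could then define an external table over that prefix (with an appropriate JSON SerDe), so anyone with access to the bucket can query the data, which matches the warehouse-access pattern the reviewer describes.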
Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees
Real User
Apr 12, 2022
There is no straightforward explanation of the bugs and errors that happen in the software. I have to search heavily on the Internet, through YouTube videos and other forums, to find out what is happening. Hitachi's and Lumada's own sites don't have the best explanations of bugs, errors, and functions, so I have to search other sources to understand what is going on. Usually, it is some guy in India or Russia who knows the answer. A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review, so if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest problem if you are using Git. After you create a data pipeline, if you could generate a JSON file, or something in another language, we could simplify the steps of reviewing what we are doing. Or a simple flat text file could be even better, generated by their own platform, so people can look and see what is happening. You shouldn't need to download the whole project into your own Pentaho; I would like to just look at the code and see if something is wrong. When I use it with open-source applications, it doesn't handle big data too well, so we have to use other kinds of technologies to manage that. I would also like it to be more accessible on Macs. Previously, I always used Linux, but some companies I worked for used MacBooks, and I needed other tools or a virtual machine to use Pentaho there. So it would be good if the solution had a friendly version for macOS and Linux-based systems, like Ubuntu.
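As an illustration of the kind of lightweight text review the reviewer is asking for: when a transformation is saved as a file-based .ktr (which is XML) rather than exported from a repository, a small script can list its steps so a reviewer can see what changed without opening Pentaho. This is a hypothetical helper, not a Pentaho feature, and it assumes the standard .ktr layout with a step name and type per step element.

```python
# list_ktr_steps.py -- hypothetical helper for lightweight text review of a
# file-based PDI transformation (.ktr files are XML).
import sys
import xml.etree.ElementTree as ET

def list_steps(ktr_path: str) -> None:
    root = ET.parse(ktr_path).getroot()
    for step in root.findall("step"):
        name = step.findtext("name", default="?")
        step_type = step.findtext("type", default="?")
        print(f"{step_type:30s} {name}")

if __name__ == "__main__":
    list_steps(sys.argv[1] if len(sys.argv) > 1 else "pipeline.ktr")
```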
Data Engineer at a tech vendor with 1,001-5,000 employees
MSP
Apr 11, 2022
Lumada could have more native connectors for other vendors' systems, such as Google BigQuery, Microsoft OneDrive, Jira, and Facebook or Instagram. We would like to gather data from modern platforms using Lumada, which would be a better approach. As a comparison, if you open Power BI to retrieve data, you can get data from many vendors with cloud-native connectors, such as Azure, AWS, Google BigQuery, Athena, and Redshift. Lumada should have more native connectors to help us and facilitate our job of gathering information from these modern infrastructures and tools.
It's difficult to use custom code. Implementing a pipeline with pre-built blocks is straightforward, but it's harder to insert custom code inside the pre-built blocks. The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode. Repository management is also a shortcoming, but I'm not sure if that's just a limitation of the free version. I'm not sure if Pentaho can use an external repository. It's a flat-file repository inside a virtual machine. Back in the day, we would want to deploy this repository on a database. Pentaho's data management covers ingestion and insights but I'm not sure if it's end-to-end management—at least not in the free version we are using—because some of the intermediate steps are missing, like data cataloging and data governance features. This is the weak spot of our Pentaho version.
If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got, the slower it was. It was kind of buggy sometimes. And when we ran a flow, it didn't go from a perceived start to end, node by node; everything kicked off at once. That meant there were times when it would get ahead of itself and a job would fail, not because the job was wrong, but because Pentaho decided to go at everything at once and something would process before it was supposed to. There were nodes you could add to make sure that, before one node kicks off, all the others have processed, but that was a bit tedious. There were also caching issues, and we had to write code to clear the cache every time we opened the program, because the cache would fill up and it wouldn't run. I don't know how hard that would be for them to fix, or whether it has been fixed in version 10. Also, the UI is a bit outdated, but I'm more of a fan of function over how something looks. One other thing that would have helped with Pentaho is documentation and support on the internet: how to do things, how to set things up. There are some sites on how to install it, and Pentaho does have a help repository, but it wasn't always the most useful.
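For context on the cache workaround mentioned above, a wrapper along these lines is sometimes used to clear the OSGi/Karaf cache before launching Spoon; the install path is an assumption, the cache location can differ by version, and whether this is needed at all depends on the release in use.

```python
# clear_cache_and_launch.py -- hypothetical wrapper that removes the Karaf
# cache folder before starting Spoon, as a workaround for stale-cache launches.
import shutil
import subprocess
from pathlib import Path

PDI_HOME = Path("/opt/pentaho/data-integration")         # hypothetical install dir
KARAF_CACHES = PDI_HOME / "system" / "karaf" / "caches"  # location may vary by version

def main() -> None:
    if KARAF_CACHES.exists():
        shutil.rmtree(KARAF_CACHES)  # Spoon rebuilds the cache on the next start
    subprocess.run([str(PDI_HOME / "spoon.sh")], check=True)

if __name__ == "__main__":
    main()
```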
Project Manager at a computer software company with 51-200 employees
Real User
Mar 6, 2022
I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You had to search all these different places using a mouse, clicking everywhere. The interface does not make it easy to find things and manage all of that. I don't know if other tools are better for end-users when it comes to the graphical interface, but this was a bit complicated. In the end, we were able to do everything with Pentaho. And when you want to improve the appearance of your report, Pentaho Report Designer has complicated menus. It is not very user-friendly. The result is beautiful, but it takes time. Also, each report is coded in a binary file, so you cannot read it. Maybe that's what the community or the developers want, but it is inconvenient, because when you want to search for information, you need to open the graphical interface and click everywhere. You cannot search with a text search tool because the reports are coded in binary. When you have a lot of reports and you want to find where a precise part of one of your reports is, you cannot do it easily. The way you specify parameters in Pentaho Report Designer is also a little bit complex. There are two interfaces. The job creators use PDI, which provides the ETL interface, and it's okay; creating the jobs for extract/transform/load is simpler than in other solutions. But there is another interface for the end-users of Pentaho, and you have to understand how the two relate to each other, so it's a little bit complex. You have to go into XML files, which is not so simple. Also, using the solution overall is a little bit difficult. You need to be an engineer, somebody with a technical background. It's not absolutely easy; it's a technical tool. I didn't immediately understand it and had to search for information and think about it.
I haven't been able to broach all the functionality of the Enterprise edition because it hasn't been integrated into our server. We're still building out the server, app server, and repository to support it. In the Community edition, it would be nice to have more modules that allow you to code directly within the application. It could have R or Python completely integrated into it, but this could also be because I'm using an older version.
Systems Analyst at a university with 5,001-10,000 employees
Real User
Dec 22, 2021
The transition to the web-based solution has taken a little longer and been more tedious than we would like, and it has taken development effort away from the reporting side of the tool. They have a reporting tool called Pentaho Business Analytics that does all the report creation based on the data integration tool. A lot of features are missing from that product because they've allocated a lot of their resources to fixing the data integration to make it more web-based. We would like them to focus more on the user interface for the reporting. The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report; it doesn't even have that capability. That's a feature we've been asking for since the beginning, and it hasn't been implemented yet. We have between 500 and 800 reports in our system now, and we've had to maintain an external spreadsheet with IDs to identify the location of all of those reports, instead of having that built into the system. It's been frustrating for us that they can't just build a simple search feature into the product to search for report names. It needs to be more in line with other reporting tools, like Tableau, which has a lot more features and functions. Because the reporting is lacking, only the deans and above are using it. It could be used more, and we'd like it to be used more. Also, while the solution provides us with a single, end-to-end data management experience from ingestion to insights, it doesn't give us a full history of where the data is coming from. If we change a field, we can't trace it through from the reporting to the ETL field; unfortunately, that's a manual process for us. Hitachi has a new product to do that, which searches all the fields, documents, and files to map your pipeline, but we haven't bought that product yet.
Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in. There's good documentation when you go to the site but the help function within the solution hasn't been as good since Hitachi took over.
Enterprise Data Architect at a manufacturing company with 201-500 employees
Real User
Dec 14, 2021
Some of Lumada's scheduling features drive me buggy. The one issue that always drives me up the wall is when Daylight Saving Time changes; it doesn't handle that elegantly. Every time it changes, I have to do something. It's not a big deal, but it's annoying. That's the one issue, but I see the limitation, and it might not be easily solvable.
Senior Engineer at a comms service provider with 501-1,000 employees
Real User
Dec 13, 2021
Although it is a low-code solution with a graphical interface, the error messages you get are often the kind a developer would be happy with: a big stack of red text and Java errors displayed on the screen. Less technical people can be intimidated by that; getting a wall of red error messages is off-putting. Other graphical tools aimed at the power-user level provide a much more user-friendly experience in dealing with exceptions and guiding the user to where they've made the mistake. Also, some of the components have a great many options. Guidance embedded into the interface about when to use certain options would be good, so that people know what setting an option will do and when they should use it. It is quite light on that aspect.
IT-Services Manager & Solution Architect at Stratis
Real User
Jul 14, 2021
The solution needs better, higher-quality documentation, similar to AWS. Right now, we find that although documentation exists, it's not easy to find the answers we seek. I have tried some cloud services with the ETL, so perhaps support for those would be good to add. The product also needs more plugins. Right now, it just has a standard database connection, while other solutions out there have straightforward connections for Oracle, MySQL, and the like. More plugins would make it a much better product.
Technical Manager at a computer software company with 51-200 employees
Real User
Feb 22, 2021
I don't think they market it that well. We can make suggestions for improvements, but they don't seem to take the feedback on board. This contrasts with Informatica, who are really helpful and seem to listen more to their customers' feedback. I would also really like to see improved data capture. At the moment, the emphasis seems to be on data processing; I would like to see a real-time data integration tool that provides instant reporting whenever the data changes. I'm still at an early stage with Pentaho Data Integration, but it can't really handle what I describe as "extreme data processing," i.e., when there is a huge amount of data to process. That is one area where Pentaho is still lacking.
The shortcoming in version 7 is that we are unable to connect to Google Cloud Storage (GCS), where I can write the results from Pentaho. I'm able to connect to S3 using Pentaho 8, but when using it for GCS, I'm unable to connect. With people moving from on-premises deployments to the cloud, be it S3, Azure, or Google, we need a plugin where we can interact with these cloud vendors. I would like to see improvements made for real-time data processing. It is something that I will be looking out for.
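As an example of the sort of interim workaround used when a native GCS step is unavailable (the bucket and file names are hypothetical), output written to a local file by the transformation can be copied to Google Cloud Storage with the standard client library:

```python
# upload_results_to_gcs.py -- hypothetical post-processing step that copies a
# Pentaho output file to Google Cloud Storage using the official client.
from google.cloud import storage  # assumes GOOGLE_APPLICATION_CREDENTIALS is set

def upload(local_path: str, bucket_name: str, destination_blob: str) -> None:
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(destination_blob)
    blob.upload_from_filename(local_path)

if __name__ == "__main__":
    upload("output/results.csv", "example-results-bucket", "pentaho/results.csv")
```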
Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees
Real User
Jul 15, 2020
I'm currently looking at a new competitor that has some interesting features this solution doesn't have. I have found that this competitor has a checkpointing capability that is not present in the Pentaho Data Integration approach: the system keeps track of the last executions and stores the state, which gives you the ability to resume from the point where the job ended last time. It's very interesting, and it would be nice if Pentaho had this type of feature. Also, you are often required to install plugins. If you need access to, in my case, Neo4j databases, you do need a plugin to do it.
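To make the resume-from-last-run idea concrete, here is a generic sketch of checkpointing logic. It is not a feature of Pentaho or of the competitor mentioned; the state file name and the notion of a numeric position are hypothetical.

```python
# resume_state.py -- hypothetical sketch of checkpoint/resume logic: the job
# records the last position it finished, and the next run starts from there.
import json
from pathlib import Path

STATE_FILE = Path("job_state.json")  # hypothetical location for the checkpoint

def load_last_position(default: int = 0) -> int:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text()).get("last_position", default)
    return default

def save_position(position: int) -> None:
    STATE_FILE.write_text(json.dumps({"last_position": position}))

def process(position: int) -> None:
    print(f"processing item {position}")

def run_job(batch: list[int]) -> None:
    start = load_last_position()
    for position in batch:
        if position <= start:
            continue  # already processed in a previous run
        process(position)
        save_position(position)  # checkpoint after each successful unit

if __name__ == "__main__":
    run_job(list(range(1, 11)))
```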
Parallel execution could be better in Pentaho. It's very simple but I don't think it works well.
I would like to see support for some additional cloud sources - Azure, Snowflake.