It allows for rapid prototyping of a wide array of ETL workloads.
Senior Consultant at a financial services firm with 10,001+ employees
Needs improvement on the Hadoop and JMS plugins.
What is most valuable?
What needs improvement?
Support for common Hadoop utilities could be expanded, for example with bulk load with composite row keys for HBase and with out-of-the-box drivers for Impala. A richer interface to Hive would also be beneficial, as we currently have to go through a raw connection and execute SQL scripts, for which some syntax is not respected.
As of version 6, some new issues have also been introduced that are a bit of an annoyance:
1) On Kettle's startup - log4j errors.
2) IBM WebSphere MQ Producer - variable substitution for the URL does not work, so you have to hardcode it.
3) shared.xml for DB connections - variable substitution for connection properties does not work, so you have to hardcode things like the Kerberos principal for a Hive/Impala connection.
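One possible workaround for items 2 and 3 is to resolve the variables yourself before Kettle starts, so that the files it reads contain only literal values. The sketch below is an assumption, not a Pentaho feature: the function name `render_kettle_variables`, the template text, and the variable name are hypothetical; only the `${NAME}` variable syntax comes from Kettle.

```python
import re

# Matches Kettle-style variable references such as ${HIVE_PRINCIPAL}.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")


def render_kettle_variables(template: str, values: dict) -> str:
    """Replace each ${NAME} in template with values[NAME].

    Unknown variables are left untouched so they are easy to spot.
    """
    def _lookup(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))

    return _VAR_PATTERN.sub(_lookup, template)


# Hypothetical connection property, for illustration only.
template = "principal=${HIVE_PRINCIPAL}"
rendered = render_kettle_variables(template, {"HIVE_PRINCIPAL": "hive/_HOST@EXAMPLE.COM"})
print(rendered)
```

Running something like this as a pre-launch step keeps the real values in one place (for example a per-environment dictionary or properties file) while the file Kettle reads is fully hardcoded, which is what the affected components require.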
What was my experience with deployment of the solution?
We had no issues deploying it.
What do I think about the scalability of the solution?
The robustness of this solution in a production cluster (>30 nodes) remains to be seen.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Business Intelligence Consultant at Sanmargar Team
We use it almost everywhere, for creating data marts, data warehouses, and implementing BI reporting tools.
Valuable Features:
First of all, the ease of deployment. I’m pretty sure that almost anyone could do simple transformations without any IT knowledge. Thanks to its graphical interface, using this tool is just drag and drop. Another advantage is that it fits everywhere: you can connect it to Big Data sources, relational databases, and all types of files. If the developers missed something, you can look for it in the marketplace or quickly develop it yourself, because it is open source.
Improvements to My Organization:
We use it almost everywhere, for creating data marts, data warehouses, and implementing BI reporting tools. We also build our Customer Centralized File and Data Quality Studio using it. What’s more, we use it for small solutions too, i.e. if we want to quickly export data from database to .xlsx. We also develop our own plugins for PDI and put them into the marketplace.
Room for Improvement:
A big advantage, but also a problem, is that it is open source. Almost anyone can develop their own Pentaho code and release it. As a result, Pentaho is now a little messy: some parts of it are brand new and some look like they were developed at the very beginning. I think the developers should stop inventing new parts and take the time to clean up the code and optimize the older parts. Some old plugins still don’t work properly after a long time.
Use of Solution:
I've been using it for four years; I started when I was in college. I quickly found that PDI, with my text search analytics plug-in, was useful for preparing notes for classes. When I was bored I came up with a fun tool: it collected data from all my roommates about what they needed from the shop and sent notifications to the phones of the people who were going to the shop.
Deployment Issues:
We have never had any problems with deployment.
Stability Issues:
There are some stability issues. As I said before, there are some small bugs, but with Pentaho you can always find a workaround.
Scalability Issues:
With the Pentaho Community version you just download it and unpack it, and it should run. If it does not, you may also need to install Java.
Customer Service:
Customer service isn’t needed. The solution to almost every problem is on the internet. If it isn’t, you can post to the community forum and you will get a quick answer, but I have never had to post a new topic.
Initial Setup:
Straightforward. You just need to unzip the file and you can run it right away. There is also some optional configuration if you need it. It’s very simple: you just need to edit three files in Notepad.
Implementation Team:
I did this myself and we do it for other companies. All installations are easy, and you do not need to be an IT magician.
Cost and Licensing Advice:
There is a Community Edition which is free. There is also an Enterprise licence but the price varies depending on the server hardware configuration and the purpose of use (BigData, Hadoop, etc.).
Other Solutions Considered:
I had the chance to test SAS Data Integration but I didn’t fall in love with it like I did with PDI. I think that PDI is easier to use and you can do much more with PDI than with SAS.
Other Advice:
The tool is excellent, and almost everyone can use it. You just need to take it out of the box and run it. There is no limit to its applications: you can do everything with it. However, it still has a lot of faults, and not every component runs the way you would like. Always look for solutions on the Internet: many problems have already been solved, and many ready-built transformations/jobs are available.
Disclosure: My company has a business relationship with this vendor other than being a customer: The company where I work, Sanmargar Team, is a reseller of this solution and a Pentaho partner in Poland.
Buyer's Guide
Pentaho Data Integration and Analytics
February 2025

Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: February 2025.
839,422 professionals have used our research since 2012.
Data Developer at a tech services company with 10,001+ employees
It is possible to understand how to develop an ETL solution even when using it for the first time.
What is most valuable?
- Pentaho Kettle has a very intuitive and easy to use graphical user interface (GUI)
- It is possible to understand how to develop an ETL solution even when using it for the first time
- The Community Edition is free and very efficient
- They have versions for Windows, Linux and Mac
- Large selection of options.
How has it helped my organization?
We have developed some complex ETL processes for some clients and they are very satisfied with the results.
What needs improvement?
They could improve the logging generator. Sometimes the error description is so generic that it is not possible to detect the problem.
For how long have I used the solution?
We've used it for three years.
What was my experience with deployment of the solution?
There were no issues with the deployment.
What do I think about the stability of the solution?
There were no issues with the stability.
What do I think about the scalability of the solution?
There have been no issues scaling it.
How are customer service and technical support?
I use the Community Edition without support or customer service. I recommend the Pentaho Community Forums for technical issues.
Which solution did I use previously and why did I switch?
I have used Informatica PowerCenter, which is an excellent solution. However, it's not as easy to use as Pentaho Kettle.
How was the initial setup?
The initial setup is straightforward. All you need to do is download it, unzip the file into a folder, and run Spoon.bat (on Windows) or spoon.sh (on Linux) to start the graphical user interface (GUI).
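As a small illustration of that last step, the sketch below picks the start script for the current operating system. It is only a convenience wrapper under stated assumptions: the function name `spoon_launcher` and the install path are hypothetical, and the script names should be adjusted to match the files actually present in your unzipped folder.

```python
import platform
from pathlib import Path


def spoon_launcher(install_dir: str, system: str = "") -> Path:
    """Return the path of the Spoon start script for the given OS.

    If system is empty, the current platform is detected.
    """
    system = system or platform.system()
    # Assumed script names; check them against your own download.
    script = "Spoon.bat" if system == "Windows" else "spoon.sh"
    return Path(install_dir) / script


# Hypothetical install location, for illustration only.
print(spoon_launcher("data-integration"))
```

The returned path can then be passed to `subprocess.run` to start the GUI from a script.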
What about the implementation team?
In-house. The implementation is very simple. Data developers will not encounter difficulties in implementing ETL solutions.
What's my experience with pricing, setup cost, and licensing?
The community edition is free. If you need a full BI solution, I would recommend the enterprise edition.
What other advice do I have?
Pentaho Kettle is an excellent solution for implementing ETL processes.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Global Consultant - Big Data, BI, Analytics, DWH & MDM at a tech consulting company with 1,001-5,000 employees
It helps to connect to various data sources including all available databases.
Valuable Features:
It's an ETL platform that includes Big Data enablement. It is the easiest to use, extend, and deploy. It helps to connect to various data sources, including all available databases.
We also use Pentaho Analyzer which is an ad-hoc analytics tool built on Mondrian OLAP server that enables the end user to slice and dice the data in various patterns.
Improvements to My Organization:
We implement Pentaho for data warehouses and BI features for our various customers. No other software offers such complete functionality for fulfilling end-user requirements. In addition, Pentaho offers a flexible platform that enables us to extend the tool to meet any end-user requirement.
Another impressive feature is that Big Data implementation/integration is very quick and simple, without the need to write any code. This enabled our clients to get maximum ROI within a short period.
Room for Improvement:
Pentaho Dashboard Designer - its dashboard features need improvement. CTools are available and help to fill the gaps, but they require developer involvement. A full-fledged dashboard designer that can do everything we currently do in CDE/CDF would be a great improvement for Pentaho.
Build Process - an inbuilt build process would make it easier to migrate between DEV, QA, UAT, and PROD; currently this is mostly done manually.
Data Profiling - including data profiling as part of PDI would be a great improvement to the platform and would help customers save a lot of the effort and cost of data quality work.
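To make the data profiling request concrete, here is a minimal sketch of the kind of per-column summary such a feature might produce. It is plain Python, not part of PDI; the function name `profile_column` and the sample rows are hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical sample data, for illustration only.
rows = [
    {"customer": "Ann", "city": "Warsaw"},
    {"customer": "Bob", "city": ""},
    {"customer": "Ann", "city": "Krakow"},
]
print(profile_column(rows, "customer"))
print(profile_column(rows, "city"))
```

Counts of missing and distinct values per column are usually the first things checked before designing cleansing steps, which is why having them inside the ETL tool would save effort.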
Use of Solution:
We are Pentaho Service Providers and have implemented more than 130 projects in Pentaho. We are not direct customers of Pentaho but we recommend Pentaho to our clients if it meets their requirements.
Deployment Issues:
We had no issues with the deployment.
Stability Issues:
There have been no stability issues.
Scalability Issues:
We have not had any issues scaling it for our customers.
Initial Setup:
It is quick and easy to implement.
Cost and Licensing Advice:
Pentaho is available both in Community (Free) and Enterprise Edition (Subscription based) depending upon your budget.
Other Advice:
One of the best features to look out for in this platform is its flexibility in adapting to your requirements. Implementation can be very quick: you can enable a few dashboards and analytics for your organization in a week's time.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Sr BI Administrator at a healthcare company with 1,001-5,000 employees
It gave ‘out-of-the-box’ widgets for reading XML and JSON interfaces which would otherwise have to be built from scratch.
What is most valuable?
It allows for very quick development due to the intuitive interface. Compared to other ETL tools like PowerCenter, SSIS, and SAS DI Studio, it excels in rapid development cycles.
How has it helped my organization?
It gave ‘out-of-the-box’ widgets for reading XML and JSON interfaces which would otherwise have to be built from scratch.
What needs improvement?
PDI excels at the development part. Administration and monitoring are pretty weak and basic. But I must say I have been spoiled by the great capabilities that PowerCenter offers ‘out-of-the-box’. The Pentaho development team seems to rely very heavily on Linux/Unix for the admin part. Debugging could be enhanced with better feedback.
For how long have I used the solution?
We used PDI 4.3 in a pilot against SSIS for a couple of months during 2013. In 2014 I used version 4.4 on a daily basis within a production environment for exactly one year. We also looked into the commercial front-end solution and found it to be too much of a collection of loosely connected applications.
What was my experience with deployment of the solution?
There have been no deployment issues.
What do I think about the stability of the solution?
Stability is a bit of an issue. The GUI quite often ‘freezes’ and there is no alternative to killing the session. Very frequent saving is in order.
What do I think about the scalability of the solution?
There have been no issues with scalability.
How are customer service and technical support?
The community site is pretty brilliant. Every technical component is handled on its own Wiki page. You can even look into the scrum backlog of the dev. team. Absolutely amazing.
Which solution did I use previously and why did I switch?
Heavy ETL solutions were simply too expensive, and the SSIS alternative is simply too hideous to consider. It took at least three times as long to develop the same ETL process with SSIS as with Pentaho (and that is on top of having to deal with the abject Microsoft ‘debugging’).
How was the initial setup?
Incredibly easy. Just unpack, make sure you have the right drivers installed, and beware of other Java applications running.
What about the implementation team?
We simply did everything ourselves, with a little aid from the community.
What other advice do I have?
Make sure Pentaho solutions are still available as they were prior to the commercial take-over. Administration is not the best-developed component. The ETL is brilliant. Make sure that the admin part is covered.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Project Lead at a tech services company with 10,001+ employees
The best benefit of the product is that it is easy to use and to understand.
Valuable Features:
The best benefit of the product is that it is easy to use and to understand.
Improvements to My Organization:
We have a huge amount of data that needs to be cleaned and made more valuable for our organization. This Data Integration helps us to achieve that goal.
Room for Improvement:
I have used multiple versions of this product. The initial version we were on was v3.2, and we had multiple issues with it, but currently we don't find any issue to be a blocker. In general, it would be good if we could get better performance from this product.
Deployment Issues:
We haven't had any issues with deployment.
Stability Issues:
We haven't had any issues with stability except for those described under Room for Improvement.
Scalability Issues:
We haven't had any issues with scalability.
Other Advice:
There are other products out there, but I feel that this is the best one.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Brazil IT Coordinator at a transportation company with 1,001-5,000 employees
Integration between databases and data import for a BI solution is valuable.
Pros and Cons
- "Data transformation within Pentaho is a nice feature that they have and that I value."
- "I would like to see more improvements with AS400 DB2."
What is most valuable?
Data transformation within Pentaho is a nice feature that they have and that I value.
How has it helped my organization?
Integration between databases and data import for a BI solution.
What needs improvement?
I would like to see more improvements with AS400 DB2. I journalled the tables/instance, and the data migration is too slow compared with other databases.
What was my experience with deployment of the solution?
There were no issues with the deployment.
What do I think about the stability of the solution?
Until now, the stability of Pentaho is great. I've already tested various scenarios and I didn't feel a loss of performance.
What do I think about the scalability of the solution?
There have been no issues so far in scaling the product.
How was the initial setup?
I used self-learning to implement it and found that the tool is very easy to understand. For some things, I looked at YouTube videos for conceptual ideas during the planning phase.
What about the implementation team?
I did it myself.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Datawarehouse Administrator at a tech services company with 501-1,000 employees
We have been able to expose data services through the use of CDA relying on the same database as the reporting tools.
What is most valuable?
Its ability to blend data, and dashboarding with C*TOOLS for creating responsive single-page apps.
How has it helped my organization?
We have been able to expose data services through the use of CDA relying on the same database as the reporting tools, thus avoiding inconsistencies among the data shown by reports and data acquired by external systems.
What needs improvement?
The User Console, aka workspace, and the development of dashboards. They work, but they require some programming skills. This means continuous application management on the part of the IT department.
For how long have I used the solution?
I've used it for six years.
What was my experience with deployment of the solution?
There were issues, but they were solved with help from tech support.
What do I think about the stability of the solution?
There were issues, but they were solved with help from tech support.
What do I think about the scalability of the solution?
There were issues, but they were solved with help from tech support.
How are customer service and technical support?
It depends. It usually takes a long time, some answers seem to be just a way to buy time, and the commitment seems poor. However, when you finally get to an engineer, you are likely to have your problem solved in a few days.
Which solution did I use previously and why did I switch?
We used MicroStrategy, Cognos, and Business Objects. Pricing was the key driver, along with the open source licensing, which made us think we would be able to develop our own improvements. This didn't happen, primarily because of the few resources we actually put into development.
How was the initial setup?
It's complex because of the lack of documentation and the absence of an installer for Linux.
What about the implementation team?
We did it in-house, and we had to hire some developers with Java skills for a few months.
What other advice do I have?
Have a vision, and do not let yourself be guided by the technology.
Disclosure: I am a real user, and this review is based on my own experience and opinions.

- You could try Pentaho's multithreading feature to increase performance; there are also many other options available for improving the performance of Pentaho jobs and transformations.