Senior Data Engineer at a tech company with 501-1,000 employees
It enables a technical product manager to write ETL jobs themselves.
What is most valuable?
The most valuable thing for me is that it enables a technical product manager to write ETL jobs themselves, which frees up developers' time for more important things.
How has it helped my organization?
Developers now focus on improving it as a tool (since it's open source) and on teaching Project Managers about it. The Project Managers are responsible for their own ETL jobs because they know what they want, so it's best for them to manage those jobs themselves.
What needs improvement?
Its performance could be improved so that it works better with Big Data. It can also be very buggy at times, which keeps some potential users away.
For how long have I used the solution?
I've used it for two years.
Buyer's Guide
Pentaho Data Integration and Analytics
February 2025

Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: February 2025.
839,422 professionals have used our research since 2012.
What was my experience with deployment of the solution?
We have had no issues with the deployment.
What do I think about the stability of the solution?
The performance for Big Data needs to be improved.
What do I think about the scalability of the solution?
We have had no issues scaling it for our needs.
How are customer service and support?
There is a community that can provide limited technical help. I'd give the community a 6 since it's not very active.
Which solution did I use previously and why did I switch?
It was already in place when I joined the company.
How was the initial setup?
It's very easy to install.
What about the implementation team?
We did it in-house. It's worth it to have someone in your company who knows Pentaho really well.
What was our ROI?
ROI is pretty good since it is kind of a major thing in our company.
What's my experience with pricing, setup cost, and licensing?
The only cost is the time it takes for the developer to get to know it.
What other advice do I have?
If your ETL jobs are small and straightforward, then this solution is definitely worth it.
Disclosure: My company has a business relationship with this vendor other than being a customer: The company is also contributing back to the open source project.

Pentaho Consultant at a comms service provider with 10,001+ employees
Because it is an open source product, it is very easy to build your own solution with it.
What is most valuable?
It is a very good open source ETL tool that's capable of connecting to most databases. It has a lot of functions that make transforming the data very easy. Also, because it is an open source product, it is very easy to build your own solution with it.
How has it helped my organization?
It is also possible to build a new solution quite quickly, so the customer sees results fast.
What needs improvement?
In the community version the scheduling tool is not good, and we had to build it ourselves.
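A home-grown scheduler of the kind described above often comes down to a timed wrapper around Kitchen, PDI's command-line job runner. The sketch below is only an illustration under assumptions: the install path, the job file, the `RUN_DATE` parameter, and the 02:00 run time are all hypothetical, and the reviewer's actual scheduler is not described in the review.

```python
import datetime as dt

# Hypothetical install location; adjust to your environment.
KITCHEN = "/opt/pentaho/data-integration/kitchen.sh"


def kitchen_command(job_file, params=None, level="Basic"):
    """Build the argument list for running one job file with Kitchen."""
    cmd = [KITCHEN, f"-file={job_file}", f"-level={level}"]
    # Named parameters are passed as -param:NAME=VALUE, in a stable order.
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-param:{name}={value}")
    return cmd


def seconds_until(now, run_at):
    """Seconds from `now` until the next occurrence of the daily time `run_at`."""
    target = dt.datetime.combine(now.date(), run_at)
    if target <= now:
        target += dt.timedelta(days=1)
    return (target - now).total_seconds()


if __name__ == "__main__":
    # Hypothetical job and parameter, for illustration only.
    cmd = kitchen_command("/etl/jobs/load_sales.kjb", {"RUN_DATE": "2025-02-01"})
    print(" ".join(cmd))
    print(seconds_until(dt.datetime.now(), dt.time(2, 0)))
```

A real scheduler would sleep for the computed interval and then hand the command to `subprocess.run(cmd, check=True)`. On a single server, a cron entry calling the same command may be all that is needed.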
For how long have I used the solution?
I have worked with different versions of Pentaho since 2009.
What was my experience with deployment of the solution?
There are a couple of bugs in the newer versions. We were forced to wait until those bugs were fixed before we could upgrade.
What do I think about the stability of the solution?
There were no issues with its stability.
What do I think about the scalability of the solution?
There have been no issues scaling it.
How are customer service and technical support?
Because we use the community edition, there is no support from the vendor. When I worked with the Enterprise edition last year the technical support was quick and to the point. I was more than happy with their knowledge.
Which solution did I use previously and why did I switch?
In the past I also worked with SAP BI. The main reason we switched to Pentaho was the cost of SAP. Because of the flexibility of Pentaho, I prefer to work with it.
How was the initial setup?
When I started using Pentaho in 2009, the initial setup was quite complex, mainly because of a lack of good documentation at that time. Since then, it has dramatically improved. Also, the community on the web is quite active and there are some good blogs.
What about the implementation team?
I was hired to do the implementation. I think it is necessary to have a good understanding of the product to implement it well, so if that knowledge is not available in-house, I would recommend hiring someone who has it.
What other advice do I have?
If you don’t have knowledge of the product, I would recommend taking some courses to speed up the learning curve. A cheap way to start with Pentaho is to use the Community Edition. You can do almost everything with it, and purchasing the Enterprise Edition is not necessary.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
COO at a tech services company with 11-50 employees
For me, it's the best ETL tool in the world
What is most valuable?
Easy to use; supports all databases (JDBC and ODBC connections), XLS, CSV, and TXT files, SAS, and R.
How has it helped my organization?
It integrates all data sources into one OLTP or OLAP database.
For how long have I used the solution?
4 years
What was my experience with deployment of the solution?
None
What do I think about the stability of the solution?
None
What do I think about the scalability of the solution?
None
How are customer service and technical support?
Customer Service: 5/10
Technical Support: 10/10
Which solution did I use previously and why did I switch?
Talend Studio.
How was the initial setup?
Easy
What was our ROI?
100% (PDI CE)
Which other solutions did I evaluate?
Talend Studio
Disclosure: My company has a business relationship with this vendor other than being a customer: EspriSûr Consultants
Graduate Teaching Assistant with 1,001-5,000 employees
We can perform transformations with data very quickly, and create reports indicating the KPI in the reporting tool.
Valuable Features:
The most valuable feature is that it can take inputs from all formats, e.g. CSV, text, Excel, JSON, Hadoop, etc. It has the potential to provide the output in the format we require, and we can also use many database connections. The transformations listed are also very useful and are very self-explanatory.
Also, the data mining feature which comes with the Pentaho business analytics suite was very useful to our project, especially the Weka plugin. We could score the records in the data warehouse, which helped in predicting the values.
Lastly, the GUI is very easy to use, so we can perform transformations with data very quickly, and create reports indicating the KPI in the reporting tool. I don't think a company would need to spend more money on hiring an experienced person to use this tool. All you need is a balance of experienced users and new trainees to get going. You can also start using the business analytics tool once you have integrated data. Coaching and applying this technology enterprise-wide will enable your business to make data-driven decisions.
Improvements to My Organization:
It makes it possible for the seniors to train new employees and junior staff very quickly. All that is needed is strong knowledge of ETL and BI/Big Data concepts to use this software.
Room for Improvement:
I would like to see the data visualization tool combined with BI so I can see how data is progressing through various stages. I do think that they are working on this already. I also found, in my case, that the statistical data input (.sas7bdat) wasn't working.
Deployment Issues:
There have been no issues with the deployment.
Stability Issues:
It could have been the case that I may not have been doing it the right way.
Scalability Issues:
We have had no issues scaling it.
Cost and Licensing Advice:
I would say it is one of the most affordable tools to use for business intelligence.
Other Advice:
You should go for this tool to manage your data warehouse, but I would suggest that you look for other reporting tools, such as Tableau, which are more user friendly and provide great insights in the data.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees
It doesn't have the capability to produce crosstab reports with formatting capabilities. It connects seamlessly to most commonly used data sources.
Valuable Features
It is a lightweight ETL tool that's easy to get started on. It connects seamlessly to most commonly used data sources.
Improvements to My Organization
The organization went with Pentaho ETL and Reporting solutions as cost effective products, as compared to competitors. The ETL part certainly met those objectives, along with serving the purpose.
Room for Improvement
Since there have been newer versions, some of these issues may already be fixed. The most troublesome missing feature was the capability to produce crosstab reports with formatting capabilities in the BI Reporting product. The one annoyance that troubled us a lot was that every step in a transformation that needed data created its own data connection. With some data sources like Greenplum, this was a problem, because they have a limit on the number of available connections.
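To see why one connection per step becomes a problem, it helps to do the arithmetic. The sketch below is a back-of-the-envelope estimate only: the number of database steps, the number of transformations running in parallel, and the limit of 25 are made-up figures, not values from the reviewer's Greenplum setup.

```python
def connections_needed(db_steps_per_transformation, parallel_transformations):
    """Estimate open connections when every data-using step opens its own."""
    return db_steps_per_transformation * parallel_transformations


def fits_limit(db_steps_per_transformation, parallel_transformations, limit):
    """Return (needed, ok): the estimated count and whether it stays within `limit`."""
    needed = connections_needed(db_steps_per_transformation, parallel_transformations)
    return needed, needed <= limit


if __name__ == "__main__":
    # Hypothetical workload: 6 database steps per transformation, 5 running at once,
    # against a hypothetical cap of 25 connections.
    needed, ok = fits_limit(6, 5, limit=25)
    print(needed, ok)
```

Even a modest number of steps multiplies quickly once several transformations run at the same time, which is why a per-step connection model can collide with a database-side cap.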
Use of Solution
I used it for three years, from 2012 to 2015, and only stopped as I left the organization.
Deployment Issues
One issue we encountered constantly with PDI deployments was that the environment parameters for jobs had to be updated manually through the designer module, 'Spoon'. Although the product has a feature for keeping Environment Variables outside Spoon, that didn't work for us, as we had one Development server used for Dev, QA and UAT.
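One way to approach the single-server problem is to give each environment its own Kettle home directory, since PDI reads variables from `.kettle/kettle.properties` under the directory named by `KETTLE_HOME`. The sketch below is an illustration under assumptions, not the reviewer's setup: the directory layout, the variable names `DB_HOST` and `DB_NAME`, and their values are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_kettle_home(base_dir, env_name, variables):
    """Create <base_dir>/<env_name>/.kettle/kettle.properties and return the home dir."""
    home = Path(base_dir) / env_name
    kettle_dir = home / ".kettle"
    kettle_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"{key}={value}" for key, value in sorted(variables.items())]
    (kettle_dir / "kettle.properties").write_text("\n".join(lines) + "\n")
    return home


def process_env(home):
    """Copy the current process environment and point KETTLE_HOME at `home`."""
    env = dict(os.environ)
    env["KETTLE_HOME"] = str(home)
    return env


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    # Hypothetical per-environment settings for Dev, QA and UAT on one server.
    settings = {
        "dev": {"DB_HOST": "localhost", "DB_NAME": "sales_dev"},
        "qa": {"DB_HOST": "localhost", "DB_NAME": "sales_qa"},
        "uat": {"DB_HOST": "localhost", "DB_NAME": "sales_uat"},
    }
    for name, variables in settings.items():
        home = write_kettle_home(base, name, variables)
        print(name, process_env(home)["KETTLE_HOME"])
```

The returned environment can be passed as `env=` to `subprocess.run` when launching a job, so the same job file picks up Dev, QA, or UAT values without being edited in Spoon.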
Stability Issues
There were no issues with the stability.
Scalability Issues
We had no issues scaling it across the company as needed.
Customer Service and Technical Support
It's about average. Most of the help we got was through Google searches and Wiki pages. One time we had an issue with a feature - our version of PDI could not handle microseconds. The product owner came up with a solution, but instead of applying the patch, wanted to sell it to us for a fee.
Initial Setup
I am only aware of the client side setup which was simple enough. It was pretty much a one step installation process.
Implementation Team
It was done by an in-house team. A couple of issues we realized later concerned memory configuration for the environment. This needs to be evaluated and fine-tuned; otherwise, you can run into job failures with large amounts of data. We ran into this issue with 'Commit' points and 'Sort' steps.
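A rough sizing exercise can help before tuning memory. The sketch below assumes a sort step that buffers a fixed number of rows in memory and estimates how much heap those rows might hold. The row count, the average row size, and the overhead factor of 2.0 are hypothetical guesses, not measured values. The `-Xmx` flag is a standard JVM option, which PDI's launch scripts read from the `PENTAHO_DI_JAVA_OPTIONS` environment variable.

```python
def sort_buffer_mb(rows_in_memory, avg_row_bytes, overhead=2.0):
    """Rough heap (in MB) held by a sort step buffering `rows_in_memory` rows.

    `overhead` is a guessed multiplier for per-object JVM overhead.
    """
    return rows_in_memory * avg_row_bytes * overhead / (1024 * 1024)


def java_options(max_heap_mb):
    """Format a maximum-heap JVM flag for PENTAHO_DI_JAVA_OPTIONS."""
    return f"-Xmx{int(max_heap_mb)}m"


if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 buffered rows of about 512 bytes each.
    needed = sort_buffer_mb(1_000_000, 512)
    print(round(needed))
    print(java_options(4096))
```

If the estimate approaches the configured heap, the usual options are to raise the heap or to lower the number of rows the sort keeps in memory, as well as the commit size on output steps.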
Other Solutions Considered
There was an evaluation performed, however I was not involved in it.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
IT-Services Manager & Solution Architect at Stratis
Free to use, easy to set up, and has great UI
Pros and Cons
- "It's my understanding that the product can scale."
- "The product needs more plugins."
What is our primary use case?
We basically receive information from our clients via Excel. We take this information and transform it in order to create some data marts.
With the processes we are running right now, we receive new data every day. The solution processes the Excel files and creates a data mart from them.
While we read the data and transform it as well as put it in a database, in order to explore the information, we need an analytics solution for that - and that is typically Microsoft's solution, Power BI.
What is most valuable?
Running the ETL itself was very fast. It makes it very easy to transform the information we have. We found that very useful.
The UI is very easy to understand and learn.
The solution offers lots of documentation.
The initial setup is easy.
It's my understanding that the product can scale.
We've found the solution to be stable.
The product is free to use if you choose the free version.
What needs improvement?
The solution needs better, higher-quality documentation, similar to AWS. Right now, we find that although documentation exists, it's not easy to find the answers we seek.
I have tried some cloud services with the ETL, so perhaps that would be good to add.
The product needs more plugins. Right now, it just has a standard database connection, while other solutions out there offer straightforward connections for Oracle, MySQL, and the like. More plugins would make it a much better product.
For how long have I used the solution?
We recently finished two projects with Pentaho.
What do I think about the stability of the solution?
The product is stable. There are no bugs or glitches. It doesn't crash or freeze. It's reliable.
What do I think about the scalability of the solution?
According to the documentation, it's quite scalable. That said, I haven't tried to expand it. We just use a single server and that's all we need right now. We don't have plans to increase usage.
We have three people who use the solution currently.
How are customer service and technical support?
We don't really use support. We tend to do everything on our own and solve any problems we have ourselves. We basically have just read the manuals and that's about it.
How was the initial setup?
The initial setup is not complex or difficult. It's straightforward.
The deployment process takes about two weeks.
We had two people who handled the deployment process. They were an AWS DevOps person and a Pentaho expert.
What's my experience with pricing, setup cost, and licensing?
We do not pay any license costs. We use a free version of the product.
What other advice do I have?
I'm a consultant and an end-user.
I downloaded the latest version of the solution. I can't speak to the version number.
I'd rate the solution at an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Project Manager - Business Intelligence at www.datademy.es
It has improved our data integration capabilities
Pros and Cons
- "It has improved our data integration capabilities."
- "Provides a good open source option."
- "There is not a data quality or MDM solution in the Pentaho DI suite."
- "I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse."
- "I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support."
How has it helped my organization?
We developed ETL processes to load a data warehouse. It has improved our data integration capabilities.
What is most valuable?
- Easy to use
- Development of the product
- A lot of predefined steps
- Good open source option
What needs improvement?
There is not a data quality or MDM solution in the Pentaho DI suite.
For how long have I used the solution?
Three to five years.
What do I think about the stability of the solution?
No issues.
What do I think about the scalability of the solution?
I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse.
How are customer service and technical support?
I work with the Community Edition, therefore I do not have support. There was an issue that I could not resolve with community support.
Which solution did I use previously and why did I switch?
I switched from our previous solution for cost reasons.
How was the initial setup?
It was not complex.
What's my experience with pricing, setup cost, and licensing?
There is a good open source option (Community Edition).
Which other solutions did I evaluate?
No.
What other advice do I have?
There is a lack of support if you work with the Community Edition.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Consultant at a comms service provider with 11-50 employees
Simple to install and simple to use and helps us mine, clean, and arrange terabytes of data
Pros and Cons
- "It's very simple compared to other products out there."
- "One thing that I don't like, just a little, is the backward compatibility."
What is most valuable?
It's very simple compared to other products out there.
How has it helped my organization?
We use Pentaho for data integration, but also PI to implement data mining. That has improved the intelligence behind the data, so we are able to give our customer the ability to understand their data. Our customer produces terabytes of data, so arranging and cleaning the data during data integration helped our customer understand the data and improve their business.
What needs improvement?
One thing that I don't like, just a little, is the backward compatibility. I have used Pentaho since version 4, and version 6 does not work with the whole ETL design. So backward compatibility is a problem.
For how long have I used the solution?
I have worked with this product for seven years.
What do I think about the stability of the solution?
It's a stable product. In fact, it contains some blocks where you can write your own Java code and build an ETL specific to your needs.
How is customer service and technical support?
The support is very fast, but there are also a lot of forums to address problems, so you can find the solution to your issue easily. There is also the possibility to buy support, and when we bought support they resolved our problem in 24 hours.
How was the initial setup?
It was very, very simple. I copied the integration folder, started the tool to design the ETL, and it worked. Time was required to design the ETL, just to understand how each block works. Once you understand how each block works, you don't need to spend any more time learning to use the product.
Which other solutions did I evaluate?
Before using Pentaho, I analyzed other products to understand which is the best ETL product. I tested Talend and Oracle Data Integrator. It is a little more difficult to understand how Oracle Data Integrator works.
So, I preferred Pentaho Data Integration because you just have to drag and drop the block, draw a line to connect the block, write the query, and connect to the DB. There's nothing else you need to do. For Oracle Data Integrator, and also for Talend, you spend more time installing the product. By contrast, with Pentaho, you just have to copy the folder, launch the product, and then you just need the Java machine and it works.
What other advice do I have?
When you start to use this product, if you have just a little experience and know about ETL, you will need to spend very little time learning it. The product is very, very simple to understand. You can build functionality by yourself.
Anyone thinking about an ETL product, if they want high productivity on data cleaning and data movement, Pentaho Data Integration, in my opinion, is the best tool.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
