Manager at a consultancy with 10,001+ employees
It helped in managing the data from different sources into one unique target. I would like to see what code the report tool generates.
What is most valuable?
- Pentaho Data Integration
- Most of the ETL stuff can be done with minimal coding
- Reporting capabilities
How has it helped my organization?
It helped in managing the data from different sources into one unique target.
What needs improvement?
In the reporting tool, I would like to see what code it generates. As of now, there is no provision to see the underlying code of the PRD file.
For how long have I used the solution?
I've used it for one year.
Buyer's Guide
Pentaho Business Analytics
January 2025
Learn what your peers think about Pentaho Business Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: January 2025.
831,265 professionals have used our research since 2012.
What was my experience with deployment of the solution?
There have been no issues with deployment.
What do I think about the stability of the solution?
There have been no stability issues.
What do I think about the scalability of the solution?
There have been no issues scaling it.
How are customer service and support?
Customer Service:
I have not had to use the customer service.
Technical Support:
I have not had to use technical support.
Which solution did I use previously and why did I switch?
There was no other solution in place.
How was the initial setup?
It was straightforward at first and became more complex later, as we came to understand the existing structure and how to use the ETL to align with it.
What about the implementation team?
We did it in-house. You need to have a good understanding of what the tool can offer, such as ETL, MDM, and SCDs.
What's my experience with pricing, setup cost, and licensing?
We're using the free edition.
What other advice do I have?
It's moderate to use, learn, and implement. It's nice and you should use it.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Consultant at a consumer goods company with 1,001-5,000 employees
The Data Integration graphical drag and drop design is easy for new users to follow and can increase productivity.
Valuable Features
Overall, the Pentaho Business Analytics platform is an outstanding product that offers great cost-saving solutions for companies of all sizes. The platform is built on top of several underlying open source projects driven by the community’s contributions. There are several features that I find invaluable, and improvements are made with each release.
The Pentaho User Console provides a portal that makes it easy for users to explore information interactively. Dashboard reporting, scheduling jobs, and managing data connections are some of the features that are made easy with the console. More advanced users can extend Pentaho Analyzer with custom visualizations or create reporting solutions with CTools. The Marketplace empowers the community to develop new and innovative plugins and simplifies the installation of those plugins for users of the console. The plugin framework provides a plugin contributor that extends the core services offered by the BI Server.
Pentaho Data Integration (Spoon) is another valuable tool for development. Spoon delivers powerful extraction, transformation, and load capabilities using a metadata-driven approach. The Data Integration graphical drag and drop design is easy for new users to follow and can increase productivity. More advanced users can extend Pentaho Data Integration by creating transformations and jobs dynamically.
Improvements to My Organization
My company was able to reduce software costs and hire additional staff given the cost savings that Pentaho provided. We are moving towards a Hadoop environment after the migration of our current ETL processes, and Pentaho’s easy-to-use development tools and big data analytics capabilities were a factor in choosing Pentaho as a solution.
Room for Improvement
For those who run the open source Community Edition, it can at times be difficult to find up-to-date references for support. Even for companies that use the Enterprise Edition, finding useful resources when a problem occurs can be difficult. Pentaho-driven best practices should be made available to both Community and Enterprise users to motivate and empower more users to use the solutions effectively.
Customer Service and Technical Support
Pentaho has stellar support services with extremely intelligent Pentaho and Hitachi consultants all over the world. Those support services and documentation are made available to Enterprise clients that have purchased the Enterprise Edition and have access to the support portal.
Initial Setup
Pentaho is easy to deploy, use, and maintain. It’s a low-cost, fully supported business intelligence solution. I have used Pentaho in small and large organizations with great success.
Pricing, Setup Cost and Licensing
Enterprise licenses can be purchased for the full-service Enterprise Pentaho solution, which offers support through the portal and, at additional cost, access to Pentaho/Hitachi consultants.
Other Advice
Pentaho offers a Community Edition, which is an open source solution and can be downloaded for free. The Community Edition truly gives most companies everything they need, but your choice of solution should be matched to your business needs. As a cost-cutting option, Enterprise license fees can be paid to vendors to fund on-demand support.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Owner with 51-200 employees
Pentaho BI Suite Review: Final Thoughts – Part 6 of 6
Introduction
This is the last of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.
Data Mining
In this sixth part, I originally wanted to at least touch on the only part of the Pentaho BI Suite we have not talked about before: Data Mining. However, as I gathered my materials, I realized that Data Mining (along with its ilk: Machine Learning, Predictive Analysis, etc.) is too big a topic to fit in the space that we have here. Even if I tried, the usefulness would be limited at best since, at the moment, while the results are being used to solve real-world problems, the usage of Data Mining tools is still exclusively within the realm of data scientists.
In addition, as of late I use Python more for working with datasets that require a lot of munging, preparing, and cleaning. As an extension of that, I ended up using pandas, scikit-learn, and other Python-specific Data Mining libraries instead of Weka (which is basically what the Pentaho Data Mining tool is).
So for those who are new to Data Mining with Pentaho, here is a good place to start, an interview with Mark Hall, one of the authors of Weka, who now works for Pentaho: https://www.floss4science.com/machine-learning-with-weka-mark-hall
The link above also has some links to where to find more information.
For those who are experienced data scientists, you have probably already made up your mind about which tool suits your needs best and, just as I went with Python libraries, you may or may not prefer a GUI approach like Weka's.
New Release: Pentaho 5.0 CE
For the rest of this review, we will go over the changes that come with the highly anticipated release of the 5.0 CE version. Overall, there are a lot of improvements in various parts of the suite such as PDI and PRD, but we will focus on the BI Server itself, where the largest impact of the new release can be seen.
A New Repository System
In this new release, one of the biggest shocks for existing users is the switch from the file-based repository system to the new JCR-based one. JCR (Java Content Repository) is a standard for database-backed content repositories, and Jackrabbit is the implementation of it from the Apache Software Foundation.
The Good:
- Better metadata management
- No longer need to refresh the repository manually after publishing solutions
- A much better UI for dealing with the solutions
- API to access the solutions via the repository which opens up a lot of opportunities for custom applications
The Bad:
- It's not as familiar or convenient as the old file-based system
- Need to use a synchronizer plugin to version-control the solutions
It remains to be seen if this switch will pay off for both the developers and the users in the long run. But it is stable and working for the most part, so I can't complain.
The Marketplace
One of the best features of the Pentaho BI Server is its plugin-friendly architecture. In version 5.0 this architecture has been given a new face called the Marketplace.
This new interface serves two important functions:
- It allows admins to install and update plugins (almost all Pentaho CE tools are written as plugins) effortlessly
- It allows developers to publish their own plugins to the world
There are already several new plugins available with this new release, notably Pivot4J Analytics, an alternative to Saiku that shows a lot of promise as a very useful tool for working with OLAP data. Another one that excites me is Sparkl, with which you can create other custom plugins.
The Administration Console
The new version also brings a new Administration Console where we manage Users and Roles.
No longer do we have to fire off another server just to do this basic administrative task. In addition, you can manage the Mail server (no more wrangling configuration files).
The New Dashboard Editor
As we discussed in Part V of this review, the CDE is a very powerful dashboard editor. In version 5.0, the list of available Components has been further lengthened by new ones, and the overall editor seems to be more responsive in this new release.
Usage experience: The improvements in the Dashboard editor are helping me to create dashboards for my clients that go beyond the static ones. In fact, the one below (demo purposes only) has a level of interactivity that rivals a web application or an electronic form:
NOTE: Nikon and Olympus are trademarks of Nikon Corporation and Olympus Group respectively.
Parting Thoughts
Even though the final product of a Data Warehouse or a BI system is a set of answers and forecasts, or dashboards and reports, it is easy to forget that without the tools that help us to consolidate, clean up, aggregate, and analyze the data, we will never get to the results we are aiming for.
As you can probably tell, I serve my clients with various tools that make sense given their situation, but time and again, the Pentaho BI Suite (CE version especially) has risen to fulfill the needs. I have created Data Warehouses from scratch using Pentaho BI CE, pulling in data from various sources using PDI and creating OLAP cubes with the PSW, which end up as the data source for the various dashboards (financial dashboards, inventory dashboards, marketing dashboards, etc.) and for published reports created using the PRD.
Of course my familiarity with the tool helps, but I am also familiar with a lot of other BI tools besides Pentaho. And sometimes I do have to use other tools in preference to Pentaho because they suit the needs better.
But as I always mention to my clients, unless you have a good relationship with the vendor to avoid paying hundreds of thousands per year just to be able to use tools like IBM Cognos, Oracle BI, or SAP Business Objects, there is a good chance that Pentaho (either the EE or CE version) can do the same for less, even zero license cost in the case of CE.
Given the increased awareness of the value of data analysis in today's companies, these BI tools will continue to become more and more sophisticated and powerful. It is up to us business owners, consultants, and data analysts everywhere to develop the skills to harness the tools and crank out useful, accurate, and yes, easy-on-the-eyes decision-support systems. And I suspect that we will always see Pentaho as one of the viable options, a testament to the quality of the team working on it. It would be remiss not to acknowledge the CE team in particular for their efforts to improve and maintain a tool this complex using the Open Source paradigm.
So here we are, at the end of the sixth part. Writing this six-part review has been a blast. And I would like to give a shout-out to IT Central Station, which has graciously hosted this review for all to benefit from. Thanks for reading.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Owner with 51-200 employees
Pentaho BI Suite Review: PDI – Part 1 of 6
Introduction
The Pentaho BI Suite is one of the more comprehensive BI suites that is also available as an Open Source project (the Community Edition). Interestingly, the absence of license fees is far from being the only factor in choosing this particular tool to build your Data Warehouses (OLAP systems).
This is the first of a six-part review of the BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.
In this first part, we'll be discussing Pentaho Data Integration (referred to from here on as PDI), which is the ETL tool that comes with the suite. An ETL tool is the means by which you take in data from various sources (typically some transactional systems), then transform the format and load it into another data model that is OLAP-friendly. It therefore acts as the gateway to using the other parts of the BI suite.
In the case of PDI, it has two components:
- Spoon (the GUI), where you string together a set of Steps within a Transformation and optionally string multiple Transformations within a single Job. This is where you would spend the bulk of your time developing ETL scripts.
- The accompanying set of command-line scripts that we can configure to be launched from a scheduler like cron or Windows Task Scheduler: notably pan (the single-Transformation runner), kitchen (the Job runner), and carte (the slave-server runner). These tools give us the flexibility to create our own network of multi-tiered notification systems, should we need to.
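As a rough sketch of how these runners might be wired into a scheduler, the Python helper below assembles a kitchen command line for one Job. The install directory, job file, and parameter names are hypothetical, and the `-file`, `-level`, and `-param:` options follow kitchen's usual command-line style, so verify them against the documentation for your PDI version.

```python
import shlex


def kitchen_command(pdi_home, job_file, params=None, log_level="Basic"):
    """Build the argument list for running one PDI Job with kitchen.

    pdi_home and job_file are hypothetical paths supplied by the caller.
    """
    cmd = [
        f"{pdi_home}/kitchen.sh",      # kitchen is the Job runner
        f"-file={job_file}",           # the .kjb file to execute
        f"-level={log_level}",         # logging verbosity
    ]
    # Named parameters are passed one per argument, in a stable order.
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-param:{name}={value}")
    return cmd


if __name__ == "__main__":
    cmd = kitchen_command(
        "/opt/pentaho/data-integration",   # hypothetical install directory
        "/etl/jobs/load_sales.kjb",        # hypothetical Job file
        {"TARGET_ENV": "prod"},            # hypothetical parameter
    )
    # Print a shell-safe line that could be pasted into a cron entry.
    print(shlex.join(cmd))
```

The same list could be handed to `subprocess.run(cmd, check=True)` from a wrapper script, which raises an exception on a non-zero exit status and so gives a natural hook for sending notifications.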
Is it Feature-Complete?
ETL tools are interesting because anyone who has implemented a BI system has a standard list of major features they expect to be available. This standard list does not change from one tool brand to another. Let's see how PDI fares:
- Serialized vs. parallel ETL processing: PDI handles parallel (async.) steps using Transformations, which can be strung together in a Job when we need a serialized sequence.
- Parameter handling: PDI has a property file that allows us to parameterize things that are specific to different platforms (dev/test/prod), such as database names, credentials, and external servers. It also features parameters that can be created during the ETL run out of the data in the stream, then passed on from one Transformation to another within a Job.
- Script management: Just like any other IT documents (or, as some call them, artifacts), ETL scripts need to be managed, version-controlled, and documented. PDI scores high on this front, not because of specific features but because of design decisions that favor simplicity: the scripts are plain XML documents. That makes them very easy to manage, version-control, and, if necessary, batch-edit. NOTE: For those who want enterprise-level script management and version control built into the tool, Pentaho makes it available as part of its Enterprise offerings. But for the rest of us who already have a document management process (because we also develop software using other tools), it is not as crucial.
- Clustering: PDI supports round-robin-style load balancing given a set of slave servers. For those using Hadoop clusters, Pentaho recently added support for running Jobs on those.
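Because Transformations are saved as plain XML (the script-management point above), ordinary scripting tools can inspect or batch-edit them. The sketch below uses only Python's standard library to list the steps in a Transformation. The embedded XML is a heavily simplified, hypothetical stand-in for a real .ktr file, so check the element names against files saved by your own PDI version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down stand-in for a Transformation file.
SAMPLE_KTR = """\
<transformation>
  <info><name>load_customers</name></info>
  <step><name>Read CSV</name><type>CsvInput</type></step>
  <step><name>Write table</name><type>TableOutput</type></step>
</transformation>
"""


def list_steps(ktr_xml):
    """Return (step name, step type) pairs in document order."""
    root = ET.fromstring(ktr_xml)
    return [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]


if __name__ == "__main__":
    for name, step_type in list_steps(SAMPLE_KTR):
        print(f"{step_type}: {name}")
```

The same approach extends to auditing a whole directory of scripts under version control, for example to find every Transformation that uses a given step type.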
Is it Easy to Use?
With the drag and drop graphical UI approach, the ease of use is a given. It is quite easy to string together steps to accomplish the ETL process. The trick is knowing which steps to use, and when to use them.
The documentation on how to use each step could stand improvement, though fortunately it has slowly started to catch up over the years, and should you have the budget, you can always pay for the support that comes with the Enterprise Edition. But overall, it is a matter of using the steps enough to be familiar with the use cases.
This is why competent BI consultants are worth their weight in gold: they have been in the trenches and have accumulated ways to deal with the quirks which are bound to be encountered in a software system this complex (not just Pentaho; this applies to any BI Suite product out there).
NOTE: I feel obligated to point out one (very) annoying fact that I cannot hit the Enter key to edit the selected step. Think about how many times we would use this functionality on any ETL tool.
Aside from that, in the few years that I've used various versions of the GUI, I've never encountered severe data loss due to stability problems.
Another measure of ease of use that I evaluate a tool by is how easy it is to debug the ETL scripts. With PDI, the logical structure of the scripts can be easily followed, so it is quite debug-friendly.
Is it Extensible?
It may be a strange question at first, but let us think about it. One of the purposes of using an ETL tool is to deal with a variety of data sources. No matter how comprehensive the included data format readers/writers, sooner or later you will have to talk to a proprietary system that is not widely known. We had to do this once for one of our clients. We ended up writing a custom PDI step that communicates with the XML-RPC backend of an ERP system.
The good news is that, with PDI, anyone with some Java SDK development experience can readily implement the published interfaces and thus create their own custom Transformation steps. In this regard, I am quite impressed with the modular design, which allows users to extend the functionality and, consequently, the usefulness of the tool.
The scripting ability built into the Steps is also one of the ways to handle proprietary or extremely complex data. PDI allows us to write JavaScript (and Java, should you want faster performance) programs to manipulate the data both at the row level and pre- and post-run, which comes in very handy for variable initialization or for sending notifications that contain statistical info about all of the rows.
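The pre-run, row-level, and post-run phases described above can be illustrated with a small Python sketch. This shows the general pattern only and is not PDI's scripting API; the function name, the `amount` field, and the summary format are all hypothetical.

```python
def run_with_hooks(rows, transform):
    """Apply `transform` to every row while tracking run-level statistics."""
    # Pre-run: initialize the variables that live for the whole run.
    stats = {"rows": 0, "total": 0.0}
    out = []
    # Row level: manipulate each row and update the running statistics.
    for row in rows:
        new_row = transform(row)
        stats["rows"] += 1
        stats["total"] += new_row["amount"]
        out.append(new_row)
    # Post-run: build a summary that a notification could carry.
    summary = f"{stats['rows']} rows, total {stats['total']:.2f}"
    return out, summary


def to_number(row):
    """Hypothetical row-level script: parse the amount field as a float."""
    return {**row, "amount": float(row["amount"])}


if __name__ == "__main__":
    rows = [{"amount": "1.5"}, {"amount": "2"}]
    cleaned, summary = run_with_hooks(rows, to_number)
    print(summary)
```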
Summary
PDI is one of the jewels in the Pentaho BI Suite. Aside from some minor inconveniences within the GUI tool, the simplicity, extensibility, and stability of the whole package make PDI a good tool for building a network of ETLs marshaling data from one end of the systems to another. In some cases, it even serves well as a development tool for the batch-processing side of an OLTP system.
Next, in part two, we will discuss the Pentaho BI Server.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
IT Manager at a transportation company with 51-200 employees
In terms of functionality, they're not growing as fast as other companies. It's good for showing the need for BI.
What is most valuable?
Pentaho Data Integration (PDI).
Pentaho Analysis Services
Pentaho Reporting
How has it helped my organization?
We developed datamarts for Sales and HR, so managers of these departments can now get quick and flexible answers from them. I think it was an improvement, because in the past each new analysis demanded IT resources and took time, and this no longer occurs. The end users have much more freedom to discover the information they need.
What needs improvement?
I think that Pentaho can greatly improve its UI and its tool for dashboard maintenance.
For how long have I used the solution?
2 years
What was my experience with deployment of the solution?
I think the most complex solutions are the ones with the most hardcore implementations. Pentaho could invest more in making developers' lives easier.
What do I think about the stability of the solution?
Yes, once in a while, we have to face an unexpected problem that takes us time to overcome, and it causes problems with user satisfaction.
What do I think about the scalability of the solution?
No. I think the choice of Pentaho was right for my company. It fits our purpose very well, which was to demonstrate to the directors the power of BI for the business. But now there is an awareness of the benefits, and the company is becoming bigger. Perhaps, in the near future, I will evaluate other options, even Pentaho EE.
How are customer service and technical support?
Customer Service:
My company has a procedure to evaluate all of our suppliers and we have questions about promptness, level of expertise, pre-sale and post-sale, effectiveness and efficiency.
Technical Support:
7 out of 10
Which solution did I use previously and why did I switch?
Yes, when I started with Pentaho in 2011, I had already worked at another company that had Cognos BI Suite as its BI solution.
How was the initial setup?
The initial setup was straightforward. The setup was done by my team, which had no expertise with the Pentaho BI Suite. In 2 days, I was presented with the first dashboards.
What about the implementation team?
I implemented my first Pentaho project with a vendor team, which helped us a lot, but its level of expertise could have been better. In the middle of the project, we had some delays related to doubts which had to be clarified by Pentaho’s professionals.
What was our ROI?
The ROI of this product is good, because in little time you can have the first outputs. But it’s not excellent compared with other BI solutions, like QlikView or Tableau.
What's my experience with pricing, setup cost, and licensing?
My original setup cost for the first project was $30,000 and the final cost was about $35,000.
Which other solutions did I evaluate?
Yes. Cognos, MicroStrategy, and Jaspersoft.
What other advice do I have?
For me, Pentaho is not growing in terms of functionality as fast as other companies in the same segment. The UI falls short, and for more complex solutions, it’s necessary to have good developers. However, being an Open Source solution, I think it allows IT departments to show with low investment the importance of BI for the company.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Engineer at a marketing services firm with 51-200 employees
It does a lot of what we need but off-the-shelf solutions often can’t do exactly what you need
Being in the business of online-to-offline ad attribution and advertising analytics, we need tools to help us analyze billions of records to discover interesting insights for our clients. One of the tools we use is Pentaho, an open source business intelligence platform that allows us to manage, transform, and explore our data. It offers some nice GUI tools, can be quickly set up on top of existing data, and has the advantage of being on our home team.
But for all the benefits of Pentaho, making it work for us has required tweaking and in some cases replacing Pentaho with other solutions. Don’t take this the wrong way: we like Pentaho, and it does a lot of what we need. But at the edges, any off-the-shelf solution often can’t do exactly what you need.
Perhaps the biggest problem we faced was getting queries against our cubes to run quickly. Because Pentaho is built around Mondrian, and Mondrian is a ROLAP engine, every query against our cubes requires building dozens of queries that join tables with billions of rows. In some cases this meant that Mondrian queries could require hours to run. Our fix has been to make extensive use of summary tables, i.e. summarizing counts of raw data at the levels we know our cubes will need to execute queries. This has allowed us to take queries that ran in hours and run them in seconds by doing the summarization for all queries once in advance. At worst our Mondrian queries can take a couple of minutes to complete if we ask for really complicated things.
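To make the summary-table idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and sample rows are all hypothetical and the data is tiny, but it shows the shape of the technique: aggregate the raw rows once, ahead of time, at the grain queries will need, then answer from the small table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw fact table (billions of rows in the real case).
    CREATE TABLE impressions (day TEXT, campaign TEXT, user_id INTEGER);
    INSERT INTO impressions VALUES
        ('2024-01-01', 'spring', 1),
        ('2024-01-01', 'spring', 2),
        ('2024-01-01', 'summer', 1),
        ('2024-01-02', 'spring', 3);

    -- Summarize once, in advance, at the grain the cube queries need.
    CREATE TABLE impressions_by_day_campaign AS
        SELECT day, campaign, COUNT(*) AS impression_count
        FROM impressions
        GROUP BY day, campaign;
""")


def campaign_total(campaign):
    """Answer from the small summary table instead of scanning raw rows."""
    row = conn.execute(
        "SELECT SUM(impression_count) FROM impressions_by_day_campaign"
        " WHERE campaign = ?",
        (campaign,),
    ).fetchone()
    return row[0] or 0  # SUM over zero rows yields NULL, so fall back to 0


if __name__ == "__main__":
    print(campaign_total("spring"))
```

The trade-off is that the summary table must be rebuilt or incrementally refreshed whenever the raw data changes, which is usually done as part of the ETL run.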
Early on, we tried to extend our internal use of Pentaho to our clients by using Action Sequences, also known as xactions after the Action Sequence file extension. Our primary use of xactions was to create simple interfaces for getting the results of Mondrian queries that could then be displayed to clients in our Rails web application. But in addition to sometimes slow Mondrian queries (in the world of client-facing solutions, even 15 seconds is extremely slow), xactions introduce considerable latency as they start up and execute, adding as much as 5 seconds on top of the time it takes to execute the query.
Ultimately we couldn’t make xactions fast enough to deliver data to the client interface, so we instead took the approach we use today. We first discover what is useful in Pentaho internally, then build solutions that query directly against our RDBMS to quickly deliver results to clients. Although, to be fair to Mondrian, some of these solutions require us to summarize data in advance of user requests to get the speed we want because that data is just that big and the queries are just that complex.
We’ve also made extensive use of Pentaho Data Integration, also known as Kettle. One of the nice features of Kettle is Spoon, a GUI editor for writing Kettle jobs and transforms. Spoon made it easy for us to set up ETL processes in Kettle and take advantage of Kettle’s ability to easily spread load across processing resources. The tradeoff, as we soon learned, was that Spoon makes the XML descriptions of Kettle jobs and transforms difficult to work on concurrently, a major problem for us since we use distributed version control. Additionally, Kettle files don’t have a really good, general way of reusing code short of writing custom Kettle steps in Java, which makes maintaining our large collection of Kettle jobs and transforms difficult. On the whole, Kettle was great for getting things up and running quickly, but over time we find its rapid development advantages are outweighed by the advantages of using a general programming language for our ETL. The result is that we are slowly transitioning to writing ETL in Ruby, but only on an as-needed basis since our existing Kettle code works well.
As we move forward, we may find additional places where Pentaho does not fully meet our needs and we must find other solutions to our unique problems. But on the whole, Pentaho has proven to be a great starting platform for getting our analytics up and running and has allowed us to iteratively build out our technologies without needing to develop custom solutions from scratch for everything we do. And, I expect, Pentaho will long have a place at our company as an internal tool for initial development of services we will offer to our clients.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Administrative Assistant at a university with 10,001+ employees
Makes it easy to develop data flows and has a wide range of database connections
Pros and Cons
- "Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
- "Pentaho Business Analytics' user interface is outdated."
What is our primary use case?
I primarily use Pentaho Business Analytics to create ETL processes, monitoring processes, and hierarchies.
What is most valuable?
Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud.
What needs improvement?
Pentaho Business Analytics' user interface is outdated. It's also limited in out-of-the-box features, which forces you to develop features yourself. There are also some problems with having to update metadata manually, which I would like to see Pentaho fix in the future.
What do I think about the stability of the solution?
Pentaho Business Analytics is stable.
What do I think about the scalability of the solution?
Pentaho Business Analytics is scalable (though I have only tested this lightly).
How are customer service and support?
Since Pentaho Business Analytics is open-source, it has a very helpful community.
Which solution did I use previously and why did I switch?
I previously used Microsoft Integration Services and Microsoft Azure Data Factory.
How was the initial setup?
The initial setup was easy.
What other advice do I have?
Pentaho Business Analytics is a very good product for those starting to work with ETL processes. Usually, it will solve every problem you may have when creating those processes, and it's free, with a big support community. However, it may not be the best choice if your company has a very strong relationship with Microsoft or if you want to work in the cloud. I would give Pentaho Business Analytics a rating of eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
COO at a tech services company with 11-50 employees
Fast Development (Agile BI), Good Charts and Visualization, Good Security, Good User Interface
What is most valuable?
Pentaho Analyzer (EE)
Saiku (CE)
Marketplace (CE)
R (EE and CE)
Community Dashboard Framework (CE)
Dashboard Editor (EE)
How has it helped my organization?
Powerful Analytics, Fast KPI Analysis
For how long have I used the solution?
4 Years
What was my experience with deployment of the solution?
Integration with GeoServer (Specially ShapeFiles Layers on Maps)
What do I think about the stability of the solution?
None
What do I think about the scalability of the solution?
Migrate old version of Reports (.prpt) to a new version
How are customer service and technical support?
Customer Service:
5/10
Technical Support:
9/10
Which solution did I use previously and why did I switch?
Yes, QlikView.
How was the initial setup?
Difficulty: medium
What was our ROI?
45%
Which other solutions did I evaluate?
QlikView
Tableau, SpagoBI
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Have you looked into using Talend? It's got a great user interface, very similar to Kettle, and their paid-for version has version control that works very well, and you get the ability to run "joblets", which are basically reusable pieces of code. Even the free version has version control, although it's pretty clumsy; there are no joblets in the free version, and it is difficult to get working with GitHub.