PeerSpot user
Pentaho Specialist/Free Software Expert at a tech services company with 10,001+ employees
Consultant
One has only to enable the jobs and transformations to take advantage of PDI's clustering abilities.

What is most valuable?

Pentaho is a suite with five main products: Pentaho Data Integration for ETL; the Pentaho Business Analytics Server for results delivery; and the development clients Report Designer, Metadata Editor, and Schema Workbench.

Pentaho Data Integration's (PDI, formerly Kettle) features and resources are virtually unbeatable, as it can handle everything from the smallest Excel files to the most complex and demanding data loads. It is able to scale from a single desktop computer to many nodes, on premises or in the cloud. Not only is it powerful, it is also easy to use. I have never worked with alternatives such as Informatica PowerCenter or Microsoft SSIS myself, but I have always taken the opportunity to ask those who have. By their accounts, PDI is easier to use and achieves more with less effort than those other products.

Then there is the Pentaho BA Server, built to be the linchpin of BI delivery for enterprises. It is a scalable, auditable platform able to deliver everything from dashboards and reports to OLAP and custom-made features. It supports background processing, bursting results by e-mail, load balancing (through the load-balancing features of native Java web servers such as Tomcat), and integration with corporate directory services such as MS Active Directory and LDAP directories, with account management and lots of bells and whistles.

The suite's plugin architecture deserves a special remark: both PDI and the BA Server are built to be easily extended with plugins. There are two plugin marketplaces, one for PDI and one for the BA Server, with a good supply of diverse features. If all those plugins are not enough, there are means to develop your own, either by coding in Java (mostly for PDI) or, for the BA Server, with point-and-click ease using Sparkl, a BA Server plugin for easy development and packaging of new BA Server plugins (though some knowledge of JavaScript, CSS, and HTML is needed).

Any company is able to design and deliver a deep, wide-ranging BI strategy with Pentaho. Given its relatively low price when set beside comparable competition, the most valuable features are the data integration tool and the results delivery platform.

How has it helped my organization?

I work for the largest government-owned IT enterprise in Brazil, employing over 10,000 people with yearly revenue in excess of half a billion dollars. Designing and delivering timely BI solutions used to be a bogged-down process because everything involved license costs. With Pentaho, we were able to better suit our needs and better serve our customers. We use the Community Edition (CE) for our departmental BI needs, and deliver solid service to our customers using paid licenses. Also, being so complete, Pentaho has enabled a whole new level of experimentation and testing. We can completely evaluate a customer's need with CE licenses and then deliver the solution commercially, assembling it on Enterprise Edition (EE) licenses. We need paid support for our customers in order to answer any outage in a timely manner.

What needs improvement?

Pentaho has a solid foundation and decent user interfaces. It is lacking, however, in the tooling for data exploration and presentation. The recent Data Discovery trend has put a lot of strain on suppliers of visual data analysis tools, and Pentaho has chosen to strengthen its data integration features instead, aiming at the growing Big Data and Hadoop market. Work on visual data exploration tools has therefore mainly been left for the community to tackle.

So, there is room for improvement in the graphical interface for data exploration and presentation. Please note that there is no want of decent tools, only that the tools are not as sharp and as beautiful as QlikView's, for instance. Pentaho delivers, no question; it just does not please the eye as much.

For how long have I used the solution?

I have been using the whole Pentaho suite for nine years. I have also self-published a book on Pentaho and regularly write for my BI/Pentaho blog.

Buyer's Guide
Pentaho Business Analytics
December 2024
Learn what your peers think about Pentaho Business Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
824,067 professionals have used our research since 2012.

What was my experience with deployment of the solution?

Being such a young product, undergoing fast evolution and rapid company growth, things are not always bug-free. Every new release comes with its share of new bugs. Upgrades were not without concerns, although there was never a risk of losing data. Pentaho is simple to an extreme, and we hardly ever find some nasty dependency hurting our deliveries.

The main deployment problems were with LDAP and Apache integration. Quite some knowledge of web server architecture is needed for a team to have a smooth delivery experience.

What do I think about the stability of the solution?

We did encounter stability issues. Being a data-intensive application, Pentaho is quite sensitive to RAM limitations. When not enough RAM is allocated for it to work, it progressively slows to a crawl and then to a halt. Plenty of well-managed disk cache and server clustering alleviate this, though.

What do I think about the scalability of the solution?

Pentaho scales really very well.

Scaling Pentaho Data Integration is a breeze: just set up the machines, configure the slaves and the master, and that is it. One has only to enable the jobs and transformations to take advantage of PDI's clustering abilities, and that might be tricky, but it is easy nonetheless. The bottom line: data integration scalability is limited only by the developers' ingenuity in compartmentalizing data processing so that parallelization and remote processing become profitable for clustering.

The Pentaho BA Server also scales well, on a quite standard load-balancing scheme. Being a regular, well-behaved Java program, the Pentaho BA Server can be clustered on a Java application server, such as JBoss, or in an Apache/Tomcat multi-server load-balancing scheme.

This is not a job for the amateur Pentaho administrator, however. In fact, a Pentaho administrator working alone will probably have some difficulty achieving server scaling, and would be better off getting help from web server clustering professionals.

How are customer service and support?

Customer Service:

My company has been served only by the Brazilian Pentaho representative, who are knockout good guys and gals who deliver at any cost! They have even brought in Pentaho technicians from the USA to assess some of our issues. Only kudos to them. I cannot opine on US or European support, but I have no reason to think less of them.

Technical Support:

Technical support is a mixed issue with Pentaho. As previously stated, it is a young product from a young company. The technical support by means of instruction manuals, forums, wikis, and the like is quite good. However, the fast growth has left some gaps in the documentation.

For instance, I needed to find out how to enable a certain feature in report design. I was not able to find it in the official help guides, but the project leader's blog had a post about it. With the correct terminology, I was able to search the international forum, where the answer I needed was waiting. So, overall it is good, but it is still on the road to a complete, centralized, well-managed, gapless documentation body.

Which solution did I use previously and why did I switch?

In fact, we are still using the whole lot: MicroStrategy, Business Objects, and PowerCenter. We have not turned off all those implementations; it is just that Pentaho sprang up all around us like a weed. It is so easy to start using, and gives results with so little effort, that it is almost impossible to use something else. Most of the time, we offer other options only at the customer's request. Otherwise, left to ourselves, we are most likely to propose Pentaho.

How was the initial setup?

A hard answer: both. We got up to delivering results in almost no time. However, a sizeable lot of vicious little details kept resisting us: most issues were with stability, later traced to RAM limitations, and with user management, tied to LDAP integration. Part of those difficulties stemmed from bugs, too, so it was only a matter of waiting for Pentaho to fix them.

After that, the customer kicked in a lot of small changes and adaptations, true to the "since-we-are-at-it" scope-creep spirit (some rightful, some pure fancy), which had us and Pentaho scratching our heads together. In the end, we kind of helped them advance some updates to the Server, and delivered all that was asked.

What about the implementation team?

We started with our in-house team, and when things started to get too weird or complicated, the vendor team landed. After that first baptism by fire, we had a couple of hard-boiled ninjas who could firefight anything, and the vendor team was sent back home, with praises.

What was our ROI?

No ROI for us. The company I work for has no business approach to its BI strategy. All we as a company care about is making the customer happy, and that comes at the cost of not letting us turn down some unprofitable projects. So, Pentaho is a good tool, capable of delivering millions of dollars in new/recouped/saved revenue, but we are not positioned for that.

Thinking a bit more, the mere fact that we are able to deliver more, and hence take more orders, might be seen as a return on our investment. Yet I can't put an exact number on it, for even this kind of return is a little unclear.

What's my experience with pricing, setup cost, and licensing?

Pentaho is cheap, and becomes cheaper as your team masters it. However, it would be a total waste of good dollars to take my word for it. Try it for free, and look for professional support from Pentaho. You can also try to compare other tools with Pentaho, but keep in mind that, apart from SAS, all the other tools compete with only a part of Pentaho. So you must assemble a set of different products to fully compare with it.

Let us say you are going to build a standard dimensional data mart to serve up OLAP. Pentaho has a single price tag, which must be matched against MicroStrategy PLUS Informatica PowerCenter to make a correct comparison.

The Community Edition, the free version, is not short on features compared with the Enterprise Edition; it is just a bit uglier.

Matching a Pentaho license price against only one of those products will give wrong results.

Which other solutions did I evaluate?

Pentaho was a totally unknown product back in 2006-2007. We ran several feature comparison sheets. The biggest and most controversial were against Informatica's PowerCenter and MicroStrategy Intelligence Server. Both were matched by Pentaho to some degree, and there were few things Pentaho could not deliver then. But, and this is a rather strong but, most of the time Pentaho had to be tweaked to deliver those items. It was a match, all right, but not a finished product back then.

Since that time, the suite has evolved a lot and has become comparable head-to-head with the same products.

What other advice do I have?

Pentaho has a huge potential to deliver quite a lot of BI value. But in these days when BI is regarded as a simple multidimensional analytics tool, it seems a bit bloated and off the mark. That is because Pentaho does not aim to be flashy and eye-pleasing for the commonplace report monger (reporting is the farthest you can get from BI and still smell like it), and it requires a bit of strategy to allow for ROI. If you are looking for an immediate, prompt, beautiful remedy, Pentaho might not be your pick. But if you know what you want to accomplish, go on and try it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Consultant at a consumer goods company with 1,001-5,000 employees
Consultant
The Data Integration graphical drag and drop design is easy for new users to follow and can increase productivity.

Valuable Features

Pentaho Business Analytics platform overall is an outstanding product that offers great cost saving solutions for companies of all sizes. The Pentaho Business Analytics platform is built on top of several underlying open source projects driven by the community’s contributions. There are several features that I find invaluable and with each release, improvements are made.

The Pentaho User Console provides a portal that makes it easy for users to explore information interactively. Dashboard reporting, scheduling jobs, and managing data connections are some of the features the console makes easy. More advanced users can extend Pentaho Analyzer with custom visualizations or create reporting solutions with Ctools. The Marketplace empowers the community to develop new and innovative plugins, and simplifies the installation process for console users. The plugin framework gives contributors a way to extend the core services offered by the BI Server.

Pentaho Data Integration (Spoon) is another valuable tool for development. Spoon delivers powerful extraction, transformation, and load capabilities using a metadata-driven approach. The Data Integration graphical drag-and-drop design is easy for new users to follow and can increase productivity. More advanced users can extend Pentaho Data Integration by creating transformations and jobs dynamically.

Improvements to My Organization

My company was able to reduce software costs and hire additional staff given the cost savings that Pentaho provided. We are moving towards a Hadoop environment after the migration of our current ETL processes and Pentaho’s easy to use development tools and big data analytics capabilities were a factor in choosing Pentaho as a solution.

Room for Improvement

For those who run the open source Community Edition, it can at times be difficult to find updated references for support. Even for companies that use the Enterprise Edition, finding useful resources when a problem occurs can be difficult. Pentaho-driven best practices should be made available to both Community and Enterprise users to motivate and empower more users to use the solutions effectively.

Customer Service and Technical Support

Pentaho has stellar support services with extremely intelligent Pentaho and Hitachi consultants all over the world. Those support services and documentation are made available to Enterprise clients that have purchased the Enterprise Edition and have access to the support portal.

Initial Setup

Pentaho is easy to deploy, easy to use, and easy to maintain. It is a low-cost, fully supported business intelligence solution. I have used Pentaho in small and large organizations with great success.

Pricing, Setup Cost and Licensing

Enterprise licenses can be purchased for the full-service Pentaho Enterprise solution, which offers support through the portal and access to Pentaho/Hitachi consultants at additional cost.

Other Advice

Pentaho offers a Community Edition, which is an open source solution and can be downloaded for free. The Community Edition truly gives most companies everything they need, but your solution should be matched to your business needs. As a cost-cutting option, Enterprise license fees can be paid to the vendor to fund on-demand support.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2031747 - PeerSpot reviewer
Administrative Assistant at a university with 10,001+ employees
Real User
Makes it easy to develop data flows and has a wide range of database connections
Pros and Cons
  • "Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
  • "Pentaho Business Analytics' user interface is outdated."

What is our primary use case?

I primarily use Pentaho Business Analytics to create ETL processes, monitoring processes, and hierarchies.

What is most valuable?

Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud.

What needs improvement?

Pentaho Business Analytics' user interface is outdated. It's also limited in out-of-the-box features, which forces you to develop features yourself. There are also some problems with having to update metadata manually, which I would like to see Pentaho fix in the future. 

What do I think about the stability of the solution?

Pentaho Business Analytics is stable.

What do I think about the scalability of the solution?

Pentaho Business Analytics is scalable (though I have only tested this lightly).

How are customer service and support?

Since Pentaho Business Analytics is open-source, it has a very helpful community.

Which solution did I use previously and why did I switch?

I previously used Microsoft Integration Services and Microsoft Azure Data Factory.

How was the initial setup?

The initial setup was easy.

What other advice do I have?

Pentaho Business Analytics is a very good product for those starting to work with ETL processes. Usually, it will solve every problem you may have when creating those processes, and it's free, with a big support community. However, it may not be the best choice if your company has a very strong relationship with Microsoft or if you want to work in the cloud. I would give Pentaho Business Analytics a rating of eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
IT Manager at a transportation company with 51-200 employees
Vendor
In terms of functionality, they're not growing as fast as other companies. It's good for showing the need for BI.

What is most valuable?

Pentaho Data Integration (PDI).

Pentaho Analysis Services

Pentaho Reporting

How has it helped my organization?

We developed Sales and HR data marts, so managers in these departments can now get quick and flexible answers from them. I think it was an improvement because, in the past, each new analysis demanded IT resources and took time; that no longer happens. End users have much more freedom to discover the information they need.

What needs improvement?

I think Pentaho can greatly improve its user interface and its tool for dashboard maintenance.

For how long have I used the solution?

2 years

What was my experience with deployment of the solution?

I think the most complex deployments are the solutions with the most hardcore implementations. Pentaho could invest more in making developers' lives easier.

What do I think about the stability of the solution?

Yes, once in a while we have to face an unexpected problem that takes us time to overcome, and it causes problems with user satisfaction.

What do I think about the scalability of the solution?

No. I think the choice of Pentaho was right for my company. It fits our purpose very well, which was to demonstrate to the directors the power of BI for the business. But now there is a perception of the benefits, and the company is becoming bigger. Perhaps, in the near future, I will evaluate other options, even Pentaho EE.

How are customer service and technical support?

Customer Service:

My company has a procedure to evaluate all of our suppliers and we have questions about promptness, level of expertise, pre-sale and post-sale, effectiveness and efficiency.

Technical Support:

7 out of 10

Which solution did I use previously and why did I switch?

Yes. When I started with Pentaho in 2011, I had already worked at another company that had the Cognos BI Suite as its BI solution.

How was the initial setup?

The initial setup was straightforward. The setup was done by my team, which had no prior expertise with the Pentaho BI Suite. Within two days, I was presented with the first dashboards.

What about the implementation team?

I implemented my first Pentaho project with a vendor team, which helped us a lot, but its level of expertise could have been better. In the middle of the project, we had some delays related to questions that had to be clarified by Pentaho's professionals.

What was our ROI?

The ROI of this product is good because, in little time, you can have the first outputs. But it is not excellent compared with other BI solutions, like QlikView or Tableau.

What's my experience with pricing, setup cost, and licensing?

My original setup cost for the first project was $30,000 and the final cost was about $35,000.

Which other solutions did I evaluate?

Yes: Cognos, MicroStrategy, and Jaspersoft.

What other advice do I have?

For me, Pentaho is not growing in terms of functionality as fast as other companies in the same segment. The UI falls short, and for more complex solutions it is necessary to have good developers. However, being an Open Source solution, I think it allows IT departments to show, with low investment, the importance of BI to the company.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Final Thoughts – Part 6 of 6

Introduction

This is the last of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

Data Mining

In this sixth part, I originally intended to at least touch on the only part of the Pentaho BI Suite we have not talked about before: Data Mining. However, as I gathered my materials, I realized that Data Mining (along with its ilk: Machine Learning, Predictive Analytics, etc.) is too big a topic to fit in the space we have here. Even if I tried, the usefulness would be limited at best since, at the moment, while its results are used to solve real-world problems, the usage of Data Mining tools is still exclusively within the realm of data scientists.

In addition, as of late I use Python more for working with datasets that require a lot of munging, preparing, and cleaning. So, as an extension of that, I ended up using pandas, scikit-learn, and other Python-specific Data Mining libraries instead of Weka (which is basically what the Pentaho Data Mining tool is).
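To make the "munging, preparing, and cleaning" concrete, here is a minimal sketch of that kind of pandas workflow on a tiny made-up dataset (the column names and values are purely illustrative):

```python
# Illustrative pandas cleanup: coerce bad values to NaN/NaT, then drop
# incomplete rows -- the bread-and-butter of pre-mining data preparation.
import io
import pandas as pd

raw = io.StringIO(
    "customer,amount,date\n"
    "acme, 100.5 ,2013-01-02\n"
    "acme,,2013-01-03\n"
    "globex,200.0,not-a-date\n"
)

df = pd.read_csv(raw, skipinitialspace=True)
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")  # bad/missing -> NaN
df["date"] = pd.to_datetime(df["date"], errors="coerce")     # bad dates -> NaT
clean = df.dropna()                                          # keep only complete rows
```

Only the first row survives this particular cleanup; the second is missing an amount and the third has an unparseable date.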

So, for those who are new to Data Mining with Pentaho, here is a good place to start: an interview with Mark Hall, one of the authors of Weka, who now works for Pentaho: https://www.floss4science.com/machine-learning-with-weka-mark-hall

The link above also has some links to where to find more information.

For those who are experienced data scientists, you have probably already made up your mind about which tool suits your needs best; just as I went with Python libraries, you may or may not prefer a GUI approach like Weka's.

New Release: Pentaho 5.0 CE

For the rest of this review, we will go over the changes that come with the highly anticipated release of the 5.0 CE version. Overall, there are a lot of improvements in various parts of the suite, such as PDI and PRD, but we will focus on the BI Server itself, where the new release has its largest impact.

A New Repository System

In this new release, one of the biggest shocks for existing users is the switch from the file-based repository system to the new JCR-based one. JCR is a content repository standard; the Apache Foundation's database-backed implementation of it is code-named "Jackrabbit."

The Good:

  • Better metadata management
  • No longer need to refresh the repository manually after publishing solutions
  • A much better UI for dealing with the solutions
  • API to access the solutions via the repository which opens up a lot of opportunities for custom applications

The Bad:

  • It's not as familiar or convenient as the old file-based system
  • Need to use a synchronizer plugin to version-control the solutions

It remains to be seen if this switch will pay off for both the developers and the users in the long run. But it is stable and working for the most part, so I can't complain.
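One practical detail of the new repository's API, for anyone building the "custom applications" mentioned above: repository paths in the BI Server's REST URLs are commonly encoded with colons in place of slashes. A tiny helper sketches this (the encoding convention and endpoint shape should be verified against your server version's API documentation):

```python
# Encode a repository path for use in a Pentaho 5.x REST URL such as
# /pentaho/api/repo/files/{encoded_path} -- '/' separators become ':'.
def repo_path_segment(path):
    return path.replace("/", ":")

url = ("http://localhost:8080/pentaho/api/repo/files/"
       + repo_path_segment("/public/reports/sales.prpt").lstrip(":")
       if False else
       "http://localhost:8080/pentaho/api/repo/files/"
       + repo_path_segment("/public/reports/sales.prpt"))
```

For example, `/public/reports/sales.prpt` becomes `:public:reports:sales.prpt` in the URL.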

The Marketplace

One of the best features of the Pentaho BI Server is its plugin-friendly architecture. In version 5.0, this architecture has been given a new face, called the Marketplace:

This new interface serves two important functions:

  1. It allows admins to install and update plugins (almost all Pentaho CE tools are written as plugins) effortlessly
  2. It allows developers to publish their own plugins to the world

Several new plugins are already available with this new release, notably Pivot4J Analytics, an alternative to Saiku that shows a lot of promise of becoming a very useful tool for working with OLAP data. Another one that excites me is Sparkl, with which you can create custom plugins of your own.

The Administration Console

The new version also brings a new Administration Console where we manage Users and Roles.

No longer do we have to fire off another server just to do this basic administrative task. In addition, you can manage the mail server (no more wrangling configuration files).

The New Dashboard Editor

As we discussed in Part V of this review, the CDE is a very powerful dashboard editor. In version 5.0, the list of available Components is further lengthened by new ones, and the overall editor seems more responsive in this release.

Usage experience: The improvements in the Dashboard editor are helping me create dashboards for my clients that go beyond static ones. In fact, one I built (for demo purposes only) has a level of interactivity that rivals a web application or an electronic form.

NOTE: Nikon and Olympus are trademarks of Nikon Corporation and Olympus Group respectively.

Parting Thoughts

Even though the final product of a Data Warehouse or a BI system is a set of answers and forecasts, or dashboards and reports, it is easy to forget that without the tools that help us consolidate, clean up, aggregate, and analyze the data, we would never get to the results we are aiming for.

As you can probably tell, I serve my clients with whatever tools make sense given their situation, but time and again, the Pentaho BI Suite (the CE version especially) has risen to fulfill the need. I have created Data Warehouses from scratch using Pentaho BI CE: pulling in data from various sources using PDI, creating OLAP cubes with PSW that end up as the data source for various dashboards (financial, inventory, marketing, etc.), and publishing reports created using PRD.

Of course my familiarity with the tool helps, but I am also familiar with a lot of other BI tools beside Pentaho. And sometimes I do have to use other tools in preference to Pentaho because they suit the needs better.

But as I always mention to my clients, unless you have a good relationship with the vendor that lets you avoid paying hundreds of thousands per year just to be able to use tools like IBM Cognos, Oracle BI, or SAP BusinessObjects, there is a good chance that Pentaho (either the EE or the CE version) can do the same for less, even at zero license cost in the case of CE.

Given the increased awareness of the value of data analysis in today's companies, these BI tools will continue to become more sophisticated and powerful. It is up to us business owners, consultants, and data analysts everywhere to develop the skills to harness the tools and crank out useful, accurate, and, yes, easy-on-the-eyes decision-support systems. And I suspect we will always see Pentaho as one of the viable options: a testament to the quality of the team working on it. As for the CE team in particular, it would be remiss not to acknowledge their efforts to improve and maintain a tool this complex under the Open Source paradigm.

So here we are, at the end of the sixth part. Writing this six-part review has been a blast, and I would like to give a shout-out to IT Central Station, which has graciously hosted this review for all to benefit from. Thanks for reading.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: PDI – Part 1 of 6

Introduction

The Pentaho BI Suite is one of the more comprehensive BI suites that is also available as an Open Source project (the Community Edition). Interestingly, the absence of license fees is far from the only factor in choosing this particular tool to build your Data Warehouses (OLAP systems).

This is the first of a six-part review of the BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this first part, we'll discuss Pentaho Data Integration (from here on referred to as PDI), the ETL tool that comes with the suite. An ETL tool is the means by which you input data from various sources – typically out of some transactional systems – then transform the format and flow the data into another data model that is OLAP-friendly. It therefore acts as the gateway to using the other parts of the BI suite.

In the case of PDI, it has two components:

  • Spoon (the GUI), where you string together a set of Steps within a Transformation and optionally string multiple Transformations within a single Job. This is where you would spend the bulk of your time developing ETL scripts.

  • The accompanying set of command-line scripts that we can configure to be launched from a scheduler like cron or Windows Task Scheduler: notably pan, the single-Transformation runner; kitchen, the Job runner; and carte, the slave-server runner. These tools give us the flexibility to create our own multi-tiered notification network, should we need to.
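As a sketch of how those command-line runners get wired into a scheduler, the snippet below builds a kitchen invocation for a cron wrapper script. The flag spellings (`-file`, `-level`, `-param:NAME=value`) follow PDI's documented conventions, but verify them against your version with `kitchen.sh --help`; the job path and parameter names are hypothetical.

```python
# Build the argument list for launching a PDI Job via kitchen from a
# scheduler wrapper (e.g. subprocess.run(cmd) inside a cron-driven script).
def build_kitchen_cmd(job_path, params=None, log_level="Basic"):
    cmd = ["kitchen.sh", f"-file={job_path}", f"-level={log_level}"]
    for name, value in (params or {}).items():
        cmd.append(f"-param:{name}={value}")  # named parameters passed to the Job
    return cmd

cmd = build_kitchen_cmd("/etl/nightly_load.kjb", {"TARGET_DB": "dw_prod"})
```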

Is it Feature-Complete?

ETL tools are interesting because anyone who has implemented a BI system has a standard list of major features expected to be available. This standard list does not change from one tool brand to another. Let's see how PDI fares:

  1. Serialized vs. parallel ETL processing: PDI handles parallel (asynchronous) steps using Transformations, which can be strung together in a Job when we need a serialized sequence.

  2. Parameter handling: PDI has a properties file that allows us to parameterize things specific to different platforms (dev/test/prod), such as database names, credentials, and external servers. It also features parameters that can be created during the ETL run from the data in the stream, then passed from one Transformation to another within a Job.

  3. Script management: Just like any other IT documents (or, as some call them, artifacts), ETL scripts need to be managed, version-controlled, and documented. PDI scores high on this front, not because of specific features but due to design decisions that favor simplicity: the scripts are plain XML documents. That makes them very easy to manage, version-control, and, if necessary, batch-edit. NOTE: For those who want enterprise-level script management and version control built into the tool, Pentaho offers it as part of their Enterprise offerings. But for the rest of us who already have a document management process – because we also develop software using other tools – it is not as crucial.

  4. Clustering: PDI supports round-robin-style load balancing across a set of slave servers. For those using Hadoop clusters, Pentaho recently added support for running Jobs on them.
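The parameter handling in item 2 boils down to `${VARIABLE}` placeholders being resolved against environment-specific properties at run time. The following standalone sketch mimics that substitution outside PDI; the property names (`DB_HOST`, `DB_NAME`) are invented for illustration, whereas in PDI they would live in a properties file per environment.

```javascript
// Sketch of PDI-style ${VARIABLE} substitution, modeled as a plain function.
function substituteVariables(template, properties) {
  return template.replace(/\$\{([A-Z0-9_]+)\}/g, (match, name) =>
    name in properties ? properties[name] : match // leave unknown variables untouched
  );
}

// The same connection string resolves differently per environment:
const devProps = { DB_HOST: "localhost", DB_NAME: "dw_dev" };
const url = substituteVariables("jdbc:postgresql://${DB_HOST}/${DB_NAME}", devProps);
// url → "jdbc:postgresql://localhost/dw_dev"
```

Swapping in a production properties map changes the target database without touching the ETL script itself, which is the whole point of the dev/test/prod separation.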

Is it Easy to Use?

With its drag-and-drop graphical UI, ease of use is a given. It is quite easy to string together steps to accomplish the ETL process. The trick is knowing which steps to use, and when to use them.

The documentation on how to use each step could stand improvement, although it has fortunately, if slowly, started to catch up over the years – and should you have the budget, you can always pay for the support that comes with the Enterprise Edition. But overall, it is a matter of using the steps enough to become familiar with the use cases.

This is why competent BI consultants are worth their weight in gold: they have been in the trenches and have accumulated ways to deal with the quirks that are bound to be encountered in a software system this complex (not just Pentaho; this applies to any BI suite out there).



NOTE: I feel obligated to point out one (very) annoying limitation: you cannot hit the Enter key to edit the selected step. Think about how many times you would use that in any ETL tool.

Aside from that, in the few years that I've used various versions of the GUI, I've never encountered severe data loss due to stability problems.

Another measure of ease of use I evaluate a tool by is how easy it is to debug its ETL scripts. With PDI, the logical structure of the scripts can be easily followed, so it is quite debug-friendly.

Is it Extensible?

It may seem a strange question at first, but let us think about it. One of the purposes of using an ETL tool is to deal with a variety of data sources. No matter how comprehensive the included data format readers/writers are, sooner or later you will have to talk to a proprietary system that is not widely known. We had to do this once for one of our clients: we ended up writing a custom PDI step that communicates with the XML-RPC backend of an ERP system.

The good news is that with PDI, anyone with some Java SDK development experience can readily implement the published interfaces and create their own custom Transformation steps. In this regard, I am quite impressed with the modular design that allows users to extend the functionality, and consequently the usefulness, of the tool.

The scripting ability built into the Steps is also one of the ways to handle proprietary – or extremely complex – data. PDI allows us to write JavaScript (or Java, should you want faster performance) programs to manipulate the data at the row level as well as pre- and post-run, which comes in very handy for handling variable initialization or sending notifications that contain statistics about all of the rows.
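To illustrate what that row-level scripting looks like, here is a standalone sketch of the kind of logic one would put in PDI's JavaScript step. Inside Spoon the input fields are exposed directly as variables; here a row is modeled as a plain object, and the field names (`first_name`, `order_total`, etc.) are made up for the example.

```javascript
// Row-level logic: derive new fields from the fields in the stream.
function transformRow(row) {
  return {
    ...row,
    full_name: `${row.first_name} ${row.last_name}`.trim(),
    is_high_value: row.order_total > 1000, // a derived flag field
  };
}

// The kind of stream-level statistics a post-run hook might e-mail out.
function summarize(rows) {
  const total = rows.reduce((sum, r) => sum + r.order_total, 0);
  return { rowCount: rows.length, orderTotalSum: total };
}
```

In an actual Transformation, `transformRow` corresponds to the per-row script body, while `summarize` corresponds to logic run after the last row, feeding the notification step that follows.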

Summary

PDI is one of the jewels in the Pentaho BI Suite. Aside from some minor inconveniences in the GUI tool, the simplicity, extensibility, and stability of the whole package make PDI a good tool for building a network of ETLs marshaling data from one end of your systems to another. In some cases, it even serves well as a development tool for the batch-processing side of an OLTP system.

Next, in part two, we will discuss the Pentaho BI Server.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user76890 - PeerSpot reviewer
Engineer at a marketing services firm with 51-200 employees
It does a lot of what we need but off-the-shelf solutions often can’t do exactly what you need

Being in the business of online-to-offline ad attribution and advertising analytics, we need tools to help us analyze billions of records to discover interesting insights for our clients. One of the tools we use is Pentaho, an open source business intelligence platform that allows us to manage, transform, and explore our data. It offers some nice GUI tools, can be quickly set up on top of existing data, and has the advantage of being on our home team.

But for all the benefits of Pentaho, making it work for us has required tweaking and in some cases replacing Pentaho with other solutions. Don’t take this the wrong way: we like Pentaho, and it does a lot of what we need. But at the edges, any off-the-shelf solution often can’t do exactly what you need.

Perhaps the biggest problem we faced was getting queries against our cubes to run quickly. Because Pentaho is built around Mondrian, and Mondrian is a ROLAP engine, every query against our cubes requires building dozens of queries that join tables with billions of rows. In some cases this meant that Mondrian queries could take hours to run. Our fix has been to make extensive use of summary tables, i.e., summarizing counts of raw data at the levels we know our cubes will need to execute queries. This has taken queries that ran in hours down to seconds, by doing the summarization for all queries once, in advance. At worst, our Mondrian queries can take a couple of minutes to complete if we ask for really complicated things.
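The summary-table fix described above can be sketched in a few lines: aggregate the raw fact rows once, at the grain the cube queries actually need, then serve queries from the small summary instead of the billion-row detail. The field names (`day`, `campaign`, `impressions`) are invented stand-ins for whatever the real fact table holds.

```javascript
// Build a summary at a pre-chosen grain (here: day x campaign), once, in advance.
function buildSummary(rawRows) {
  const summary = new Map();
  for (const row of rawRows) {
    const key = `${row.day}|${row.campaign}`;
    const agg = summary.get(key) || { day: row.day, campaign: row.campaign, impressions: 0 };
    agg.impressions += row.impressions;
    summary.set(key, agg);
  }
  return [...summary.values()];
}

// A "query" against the summary touches a handful of rows, not billions.
function impressionsFor(summary, day, campaign) {
  const hit = summary.find((r) => r.day === day && r.campaign === campaign);
  return hit ? hit.impressions : 0;
}
```

The trade-off is the one the review implies: the summarization cost is paid once per ETL run rather than on every cube query, and only queries at (or above) the chosen grain benefit.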

Early on, we tried to extend our internal use of Pentaho to our clients by using Action Sequences, also known as xactions after the Action Sequence file extension. Our primary use of xactions was to create simple interfaces for getting the results of Mondrian queries that could then be displayed to clients in our Rails web application. But in addition to sometimes slow Mondrian queries (in the world of client-facing solutions, even 15 seconds is extremely slow), xactions introduce considerable latency as they start up and execute, adding as much as 5 seconds on top of the time it takes to execute the query.

Ultimately we couldn’t make xactions fast enough to deliver data to the client interface, so we instead took the approach we use today: we first discover what is useful in Pentaho internally, then build solutions that query directly against our RDBMS to quickly deliver results to clients. Although, to be fair to Mondrian, some of these solutions require us to summarize data in advance of user requests to get the speed we want, because the data is just that big and the queries are just that complex.

We’ve also made extensive use of Pentaho Data Integration, also known as Kettle. One of the nice features of Kettle is Spoon, a GUI editor for writing Kettle jobs and transforms. Spoon made it easy for us to set up ETL processes in Kettle and take advantage of Kettle’s ability to easily spread load across processing resources. The tradeoff, as we soon learned, was that Spoon makes the XML descriptions of Kettle jobs and transforms difficult to work on concurrently, a major problem for us since we use distributed version control. Additionally, Kettle files don’t have a really good, general way of reusing code short of writing custom Kettle steps in Java, which makes maintaining our large collection of Kettle jobs and transforms difficult. On the whole, Kettle was great for getting things up and running quickly, but over time we have found its rapid development advantages are outweighed by the advantages of using a general programming language for our ETL. The result is that we are slowly transitioning to writing ETL in Ruby, but only on an as-needed basis, since our existing Kettle code works well.

As we move forward, we may find additional places where Pentaho does not fully meet our needs and we must find other solutions to our unique problems. But on the whole, Pentaho has proven to be a great starting platform for getting our analytics up and running and has allowed us to iteratively build out our technologies without needing to develop custom solutions from scratch for everything we do. And, I expect, Pentaho will long have a place at our company as an internal tool for initial development of services we will offer to our clients.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user108285 - works at a financial services firm

Have you looked into using Talend? It has a great user interface, very similar to Kettle, and the paid version has version control that works very well, plus the ability to run "joblets," which are basically reusable pieces of code. The free version also has version control, although it's pretty clumsy; it lacks joblets, and it is difficult to get working with GitHub.

Owner with 51-200 employees
Pentaho BI Suite Review: Dashboards – Part 5 of 6

Introduction

This is the fifth of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this fifth part, we'll be discussing how to create useful and meaningful dashboards using the tools available to us in the Pentaho BI Suite. As a complete data warehouse building tool, Pentaho offers the most important aspect of delivering enterprise-class dashboards, namely Access Control Lists (ACLs). A dashboard-creation tool without the ability to limit dashboard access to a particular group or role within the company is missing a crucial feature, and is something we cannot recommend to our clients.

On the Enterprise Edition (EE) version 5.0, dashboard creation has a user-friendly UI that is as simple as drag-and-drop. It looks like this:

Figure 1. The EE version of the Dashboard Designer (CDE in the CE version)

Here the user is guided to choose a type of grid layout that is already prepared by Pentaho. Of course, options to customize the look and change individual components are available under the hood, but it is clear that this UI is aimed at end-users looking for quick results. More experienced dashboard designers would feel severely restricted by it.

In the rest of this review, we will go over dashboard creation using the Community Edition (CE) version 4.5. Here we will see a more flexible UI, which unfortunately also demands familiarity with JavaScript and chart library customizations to create anything more than just basic dashboards.

BI Server Revisited

In the Pentaho BI Suite, dashboards are set up in these two places:

  1. Using special ETLs, we prepare the data to be displayed on the dashboards according to the frequency of update required by the user. For example, for daily sales figures, the ETL would be scheduled to run every night. Why do we do this? Because the benefits are two-fold: it increases the performance of the dashboards, because they work with pre-calculated data, and it allows us to apply dashboard-level business rules.
  2. The BI Server is where we design, edit, and assign access permissions to dashboards. Deep URLs can be obtained for a particular dashboard to be displayed on a separate website, but some care has to be taken to go through the Pentaho user authorization; depending on the web server setup, it could be as simple as passing authorization tokens, or as complex as registering and configuring a custom module.

Next, we will discuss each of these steps in creating a dashboard. As usual, the screenshots below are sanitized and no real data is represented. Data from a fictitious microbrewery is used to illustrate and relate the concepts.

Ready, Set, Dash!

The first step is to initiate the creation of a dashboard. This is accomplished by selecting File > New > CDE Dashboard. A little background note: CDE (the Community Dashboard Editor) is part of the Community Tools (Ctools) created by the team who maintains and improves Pentaho CE.

After initiating the creation of a new dashboard, this is what we will see:

Figure 2. The Layout screen where we perform the layout step

The first thing to do is to save the newly created (empty) dashboard somewhere within the Pentaho solution folder (just as we did when saving Analytic or Ad-Hoc Reports). To save the dashboard currently being worked on, use the familiar New | Save | Save As | Reload | Settings menu. We will not go into detail on each of these self-explanatory menus.

Now look at the top-right section. There are three buttons that toggle the screen mode; this particular screenshot shows the Layout mode.

In this mode, we take care of the layout of the dashboard. On the left panel, we see the Layout Structure. It is basically a grid made of Row entries, which contain Column(s), which may themselves contain another set of Row(s). The big difference between a Row and a Column is that the Column actually contains the Components, such as charts, tables, and many other types. We give a name to a Column to tie it to its content. Because of this, Column names must be unique within a dashboard.

The panel on the right is a list of properties whose values we can set, mostly HTML and CSS attributes that tell the browser how to render the layout. It is recommended to create a company-wide CSS file for the company logo, colors, and other visual markings on the dashboard.

So basically, all we are doing in this Layout mode is determining where certain content should appear within the dashboard, and we do that by naming each of the places where we want that content to be displayed.

NOTE: Even though the contents are placed within a Column, it is a good practice to name the Rows clearly to indicate the sections of the dashboard, so we can go back later and be able to locate the dashboard elements quickly.

Lining-Up Components

After we have defined the layout of the dashboard in the Layout mode, we move on to the next step by clicking the Components button on the top horizontal menu, as shown in the screenshot below:

Figure 3. The Components mode where we define the dashboard components

Usage experience: Although more complex, the CDE is well implemented and quite robust. While building dashboards for our clients, we have never seen it produce inconsistent results.

In this Components mode, there are three sections (going from left to right). The left-most panel contains the selection of components (data presentation units). Ranging from simple tables to complex charting options (based on the Protovis data visualization library), we can choose how to present the data on the dashboard.

The next section to the right contains the components already chosen for the dashboard we are building. As we select each of these components, its properties are displayed in the section next to it. The Properties section is where we fill in information such as:

  • Where the data is coming from
  • Where the Component will be displayed in the dashboard. This is done by referring to the previously defined Column from the Layout screen
  • Customization such as table column width, the colors of a pie chart, custom scripting that should be run before or after the component is drawn

This clean separation between the Layout and the Components makes it easy for us to create dashboards that are easy to maintain and that accommodate different versions of the components.

Where The Data Is Sourced

The last mode is the Data Source mode where we define where the dashboard Components will get their data:

Figure 4. The Data Sources mode where we define where the data is coming from

As seen in the left-most panel, the list of data source types is quite comprehensive. We typically use either SQL or MDX queries to fetch the data set in a format suitable for the Components we defined earlier.

For instance, a data set to be presented in a five-column table will look different than one to be presented in a pie chart.

This screen follows the others in terms of sections: we have (from left to right) the Data Source type list, the currently defined data sources, and the Properties section on the right.

Usage experience: There may be some confusion for those who are not familiar with the way Pentaho defines a data source. There are two “data source” concepts represented here: one is the Data Source defined in this step for the dashboard; the other is the “data source” (or “data model”) that the Data Source connects to and runs the query against.

After we define the Data Sources and name them, we go back to the Components mode and specify these names as the value of the Data source property of the defined components.

Voila! A Dashboard

Once we have finished defining the Data Sources, Components, and Layout, we end up with a dashboard. Ours looks like this:

Figure 5. The resulting dashboard

The title of the dashboard and the date range are contained within one Row. So are the first table and the pie chart. This demonstrates the flexibility of the grid system used in the Layout mode.

The company colors and fonts used in this dashboard are controlled via the custom CSS specified as a Resource in the Layout mode.

All that is left to do at this point is to give the dashboard some role-based permissions so access to it will be limited to those who are in the specified role.

TIP: Never assign permissions at the individual user level. Why? Think about what has to happen when that person changes positions and is replaced by someone else.

Extreme Customization

Anything from table column widths to the rotation of the x-axis labels can be customized via the properties. Furthermore, for those who are well-versed in JavaScript, there are tons of things we can do to make the dashboard more than just a static display.

These customizations can be useful for more than just making things sparkle and easier to read. For example, with some scripting, we can apply dashboard-level business rules to the dashboard.

Usage experience: Let's say we want some displayed numbers to turn red when they fall below a certain threshold. We do this using the post-execution property of the component, and the script looks like this:

Figure 6. A sample post-execution script
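The screenshot is not reproduced here, but a post-execution snippet in the same spirit might look like the sketch below. The threshold value, the element id (`salesTotal`), and the split into a helper function are all invented for illustration; in CDE, the body of `postExecution` would go into the component's post-execution property.

```javascript
// Dashboard-level business rule: values below the threshold are shown in red.
function colorForValue(value, threshold) {
  return value < threshold ? "red" : "inherit";
}

// Run after the component renders (hypothetical element id "salesTotal").
function postExecution() {
  const threshold = 10000;
  const el = document.getElementById("salesTotal");
  if (el) {
    const value = parseFloat(el.textContent.replace(/,/g, ""));
    el.style.color = colorForValue(value, threshold);
  }
}
```

Keeping the rule itself (`colorForValue`) separate from the DOM manipulation makes the business logic easy to reuse across components and to test outside the dashboard.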

Summary

The CDE is a good tool for building dashboards; coupled with the ACL feature built into the Pentaho BI Server, it serves as a good platform for planning and delivering your dashboard solutions. Are there other tools out there that can do the same thing with the same degree of flexibility? Sure. But for a cost of only the time spent learning (which can be shortened significantly by hiring a competent BI consultant), the free license is quite hard to beat.

To squeeze out its potential, CDE requires a lot of familiarity with programming concepts such as formatting masks, JavaScript scripting, and pre- and post-events, and much of the time, the answer to a how-to question can only be found in random conversations between Pentaho CE developers. So please be duly warned.

But if we can get past those hurdles, it can produce some of the most useful and clear dashboards. Notice we didn't say “pretty” (as in “gimmicky”), because that is not what makes a dashboard really useful for CEOs and business owners in day-to-day decision-making.

Next, in the final part (part six), we will wrap up the review with a peek into the Weka data mining facility in Pentaho, and some closing thoughts.

Disclosure: I am a real user, and this review is based on my own experience and opinions.