We use it to manage the different data flows from various data sources (both upstream and downstream) using visual representations. The use case is to track user activities from multiple sources within the enterprise.
Owner with 51-200 employees
Pentaho BI Suite Review: Dashboards – Part 5 of 6
Introduction
This is the fifth of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.
In this fifth part, we'll be discussing how to create useful and meaningful dashboards using the tools available to us in the Pentaho BI Suite. As a complete Data Warehouse building tool, Pentaho offers the most important aspect for delivering enterprise-class dashboards, namely Access Control List (ACL). A dashboard-creation tool without this ability to limit dashboards access to a particular group or role within the company is missing a crucial feature, something that we cannot recommend to our clients.
On the Enterprise Edition (EE) version 5.0, dashboard creation has a user-friendly UI that is as simple as drag-and-drop. It looks like this:
Figure 1. The EE version of the Dashboard Designer (CDE in the CE version)
Here the user is guided to choose a type of grid layout that is already prepared by Pentaho. Of course the option to customize the looks and change individual components are available under the hood, but it is clear that this UI is aimed towards end-users looking for quick results. More experienced dashboard designers would feel severely restricted by this.
In the rest of this review, we will go over dashboard creation using the Community Edition (CE) version 4.5. Here we are going to see a more flexible UI which unfortunately also demands familiarity with javacript and chart library customizations to create something more than just basic dashboards.
BI Server Revisited
In the Pentaho BI Suite, dashboards are setup in these two places:
- Using special ETLs we prepare the data to be displayed on the dashboards according to the frequency of update that is required by the user. For example, for daily sales figures, the ETL would be scheduled to run every night. Why do we do this? Because the benefits are two-fold: It increase the performance of the dashboards because it is working with pre-calculated data, and it allows us to apply dashboard-level business rules.
- The BI Server is where we design, edit, assign access permissions to dashboards. Deep URLs could be obtained for a particular dashboard to be displayed on a separate website, but some care has to be taken to go through the Pentaho user authorization; depending on the web server setup, it could be as simple as passing authorization tokens, or as complex as registering and configuring a custom module.
Next, we will discuss each of these steps in creating a dashboard. As usual, the screenshots below are sanitized and there are no real data being represented. Data from a fictitious microbrewery is used to illustrate and relate the concepts.
Ready, Set, Dash!
The first step is to initiate the creation of a dashboard. This is accomplished by selecting File > New > CDE Dashboard. A little background note, CDE (which stands for Ctools Dashboard Editor) is part of the Community Tools (or Ctools) created by the team who maintains and improve Pentaho CE.
After initiating the creation of a new dashboard, this is what we will see:
Figure 2. The Layout screen where we perform the layout step
The first thing to do is to save the newly created (empty) dashboard into somewhere within the Pentaho solution folder (just like what we did when we save an Analytic or Ad-Hoc Reports). To save the currently worked on dashboard, use the familiar New | Save | Save As | Reload | Settings menu. We will not go into details on each of this self-explanatory menus.
Now look at the top-right section. There are three buttons that will toggle the screen mode, this particular one is in the Layout mode.
In this mode, we take care of the layout of the dashboard. On the left panel, we see the Layout Structure. It is basically a grid that is made out of Row entries, which contains Column(s) which itself may contain another set of Row(s). The big difference between Row and Column is that the Column actually contains the Components such as charts, tables, and many other types. We give a name to a Column to tie it to a content. Because of this, the names of the Columns must be unique within a dashboard.
The panel to the right, is a list of properties that we can set the values of, mostly HTML and CSS attributes that tells the browser how to render the layout. It is recommended to create a company-wide CSS to show the company logo, colors, and other visual markings on the dashboard.
So basically all we are doing in this Layout mode is determining where certain contents should appear within the dashboard, and we do that by naming each of the place where we want those contents to be displayed.
NOTE: Even though the contents are placed within a Column, it is a good practice to name the Rows clearly to indicate the sections of the dashboard, so we can go back later and be able to locate the dashboard elements quickly.
Lining-Up Components
After we defined the layout of the dashboard using the Layout mode, we move on to the next step by clicking on the Components button on the top horizontal menu as shown in the screenshot below:
Figure 3. The Components mode where we define the dashboard components
Usage experience: Although more complex, the CDE is well implemented and quite robust. During our usage to build dashboards for our clients, we have never seen it produce inconsistent results.
In this Components mode, there are three sections (going from left to right). The left-most panel contains the selection of components (data presentation unit). Ranging from simple table, to the complex charting options (based on Protovis data visualization library), we can choose how to present the data on the dashboard.
The next section to the right contains the current components already chosen for the dashboard we are building. As we select each of these components, its properties are displayed in the section next to it. The Properties section is where we fill-in the information such as:
- Where the data is coming from
- Where the Component will be displayed in the dashboard. This is done by referring to the previously defined Column from the Layout screen
- Customization such as table column width, the colors of a pie chart, custom scripting that should be run before or after the component is drawn
This clean separation between the Layout and the Components makes it easy for us to create dashboards that are easy to maintain and accommodates different versions of the components.
Where The Data Is Sourced
The last mode is the Data Source mode where we define where the dashboard Components will get their data:
Figure 4. The Data Sources mode where we define where the data is coming from
As seen in the left-most panel, the data source type is quite comprehensive. We typically use either SQL or MDX queries to fetch the data set in the format that is suitable to be presented in the Components we defined earlier.
For instance, a data set to be presented in a five-columns table will look different than one that will be presented in a Pie Chart.
This screen follows the other in terms of sections, we have (from left to right) the Data Source type list, the currently defined data sources, and the Properties section on the right.
Usage experience: There may be some confusion for those who are not familiar with the way Pentaho define a data source. There are two “data source” concepts represented here. One is the Data Source defined in this step for the dashboard, and the other, the “data source” or “data model” where the Data Source connects to and run the query against.
After we define the Data Sources and name them, we go back to the Components mode and specify these names as the value of the Data source property of the defined components.
Voila! A Dashboard
By the time we finished defining the Data Sources, Components, and Layout, we end up with a dashboard. Ours looks like this:
Figure 5. The resulting dashboard
The Title of the dashboard and the date range is contained within one Row. So are the first table and the pie chart. This demonstrates the flexibility of the grid system used in the Layout mode.
The company color and fonts used in this dashboard is controlled via the custom CSS specified as Resource in the Layout mode.
All that is left to do at this point is to give the dashboard some role-based permissions so access to it will be limited to those who are in the specified role.
TIP: Never assign permission at the individual user level. Why? Think about what has to happen when the person change position and is replaced by someone else.
Extreme Customization
Anything from table column width to the rotation-degrees of the x-axis labels can be customized via the properties. Furthermore, for those who are well-versed in Javascript language, there are tons of things that we can do to make the dashboard more than just a static display.
These customizations can actually be useful other than just making things sparkle and easier to read. For example, by using some scripting, we can apply some dashboard-level business rules to the dashboard.
Usage experience:Let's say we wanted to trigger some numbers displayed to be in the red when it fell below a certain threshold, we do this using the post-execution property of the component and the script looks like this:
Figure 6. A sample post-execution script
Summary
The CDE is a good tool for building dashboards, coupled with the ACL feature built into the Pentaho BI Server, they serve as a good platform for planning and delivering your dashboard solutions. Are there other tools out there that can do the same thing with the same degree of flexibility? Sure. But for the cost of only time spent on learning (which can be shortened significantly by hiring a competent BI consultant), it is quite hard to beat free licensing cost.
To squeeze out its potentials, CDE requires a lot of familiarity with programming concepts such as formatting masks, javascript scripting, pre- and post- events, and most of the times, the answer to how-to questions can only be found in random conversations between Pentaho CE developers. So please be duly warned.
But if we can get past those hurdles, it can bring about some of the most useful and clear dashboards. Notice we didn't mention “pretty” (as in “gimicky”) because that is not what makes a dashboard really useful for CEOs and Business Owners in day-to-day decision-making.
Next in the final part (part-six), we will wrap up the review with a peek into the Weka Data Mining facility in Pentaho, and some closing thoughts.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Owner with 51-200 employees
Pentaho BI Suite Review: Pentaho Analytics – Part 4 of 6
Introduction
This is the fourth of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.
In this fourth part, we'll be discussing the Pentaho Analytics tools and facilities, which provides the ability to view, “slice and dice” data from multiple dimensions. This particular feature is the most associated with the word “Business Intelligence” due to its usefulness to aid cross-data-domain decision-making processes. Any decent BI suites have at least one facility with which users can perform data analysis with.
One important note, specifically for Pentaho, the Analytics toolset is where the real advantage of the Enterprise Edition (EE) over Community Edition (CE) starts to show-through – other than the much more polished UI.
In the Pentaho BI Suite, we have these analytics tools:
- Saiku Analytics (In EE this is called “Analysis Report”) – A tool built into Pentaho User Console (PUC) that utilizes the available analysis models. Do not confuse this with the Saiku Reporting.
- Pentaho Model Data Source – In part three of the review, we discussed this facility to create data models for Ad-hoc reporting. The second usage of this facility is to create an OLAP “cube” for use with the Saiku Analytics tool. Once this is setup by the data personnel, data owners can use it to generate analytic reports.
- Schema Workbench – A separate program that allows for handcrafting OLAP cube schemas. Proviciency with MDX query language is not necessary but can come in handy in certain situations.
As usual, we'll discuss each of these components individually. The screenshots below are sanitized and there are no real data being represented. A fictitious company called “DonutWorld” is used to illustrate and relate the concepts.
Saiku Analytics (Analysis Report in EE)
One of the benefit of having a Data Warehouse is to be able to model existing data in a structure that is conducive to analysis. If we try to feed tools such as this with a heavily normalized transaction database, we are inviting two problems:
1. We will be forced to do complex joins which will manifest itself in performance hit and difficulty when business rules change
2. We lose the ability to apply non-transactional business rules to the data which is closer to the rule maintainers (typically those who work closely with the business decision-makers)
Therefore to use this tool effectively we need to be thinking in terms of what questions need to be answered, then work our way backwards employing data personnels to create the suitable model for the said questions. Coincidentally, this process of modeling data suitable for reporting is a big part of building a Data Warehouse.
Learning experience: Those who are familiar with MS Excel (or Libre Office) Pivot Tables will be at home with this tool. Basically, as the model allows, we can design the view or report by assigning dimensions into columns and rows, and then assigning measures to define what kind of numbers we are expecting to see. We will discuss below what 'dimension' and 'measure' mean in this context, but for an in-depth treatment, we recommend consulting your data personnels.
Usage experience: The EE version of this tool has a clearer interface as far as where to drop dimensions and measures, but the CE version is usable once we are accustomed to how it works. Another point for the EE version (version 5.0) is the ability to generate total sums in both row and column direction and a much more usable Excel export.
Figure 1. The EE version of the Analysis Report (Saiku Analytics in CE)
Pentaho Model Data Source
The Data Source facility is accessible from within the PUC. As described in Part 3 of this review, once you have logged in, look for a section on the screen that allows you to create or manage existing data sources.
Here we are focusing on using this feature to setup “cubes” instead of “models.” This is something that your data personnels should be familiar with, guided by the business questions that needs answering.
Unlike the “model”, the “cubes” are not flat, rather it consists of multiple dimensions that determines how the measures are aggregated. Out of these “cubes” non-technical users can create reports by designing it just like they would Pivot Tables. The most useful aspect of this tool is to abstract a construction of an OLAP cube schema to its most core concepts. For example, given a fact table, this tool will try to generate an OLAP cube schema. And in most part, it's doing a good job in the sense that the cube is immediately usable to generate Analysis Reports.
This tool also hide the distinction between Hierarchies and Levels of dimensions. For the most part, you can do a lot with just one Level anyway, so this is easier to grasp for beginners in OLAP schema design.
Learning experience: The data personnel must be 1) familiar with the BI table structures or at the very least can pinpoint which of the tables are facts and dimensions; 2) comfortable with designing OLAP dimensions and measures. Data owners must be familiar with the structure and usage of the data. The combined efforts by these two roles are the building blocks of a workflow/process.
Usage experience: Utilizing the workflow/process defined above, an organization will generate a collection of OLAP cubes that can be used to analyze the business data with increasing accuracy and usefulness. The most important consideration from the business standpoint, is that all of this will take some time to materialize. The incorrect attitude here would be to expect instant results, which will not transpire unless the dataset is overly simplistic.
Figure 2. Creating a model out of a SQL query
NOTE: Again, this is where the maturity level of the Data Warehouse is tested. For example, a DW with sufficient maturity will notify the data personnel of any data model changes which will trigger the updating of the OLAP cube, which may or may not have an effect on the created reports and dashboards.
If the DW is designed correctly, there should be quite a few fact tables that can readily be used in the OLAP cube.
Schema Workbench
The Schema Workbench is for those who needs to create a custom OLAP schema that cannot be generated via the Data Source facility in the PUC. Usually this involves complicated measure definitions, multi-Hierarchy or multi-Level dimensions, or to evaluate and optimize MDX queries.
NOTE: In the 5.0 version of PUC, we can import existing MDX queries into the Data Source Model making it available for the Analysis Report (or Saiku Ad-Hoc report in the CE version). As can be seen in the screenshot below, the program is quite complex with the numerous features to handcraft an OLAP cube schema.
Once a schema is validated in the Workbench, we need to publish it. Using the password defined in the pentaho-solutions/system/publisher_config.xml, the Workbench will prompt for the location of the cube within the BI Server and the displayed name. From that point, it will be available to choose from the drop-down list on the top left of the Saiku Analytics tool.
Figure 3. A Saiku report in progress
OLAP Cube Schema Considerations
Start by defining the fact table (bi_convection in the above example), then start defining dimensions and measures.
We have been talking about these concepts of dimension and measure. Let's briefly define them:
- A dimension is a way to view existing business data. For instance, a single figure such as sales number can be viewed from the perspectives. We can view it per sales regions, per salesperson or department, or chronologically. Using aggregation function such as sum, average, min/max, standard deviation, etc. we can come up with different numbers that shows the data in a manner that we can draw conclusion from.
- A measure is the numbers or counts of business data that can provide an indication on how the business is doing. For a shoe manufacturing company, obviously the number of shoes sold is one very important measure, another would be the average price of sold shoes. Combined with dimensions, we can use the measures to make a business decision.
In the Schema Workbench, as you select the existing BI table fields into the proper dimensions, it will validate the accessibility of the fields using the existing database connection, then create a view of the measures using a certain user-configurable way to aggregate the numbers.
In the creation of an OLAP cube schema, there is a special dimension that enables us to see data chronologically. Due to its universal nature, this dimension is a good one to start with. The time dimension is typically served by a special BI table that contains a flat list of rows containing time and date information within the needed granularity (some businesses requires seconds, others days, or even weeks or months).
TIP: Measures can be defined using “case when” SQL construct, which opens a whole other level of flexibility.
When should we use MDX vs SQL?
The MDX query language, with its powerful concepts like ParallelPeriods, is suitable for generating tabular data containing aggregated data that is useful for comparison purposes.
True to its intended purposes, MDX queries allows for querying data which is presented in a multi-dimensional fashion. While SQL is easier to grasp and has a wider base of users/experts in any industry.
In reality, we use these two languages at different levels, the key is to be comfortable with both, and discover the cases where one would make more sense than the other.
NOTE: The powerful Mondrian engine is capable, but without a judicious use of database indexing, query performance can crawl into minutes instead of seconds easily. This is where data personnels with database tuning experiences would be extremely helpful.
Summary
The analytics tools in the Pentaho BI Suite is quite comprehensive. Certainly better than some of the competing tools out there. The analytic reports are made available on the Pentaho User Console (PUC) where users login and initiate the report generation. There are three facilities available:
The Analysis Report (or Saiku Analytics in CE version) is a good tool for building reports that look into an existing OLAP cube and do the “slicing and dicing” of data.
The Data Source facility can also be used to create OLAP cubes from existing BI tables in the DW. A good use of this facility is to build a collection of OLAP cubes to answer business questions.
The Schema Workbench is a standalone tool which allows for handcrafting custom OLAP cube schemas. This tool is handy for complicated measure definitions and multilevel dimensions. It is also a good MDX query builder and evaluator.
Next in part-five, we will discuss the Pentaho Dashboard design tools.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Pentaho Business Analytics
January 2025
Learn what your peers think about Pentaho Business Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: January 2025.
831,265 professionals have used our research since 2012.
Owner with 51-200 employees
Pentaho BI Suite Review: Pentaho Reporting – Part 3 of 6
This is the third of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.
In this third part, we'll be discussing the tools and facilities, with which all of the reports are designed, generated, and served. A full BI suite should have a few reporting facilities that are usable by users with different level of technical/database knowledge.
Why is this important? Because in the real world, owners of data (people who consume the reports to make various business decisions) ranges from accountants, customer account managers, supply-chain managers, C-level executives, manufacturing managers, etc. Notice that proficiency in writing SQL queries a prerequisite to any of those positions?
In the Pentaho BI Suite, we have these reporting components:
- Pentaho Report Designer – A stand-alone program that are par with Jasper or iReport and to the lesser extent Crystal report designers.
- Pentaho Model Data Source – A way to encapsulate data sources which includes the most flexible of all, a SQL query. Once this is setup by the data personnel, data owners can use it to generate ad-hoc reports – and dashboards too, which we'll discuss in Part 5 of this review series.
- Saiku Reporting Tool – A convenient way to create ad-hoc reports based on the Pentaho Data Sources (see number 2 above).
Let's discuss each of these components individually. The screenshots below are sanitized to remove references to our actual clients. A fictitious company called “DonutWorld” is used to illustrate and relate the concepts.
This Java standalone program feels like the Eclipse Java development IDE because they share the UI library. If you are already familiar with Jasper Reports, iReports, or Crystal Report, the concepts are similar (bands, groups, details, sub-reports). You start with a master report in which you can combine different data sources (SQL and MDX queries in this case) into a layout that is managed via a set of properties.
Learning experience: As with any report designers, which are complex software because of the sheer number of tweak-able properties governing each element of the reports, one has to be prepared to learn the PRD. While the tools are laid out logically, it will take some time for a new personnel to absorb the main concepts. The sub-report facility is one of the most powerful feature of this program and it is the key to create reports that drills into more than one axis (or dimension) of data.
Usage experience: Things like the placement accuracy of elements within the page is not 100% precise and there are times when I had to work around the quirks and inconsistencies revolving around setting default values for properties, especially the ones containing formulas. Be prepared to have a dedicated personnel (either a permanent employee or a consultant) that can be reached for report designs *and* subsequent modifications. In addition, aesthetic considerations are also important in order to create a visually engaging reports (who wants to read a boring and bland report?).
Figure 1. The typical look of PRD when designing a report.
The Data Source facility is accessible from within the Pentaho BI Server UI (the PUC, see Part 2 of this review series for more information). Once you have logged in, look for a section on the screen that allows you to create or manage existing data sources.
This feature allows data personnel to setup “models” that can be constructed from various data sources, that represents a flat-view of data, of which a non-technical data owners can create ad-hoc reports or dashboards. Obviously this feature does not alleviate the need for knowing how to use the available tools for creating those reports and dashboards. It simply detach the dependency on crafting SQL/MDX queries and the intricacies of OLAP data structures from creating an ad-hoc report.
Learning experience: A data personnel who are familiar with the Data Warehouse (DW) can easily create models out of SQL queries against existing tables within the DW, or by using MDX queries against existing OLAP cubes. Data owners who are familiar with the data itself, can then start to use the Saiku Ad hoc Reporting tool or the CDE (Community-tools Dashboard Editor) to create dashboards. In reality, expect a couple of weeks for the personnels to get accustomed to this feature. Assumption: A knowledgeable BI teacher or consultant is available during this time. Usage experience: By separating the technical-database skill from the ability to generate ad-hoc reports, Pentaho has provided a way for organizations to streamline their business decision-making process further away from the technical minutiae that tends to bog down the process with details that are not relevant to the business goals. I highly rate this feature in the Pentaho BI Suite as one of the more innovative contribution to the area of Business Process Management.
Figure 2. Creating a model out of a SQL query
NOTE: The most important part of using this facility has to do more with business process than the familiarity of the data itself. Without a good process in place, it is quite obvious that the reports can get out of sync with the underlying data model. This is where the construction and maturity of the Data Warehouse is tested. For example, a DW with sufficient maturity will notify the data personnel of any data model changes which will trigger the updating of the Model Data Structure, which may or may not have an effect on the ad-hoc reports.
If the DW is designed correctly, there should be quite a few fact tables that can readily be translated into a Model Data Source. This is the first step. Now let's look at how to use this model.
Saiku is the name of two tools available from the PUC. The first one is the Saiku Analytics tool which allows us drill into an OLAP cube and perform analysis using aggregated measures (we'll review this in Part 4). The second one is the Saiku Ad-hoc Reporting tool. This is the one we are going to look into at this time. Using the modern UI library such as jQuery, the developers of Saiku give us a convenient drag-and-drop UI that is easy to learn and use.
Once a model is published, it will be available to choose from the drop-down list on the top left of the Saiku Ad-hoc Reporting tool. See the screenshot below:
Figure 3. A Saiku report in progress
Next, you can start to choose from the list of available fields in the model to specify as part of either the Columns list, or Groups list. Next, from the same list of available fields, you can specify some values as filters. The most obvious example would be the transaction date and time range which determines what period is the report for.
As you select the fields into the proper report elements, the tool started to populate the preview area with what the report would look like. You can also specify aggregation for each of the groupings, which is very handy.
There is a limited control on templates which governs the appearance of the report, but obviously won't be enough for serious usages. The best remedy however, is available, via the exporting to .prpt file, which you can open in the PRD and tweak to your heart's content.
After you are happy with the report, you can save it for later editing. Another thoughtful design decision by the Pentaho team.
In overall, the Saiku Ad-hoc Reporting tool is a handy facility to craft quick reports that answer specific questions based on the available model data sources. If your data personnel diligently updates and maintains the models, this tool can be invaluable to support your business decisions.
None of the above discussions would mean a whole lot without a practical and useful way for the reports to be delivered to its requesters. Here, the comprehensive nature of the Pentaho BI Suite helps by providing the facilities like xaction and input UI controls for report parameters.
For example a report designed in PRD can be published on the PUC. At some point it is opened by the user on the PUC who supplies the necessary parameters, then the xaction script fire an ETL which renders a .prpt file into a .pdf and either email it to the requester or drop it in a shared folder.
Reports can also be “burst” via an ETL script that utilizes the Pentaho Reporting Output step available from within Spoon (the ETL editor). I have used this method to distribute periodically-generated reports to different recipients containing data that is specific to the said recipient's access permission level. This saves a lot of time and increased the efficiency of up-to-date information distribution inside a company.
The reporting tools in the Pentaho BI Suite is designed to allow different users within the company to generate reports that are either pre-designed or ad-hoc. The reports are made available on the Pentaho User Console (PUC) where users login and initiate the report generation. Reports can also be scheduled to be generated via ETL scripts.
The PRD will be instantly recognizable by anyone who has experience using tools like Crystal Reports and its derivatives. You can also specify MDX queries against any OLAP cube schema published in the Pentaho BI Server as a data source.
The Model Data Source facility allows data owners who are not data personnels to create ad-hoc reports quickly and save it for future use and modifications.
The Saiku Ad-Hoc report is the UI with which available models can be used to generate reports on-the-fly. These reports can also be saved for later use.
Next in part-four, we will discuss the Pentaho Mondrian (MDX query engine) and the OLAP Cube Schema tools.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Owner with 51-200 employees
Pentaho BI Suite Review: Pentaho BI Server – Part 2 of 6
Introduction
This is the second of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.
In this second part, we'll be discussing the Pentaho BI Server from which all of the reports, dashboards, and analytic tools are served to the users. A BI suite usually has a central place where users log in using their assigned credentials. In this case, the server is a standalone web server (an Apache Tomcat instance) that is augmented by various tools that provides the functionalities – most of these tools are written by Webdetails (webdetails.pt). We'll visit these tools in subsequent review parts, for now, let's focus on the server itself.
In the case of Pentaho BI Server, it has two components:
-
The Pentaho User Console (a.k.a PUC) – this is what we usually associate with the main BI Server in the Pentaho world; where users would spend the majority of their time generating reports (both real-time or scheduled), using the analytic tools, build and publish dashboards, etc. This is also where administrator users can manage who can access which reports either by User or by Role – obviously, Role-based ACL is cleaner and easier to maintain.
-
The Administration Console (a.k.a PAC) – this is where admin users go to create new Users, Roles, and schedule jobs. It is another standalone web server that can be started and stopped when needed, it is totally independent of the main PUC server.
Is it Corporate-Ready?
BI servers are considered ready for corporate “demands” based on the number of users they can support, and the facilities to manage them. The Pentaho BI Suite Enterprise Edition is without a doubt ready for corporate use because it comes with the support that will make sure that is the case.
The Community Edition is more interesting, it is definitely corporate ready, but the personnels who set it up needs to be intimately familiar with the ins and outs of the server itself. Having installed three of these, I am confident that the BI Server, due to its built in ACL management is ready for prime time in the corporate world.
Although the Pentaho BI server includes a scheduler, another “corporate” feature, I find myself using cron (or Windows Task Scheduler) for the most part. The built-in scheduler is based on the Quartz library for Java. It is a good facility with decent UI to schedule reports or ETL from within the PUC.
Is it Easy to Use?
The PAC is very easy to use. The UI interface is simple enough due to the minimum numbers of menus and options. In a sense, it's a simple facility to manage user/role and scheduling – not ACL, just users and roles.
The PUC is more involved, but adopting the familiar file folder look and feel on the left panel, it is quite easy to get into and start using. Administrators would love the way they can set who can Execute, Edit, Schedule each reports, saved analytic views, and dashboards – by the way, Pentaho calls these: Solutions.
Setting up the BI server is better left to the consultants who are used to doing it. Or if there are in-house personnels who would be doing this, it is worth the time to participate in the training webinars that Pentaho held periodically. The steps to setup a BI server far from being simple, but that is the case for all BI servers, regardless the brand.
The collapsible left panel serves as the directory of the solutions, with the top part shows the folders, and the bottom part shows the individual solution. The bigger panel on the right is where you actually see the content of the solutions. And in some cases, that's where you'd create a Dashboard using the CDE tool (we'll revisit this in later review part).
Is it Easy to Create Solutions?
Remember that the concept “solution” here refer to the different types of reports, dashboards, analytic views. Pentaho BI server employs a “glue” scripting facility called the xactions. These are XML documents that contain some sequence of actions that can do various things like:
-
Asking users for input parameters
-
Issuing a SQL query based on user input
-
Trigger an ETL that produce reports
Once you are familiar with this facility, it is not that hard to start producing solutions, but it pays to install the included examples and study them to find out how to do certain things with xaction and/or to copy snippets into your own scripts.
On the PUC, we can build these solutions:
-
Dashboards using CDE
-
Ad-hoc reports and data model using the built in Model generator (very handy for accessing those BI tables that are populated by ETL runs)
-
Analytic Views using tools like Saiku or its equivalent for the Professional and Enterprise edition. NOTE: This requires a pre-published schema which is built using another tool called the schema-workbench (we will see this in the latter parts of this review series)
Is it Customizable?
Being the user-facing tool, one of the requirement would be the ability to customize the appearance via themes, at the very least, a BI server need to allow companies to change the logo into their own.
The good news is, you can do all that with Pentaho BI Server. If you opt for the Professional and Enterprise editions, you can rely on the support that you already paid for. For those using the Community Edition, customizing the appearance requires knowledge on how a typical Java Web Server is structured. Again, any good BI consultant should be able to tackle this without too much difficulties.
Here is an example of a customized PUC login page:
In case you are wondering, yes, you can customize the PUC interface also, and it even comes with a theme structure in which you can assign your graphic artists to redefine the CSS elements.
Summary
The Pentaho BI server, is the central place where users are going to interact with Pentaho BI Suite. It brings together solutions (what Pentaho call contents) produced by the other tools in the suite, and expose it to the user while being protected by a robust ACL.
On the balance between ease-of-use and the ability to customize, the Pentaho BI Server scores well provided that the personnel in charge is familiar with the Java Enterprise environment. To illustrate this, in one project, I managed to tweak the security framework to make the PUC part of a single-sign-on Liferay portal, along with other applications such as Opentaps and Alfresco.
Next in part-three, we will discuss the wide array of Pentaho Reporting tools.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Solution Architect at MIMOS Berhad
Pentaho is best fit for small and medium enterprises where the quantity of data and the workload are not critical.
What is our primary use case?
How has it helped my organization?
Making business decision easily and efficient by simplifying the Performance Management of each business units on timely basis.
What is most valuable?
Ad-hoc reporting (available only in Enterprise Edition) GEO-referencing using Google Maps ETL (Pentaho Data Integration) Support for both JasperReport, BIRT
What needs improvement?
QueryByExample Portal support for Liferay No collaborative BI Auditing / User-Profiling is available only in Enterprise edition
For how long have I used the solution?
7 years
What do I think about the stability of the solution?
Very rarely
What do I think about the scalability of the solution?
No
How are customer service and technical support?
Customer Service:
3 (out of 5 scale)
Technical Support:
3 (out of 5 scale)
Which solution did I use previously and why did I switch?
No
How was the initial setup?
It is easy
What about the implementation team?
In-House
What's my experience with pricing, setup cost, and licensing?
Completely using community edition along with features implemented on top.
Which other solutions did I evaluate?
No, but now evaluating both SpagoBI and Jasper
What other advice do I have?
Study, Analyze and Compare with other platforms feature according to your requirements. Following are the nice features of Pentaho: Ad-hoc reporting Support for both JasperReport,BIRT Dashboard using JFreeChart
Which deployment model are you using for this solution?
On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Director Tecnologia
Increases productivity and lowers costs, though should improve the construction of its dashboards
Pros and Cons
- "I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
- "Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
What is most valuable?
I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced.
How has it helped my organization?
The first eight years, I used this tool in one company. Now, I have some customers who hire me to give them advice. I have a couple of great customers in my country and they are very satisfied because they have increased productivity and lowered costs.
What needs improvement?
Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention.
For how long have I used the solution?
For 12 years. I have been using Pentaho CE 6.0 and 7.0. Last year, I implemented Pentaho CE 5.0.
What do I think about the stability of the solution?
I am actually trying to use Pentaho 7.0 CE and determine if it has some issues. In Pentaho EE, I have several years using it without having issues.
What do I think about the scalability of the solution?
No, it is a highly experienced tool. It can do anything.
How is customer service and technical support?
Really, I don't know about the support of Pentaho EE. As for the support of Pentaho CE, it is bad. Fortunately, I am highly experienced and use it very little.
How was the initial setup?
To start, the first configurations were very difficult. I started with the CE version and without good documentation or support. I spent years learning for myself.
What other advice do I have?
Hire specialized support for Pentaho. If customers want a professional tool and have the money, they should invest in the enterprise version of Pentaho or hire a company from your country specializing in Pentaho with high experience.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Identity and Access Management Engineer at a financial services firm with 10,001+ employees
Easy to install, easy to use, the free edition meets our needs
Pros and Cons
- "Easy to use components to create the job."
- "Logging capability is needed."
- "Version control would be a good addition."
What is most valuable?
Easy to use components to create the job.
What needs improvement?
- Logging capability.
- Version control would be a good addition.
For how long have I used the solution?
One to three years.
What do I think about the stability of the solution?
A lot of time jobs get stuck, causing them to lock out and fail to run, until we kill them.
What do I think about the scalability of the solution?
I have not needed to scale this product so far.
How are customer service and technical support?
Open community, you can find good responses at a high level for the free edition. I have not used the commercial version which includes support.
Which solution did I use previously and why did I switch?
This is first solution I have used and I like it.
How was the initial setup?
Simple, easy to install.
What's my experience with pricing, setup cost, and licensing?
Free and commercial versions are available.
Which other solutions did I evaluate?
We did not evaluate other options as this one is free.
What other advice do I have?
Good for any size organization. There are other products and vendors available to better handle errors and logging, but for us, the free version of Pentaho is good enough to satisfy our needs.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Scientist at a tech services company with 501-1,000 employees
It became a lot easier for our developers to switch between or join the different development projects.
What is most valuable?
I found Pentaho Data Integration the most valuable component since it is the most mature open-source ETL tool available. Compared to other proprietary products it has a less steep learning curve due to it's very intuitive user interface. Besides that it has a pluggable architecture which makes it quite easy to extend with custom functionality and features.
Another thing worth mentioning is the very active user community around the products which provide some great resources for community support.
How has it helped my organization?
As for the data integration part each development team were writing their own integration scripts, parsers and interfaces from scratch on each different project over and over again. With Pentaho Data Integration which offers all these common tasks out-of-the-box we reduced development time significantly. Also by using such a universal tool and introducing a uniform architecture it became a lot easier for our developers to switch and/or join between the different development projects.
Also on the business intelligence part we moved from developing custom solutions on each track to the usage of standard functionality of the BI server and thus cutting down both complexity and development time.
What needs improvement?
Since most of our projects start off as a proof-of-concept with the Community Edition version of the products we found that the differences between the Community- and the Enterprise Editions are too big on certain levels. It would be a big gain if the Community Edition version would be a full representation of the Enterprise Editions making it easier to move on to the Enterprise Edition and support.
For how long have I used the solution?
I started using Pentaho Data Integration around seven years ago and moved on to the full stack about five years ago.
What was my experience with deployment of the solution?
I have seen many different (custom build) deployment solutions for Pentaho throughout the years each having their own pros and cons.
What do I think about the stability of the solution?
We've had no issues with its stability.
What do I think about the scalability of the solution?
Since Pentaho supports running as a single process to a clustered architecture and has a big focus on big data (distributed) environments, scalability hasn't been an issue for us.
How are customer service and technical support?
The open source strategy of Pentaho has resulted in a very active community which provided us all the support we need. Compared to other big vendors my personal experience is that response times are a lot shorter.
Which solution did I use previously and why did I switch?
Most of our previously used solutions were custom built. We have evaluated both open-source and proprietary competitive products but found that Pentaho was the easiest to adopt.
How was the initial setup?
Depending upon the solutions nature, the initial setup for a basic data warehouse architecture is quite straightforward. But as with all solutions as the landscape grows and user requirements evolve, the complexity increases. I think that Pentaho suits well in today's demand for a continuous integration approach. With this in mind the initial setup is crucial in a way not to find yourself spending a lot of time and effort in refactoring the complete solution over-and-over again.
What about the implementation team?
We implemented it in-house. Keep your development and implementation cycles short and small if possible. Users demand fast implementation of requirements so the continuous integration approach becomes more crucial as well as self-service functionality. From which the latter is not yet the strongest use-case for using Pentaho yet.
What was our ROI?
Decrease of development time compared to our traditional development cycles in pure Enterprise JAVA solutions should be estimated around 60%.
What's my experience with pricing, setup cost, and licensing?
Unfortunately I can't provide any exact figures about this. But using the Community edition for the development and test cycles drops down the licensing costs for the complete OTAP street.
What other advice do I have?
As mentioned before, there is a great community of users, developers and other enthusiasts which I recommend to consult for your particular use-case. Check the latest Gartner report (2016) about BI vendors and ultimately visit one of the Pentaho Community Meetups to get more insight.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Download our free Pentaho Business Analytics Report and get advice and tips from experienced pros
sharing their opinions.
Updated: January 2025
Popular Comparisons
Microsoft Power BI
IBM Cognos
SAP BusinessObjects Business Intelligence Platform
Oracle OBIEE
MicroStrategy
Oracle Analytics Cloud
Buyer's Guide
Download our free Pentaho Business Analytics Report and get advice and tips from experienced pros
sharing their opinions.
Quick Links
Learn More: Questions:
- Seeking advise on going with Birst, BOARD or Pentaho as an OEM platform solution where we could end up with 1,000's of users over time.
- Jaspersoft vs. Pentaho. Which should we choose?
- Performance benchmarks for Pentaho?
- What is the biggest difference between SSIS and Pentaho?
- When evaluating Business Intelligence Tools, what aspect do you think is the most important to look for?
- BI Tool Replacements, What Do You Recommend?
- Which one is best for ETL - Pentaho or Jaspersoft?
- Seeking advise on going with Birst, BOARD or Pentaho as an OEM platform solution where we could end up with 1,000's of users over time.
- BI Tool Evaluation Criteria Rating Matrix -- anyone have one they've used in making a tool selection?
- QlikView or Tableau - Which is better?
The recent version of pentaho supports various integration such as Big data tools, SAP etc., for efficiently create/schedule/run the transformation/job in standalone/cluster environment.