it_user382572 - PeerSpot reviewer
Pentaho Consultant at a comms service provider with 10,001+ employees
Vendor
Because it is an open source product, it is very easy to build your own solution with it.

What is most valuable?

It is a very good open source ETL tool that's capable of connecting to most databases. It has a lot of functions that make transforming the data very easy. Also, because it is an open source product, it is very easy to build your own solution with it.

How has it helped my organization?

It is also possible to build a new solution quite quickly, so the customer sees results quite fast.

What needs improvement?

In the community version the scheduling tool is not good, and we had to build it ourselves.

For how long have I used the solution?

I have worked with different versions of Pentaho since 2009.

Buyer's Guide
Pentaho Data Integration and Analytics
December 2024
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: December 2024.
825,399 professionals have used our research since 2012.

What was my experience with deployment of the solution?

There are a couple of bugs in the newer versions. We were forced to wait until those bugs were fixed before we could upgrade.

What do I think about the stability of the solution?

There were no issues with its stability.

What do I think about the scalability of the solution?

There have been no issues scaling it.

How are customer service and support?

Because we use the community edition, there is no support from the vendor. When I worked with the Enterprise edition last year the technical support was quick and to the point. I was more than happy with their knowledge.

Which solution did I use previously and why did I switch?

In the past I also worked with SAP BI. The main reason we switched to Pentaho was the cost of SAP. Because of the flexibility of Pentaho, I prefer to work with it.

How was the initial setup?

When I started using Pentaho in 2009, the initial setup was quite complex, mainly because of a lack of good documentation at that time. Since then, it has dramatically improved. Also, the community on the web is quite active and there are some good blogs.

What about the implementation team?

I was hired to do the implementation. I think it is necessary to have a good understanding of the product to implement it well, so I would recommend hiring the appropriate expertise when it is not available in-house.

What other advice do I have?

When you don’t have knowledge of the product, I would recommend following some courses to speed up the learning curve. A cheap way to start with Pentaho is to use the Community Edition. You can do almost everything with it, and the purchase of the Enterprise Edition is not necessary.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
PeerSpot user
Data Developer at a tech services company with 10,001+ employees
Consultant
It is possible to understand how to develop an ETL solution even when using it for the first time.

What is most valuable?

  • Pentaho Kettle has a very intuitive and easy to use graphical user interface (GUI)
  • It is possible to understand how to develop an ETL solution even when using it for the first time
  • The Community Edition is free and very efficient
  • They have versions for Windows, Linux and Mac
  • Large selection of options.

How has it helped my organization?

We have developed some complex ETL processes for some clients and they are very satisfied with the results.

What needs improvement?

They could improve the logging generator. Sometimes the error description is so generic that it is not possible to detect the problem.

For how long have I used the solution?

We've used it for three years.

What was my experience with deployment of the solution?

There were no issues with the deployment.

What do I think about the stability of the solution?

There were no issues with the stability.

What do I think about the scalability of the solution?

There have been no issues scaling it.

How are customer service and technical support?

I use the Community Edition without support or customer service. I recommend the Pentaho Community Forums for technical issues.

Which solution did I use previously and why did I switch?

I have used Informatica PowerCenter, which is an excellent solution. However, it's not as easy to use as Pentaho Kettle.

How was the initial setup?

The initial setup is straightforward. All you need to do is to download it, unzip the file into a folder and execute the Spoon.bat (for Windows) or Spoon.sh (for Linux) to start the graphical user interface (GUI).

What about the implementation team?

In-house. The implementation is very simple. Data developers will not encounter difficulties implementing ETL solutions.

What's my experience with pricing, setup cost, and licensing?

The community edition is free. If you need a full BI solution, I would recommend the enterprise edition.

What other advice do I have?

Pentaho Kettle is an excellent solution for implementing ETL processes.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user402600 - PeerSpot reviewer
Senior Consultant at a financial services firm with 10,001+ employees
Real User
Needs improvement on the Hadoop and JMS plugins.

Valuable Features:

It allows for rapid prototyping of a wide array of ETL workloads.

Room for Improvement:

Support for common Hadoop utilities can be expanded, such as bulk load with composite row keys for HBase, and include drivers for Impala out-of-the-box. A richer interface to Hive could also be beneficial as we currently have to go through a raw connection and execute SQL scripts, for which some syntax is not respected.

As of version 6, there are also some new issues introduced that pose a bit of an annoyance:


1) On kettle's ramp up - log4j errors

2) IBM Websphere MQ Producer - variable substitution for the URL does not work - you have to hardcode.

3) shared.xml for DB connections - variable substitution for connection properties does not work - have to hardcode things like Kerberos principal for a Hive/Impala connection.
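To make items 2 and 3 concrete: Kettle variables are written as `${NAME}` and are expected to be resolved at run time, so that values such as a URL or a Kerberos principal do not have to be hardcoded. The following is a minimal Python sketch of that expected behaviour, not PDI's own code; the variable names and the property string are hypothetical, and leaving unknown names untouched is a choice made for this sketch.

```python
import re

# Matches Kettle-style ${NAME} placeholders.
_VAR = re.compile(r"\$\{([^}]+)\}")

def substitute(text, variables):
    """Replace each ${NAME} with its value; leave unknown names untouched."""
    def repl(match):
        return variables.get(match.group(1), match.group(0))
    return _VAR.sub(repl, text)

# Hypothetical variable and connection property, for illustration only.
variables = {"HIVE_PRINCIPAL": "hive/host.example.com@EXAMPLE.COM"}
prop = "principal=${HIVE_PRINCIPAL};ssl=${USE_SSL}"

resolved = substitute(prop, variables)
print(resolved)
```

When substitution is skipped for a field, as described above, the literal `${...}` text reaches the driver unchanged, which is why the value then has to be hardcoded.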

Deployment Issues:

We had no issues deploying it.

Scalability Issues:

The robustness of this solution in a production cluster (>30 nodes) remains to be seen.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
it_user373128 - PeerSpot reviewer
Data Architect & ETL Lead at a financial services firm with 1,001-5,000 employees
Vendor
It doesn't have the capability to produce crosstab reports with formatting capabilities. It connects seamlessly to most commonly used data sources.

Valuable Features

It is a lightweight ETL tool that's easy to get started on. It connects seamlessly to most commonly used data sources.

Improvements to My Organization

The organization went with Pentaho ETL and Reporting solutions as cost effective products, as compared to competitors. The ETL part certainly met those objectives, along with serving the purpose.

Room for Improvement

Since there have already been newer versions, maybe some of these features are already fixed now. The most troublesome missing feature was the capability to produce crosstab reports with formatting capabilities in the BI Reporting product. The one annoyance that troubled us a lot was the fact that every step in a transformation that needed data, created its own data connection. With some data sources like Greenplum, this was a problem, because they have a limit on available number of connections.

Use of Solution

I used it for three years, from 2012 to 2015, and only stopped as I left the organization.

Deployment Issues

One issue we encountered constantly with PDI deployments was that the environment parameters for jobs had to be updated manually through the designer module 'Spoon'. Although the product has a feature for keeping Environment Variables outside Spoon, that didn't work for us, as we had one Development server used for Dev, QA and UAT.

Stability Issues

There were no issues with the stability.

Scalability Issues

We had no issues scaling it across the company as needed.

Customer Service and Technical Support

It's about average. Most of the help we got was through Google searches and Wiki pages. One time we had an issue with a feature - our version of PDI could not handle microseconds. The product owner came up with a solution, but instead of applying the patch, wanted to sell it to us for a fee.

Initial Setup

I am only aware of the client side setup which was simple enough. It was pretty much a one step installation process.

Implementation Team

It was done by an in-house team. A couple of issues we realized later were regarding memory configuration for the environment. This needs to be evaluated and fine-tuned; otherwise you can run into job failures with large amounts of data. We ran into this issue with 'Commit' points and 'Sort' steps.

Other Solutions Considered

There was an evaluation performed, however I was not involved in it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
PeerSpot user
Graduate Teaching Assistant with 1,001-5,000 employees
Vendor
We can perform transformations with data very quickly, and create reports indicating the KPI in the reporting tool.

Valuable Features:

The most valuable feature is that it can take inputs from all formats, e.g. CSV, text, Excel, JSON, Hadoop, etc. It has the potential to provide the output in the format we require, and we can also use many database connections. The transformations listed are also very useful and are very self-explanatory. 

Also, the data mining feature which comes with the Pentaho business analytics suite was very useful to our project, especially the Weka plugin. We could score the records in the data warehouse, which helped in predicting the values.

Lastly, the GUI is very easy to use, so we can perform transformations with data very quickly, and create reports indicating the KPI in the reporting tool. I think that a company wouldn't need to spend more money on getting an experienced person to use this tool. All you need is a balance of experienced users and new trainees to get going. You can also start using the business analytics tool once you have integrated data. Coaching and applying this technology enterprise-wide will enable your business to make data-driven decisions.

Improvements to My Organization:

It makes it possible for the seniors to train new employees and junior staff very quickly. All that is needed is strong knowledge of ETL and BI/Big Data concepts to use this software.

Room for Improvement:

I would like to see the data visualization tool combined with BI so I can see how data is progressing through various stages. I do think that they are working on this already. I also found, in my case, that the statistical data input wasn't working (.sas7bdat input wasn't working).

Deployment Issues:

There have been no issues with the deployment.

Stability Issues:

It could have been the case that I may not have been doing it the right way.

Scalability Issues:

We have had no issues scaling it.

Cost and Licensing Advice:

I would say it is one of the most affordable tools to use for business intelligence.

Other Advice:

You should go for this tool to manage your data warehouse, but I would suggest that you look for other reporting tools, such as Tableau, which are more user friendly and provide great insights into the data.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer1384743 - PeerSpot reviewer
Specialist in Relational Databases and Nosql at a computer software company with 5,001-10,000 employees
Real User
Free to use, easy to set up, and has a great metadata injection feature
Pros and Cons
  • "The solution has a free to use community version."
  • "It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now, and I think that is perhaps why it is not very stable; it sometimes causes the system to hang. I'm not sure if this is the case for paid tiers."

What is our primary use case?

The most common use for the solution is gathering data from our databases or files in order to load them into a different database. Another common use is to compare data between different databases. Where integrity is lacking, the differences can be attributed to synchronization issues.

What is most valuable?

One important feature, in my opinion, is the Metadata Injection. It gives flexibility to the scripts due to the fact that the scripts don't depend on a fixed structure or a fixed data model. Instead, you can develop transformations that are not dependent on the fixed structure or data models.

Let me give a couple of examples. Sometimes your tables change, adding fields or dropping some of them. When this happens, if you have a transformation that does not use Metadata Injection, your transformation fails or doesn't handle all the information from the table. If you use Metadata Injection instead, the new fields are included and the dropped columns are excluded from the transformation. Other times you have a complex transformation to apply to a lot of different tables. Traditionally, without the Metadata Injection feature, you had to repeat the transformation for each table, adapting the transformation to the concrete structure of each table. Fortunately, with Metadata Injection, the same transformation is valid for all the tables you want to treat. A little bit of effort gives you a great benefit.
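The idea behind the second example can be sketched in a few lines of Python. This is only an illustration of the concept (one template transformation, with the field list supplied at run time), not how PDI implements Metadata Injection; the table names, field names and the `build_transformation` helper are hypothetical.

```python
def build_transformation(field_names):
    """Return a row-processing function configured by injected field metadata."""
    def transform(row):
        # Keep only the injected fields, trimming string values.
        out = {}
        for name in field_names:
            value = row.get(name)
            out[name] = value.strip() if isinstance(value, str) else value
        return out
    return transform

# Hypothetical metadata; in practice it would be read from the
# database catalog at run time rather than written by hand.
tables = {
    "customers": ["id", "name", "city"],
    "orders": ["id", "customer_id", "total"],
}
rows = {
    "customers": [{"id": 1, "name": " Ada ", "city": "Paris ", "legacy": "x"}],
    "orders": [{"id": 7, "customer_id": 1, "total": 9.5}],
}

results = {}
for table, fields in tables.items():
    transform = build_transformation(fields)  # inject the metadata
    results[table] = [transform(r) for r in rows[table]]

print(results)
```

Adding or dropping a column only changes the injected field list; the template itself stays the same for every table.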

Furthermore, the solution has a free to use community version.

The solution is easy to set up, very intuitive, clear to understand and easy to maintain.

What needs improvement?

I'm currently looking at a new competitor that's got some interesting features that this solution doesn't have. I have found this competitor has a feature braking system that is not present in the Pentaho Data Integration approach. The way their system sets can somehow maintain a track for the last executions and store the state which gives you the potential to run from the point that it ended the last time. It's very interesting. It would be nice if Pentaho had this type of feature.
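The behaviour described here (storing the state of the last execution and resuming from the point where it ended) can be sketched as follows. This is a generic Python illustration, not the competitor's or Pentaho's implementation; the state file format, the step names and the `run_with_checkpoint` helper are assumptions made for the example.

```python
import json
import os
import tempfile

def run_with_checkpoint(steps, state_path):
    """Run steps in order, skipping those a previous run already completed."""
    done = 0
    if os.path.exists(state_path):
        with open(state_path) as fh:
            done = json.load(fh)["completed"]
    executed = []
    for index, (name, func) in enumerate(steps):
        if index < done:
            continue  # finished in an earlier run
        func()
        executed.append(name)
        # Persist progress after every successful step.
        with open(state_path, "w") as fh:
            json.dump({"completed": index + 1}, fh)
    return executed

attempts = {"load": 0}

def extract():
    pass

def load():
    # Fails on the first attempt only, to simulate an interrupted run.
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("simulated failure")

def report():
    pass

steps = [("extract", extract), ("load", load), ("report", report)]
state_path = os.path.join(tempfile.mkdtemp(), "state.json")

try:
    run_with_checkpoint(steps, state_path)
except RuntimeError:
    pass  # the first run stops at "load"

second_run = run_with_checkpoint(steps, state_path)
print(second_run)
```

The second run skips the step that already succeeded and continues from the failed one.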

Often you are required to install plugins. If you need to have access to, in my case, Neo4j databases new folder databases, you do need a plugin to do it.

For how long have I used the solution?

Between my current role and the role at my last company, I've been working with the solution for over five years.

What do I think about the stability of the solution?

It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now, and I think that is perhaps why it is not very stable; it sometimes causes the system to hang. I'm not sure if this is the case for paid tiers.

What do I think about the scalability of the solution?

I am the only person using the solution currently. There are two other people that occasionally also assist in it. I'm helping them understand the tool and they are beginning to use it. In that sense, we're slowly scaling.

I don't know if the solution scales well on a large scale, however.

It scales very well overall, thanks to the very useful attribute on every step for running n copies of it, although this has to be balanced against the side effect of consuming a lot of memory and CPU resources.
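As a rough illustration of what running several copies of a step means, here is a small Python sketch that distributes rows round-robin over parallel copies of the same function. It is not PDI code; the `run_copies` helper, the thread pool and the round-robin distribution are assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def run_copies(step, rows, copies):
    """Distribute rows round-robin over `copies` parallel instances of a step."""
    partitions = [rows[i::copies] for i in range(copies)]

    def worker(partition):
        return [step(row) for row in partition]

    with ThreadPoolExecutor(max_workers=copies) as pool:
        outputs = list(pool.map(worker, partitions))
    # The combined output does not follow the original input order.
    return [row for part in outputs for row in part]

def to_upper(row):
    return row.upper()

result = run_copies(to_upper, ["a", "b", "c", "d", "e"], copies=2)
print(sorted(result))
```

Each copy holds its own buffers and runs on its own thread, which is where the extra memory and CPU consumption mentioned above comes from.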

How are customer service and support?

We haven't really contacted technical support in the past. We try to handle any issues ourselves in-house. I can't speak to the quality of the technical support, having never directly dealt with them.

Which solution did I use previously and why did I switch?

We've never really used another solution like this in our organization. This is the first.

How was the initial setup?

The solution is pretty simple to set up. It's not complex.

For us, deployment took about one month.

Maintenance is easy. The only maintenance tasks are upgrading to the newer versions and backing up the repository frequently.

What about the implementation team?

I handled the implementation on my own. I didn't need any help from a reseller or consultant.

What's my experience with pricing, setup cost, and licensing?

We're using the community edition, which is free to use. I'm not sure how much their paid services cost. We haven't purchased any licensing.

What other advice do I have?

We're just users of the solution. We don't have a professional relationship with the company.

The solution is great to use and easy to share with teams via the central repository. It's very functional overall. I'd recommend the solution to other companies.

I'd rate the solution eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
PeerSpot user
Business Intelligence Consultant at Sanmargar Team
Vendor
We use it almost everywhere, for creating data marts, data warehouses, and implementing BI reporting tools.

Valuable Features:

First of all, the ease of deployment. I’m pretty sure that almost anyone could do simple transformations without having any knowledge of IT. Thanks to its graphical interface, this tool is just drag and click. Another advantage is that it fits everywhere. You can connect it to Big Data sources, relational databases, and all types of files. If the developers missed something, you can try finding it in the marketplace or quickly develop it yourself, because it is open source.

Improvements to My Organization:

We use it almost everywhere, for creating data marts, data warehouses, and implementing BI reporting tools. We also build our Customer Centralized File and Data Quality Studio using it. What’s more, we use it for small solutions too, i.e. if we want to quickly export data from database to .xlsx. We also develop our own plugins for PDI and put them into the marketplace. 

Room for Improvement:

A big advantage, but also a problem, is that it is open source. Almost anyone can develop their own Pentaho code and release it. Now, Pentaho is a little messy: some parts of it are super new and some look like they were developed at the beginning. I think that the developers should stop inventing new parts and take a while to clean the code and optimize the older parts of it. Some old plugins, after a long time, still don’t work properly.

Use of Solution:

I've been using it for four years, and when I started using it I was in college. I quickly found that PDI with my text search analytic plug-in is useful for preparing notes for classes. When I was bored I came up with a funny tool. It collected data from all my roommates about what they needed from the shop and sent notifications to the phones of the people who were going to the shop.

Deployment Issues:

We have never had any problems with deployment.

Stability Issues:

There are some issues with stability. As I said before, there are some small bugs, but with Pentaho you can always find a workaround.

Scalability Issues:

With the Pentaho Community version you just download it, unpack, and it should be running. If not you should also install Java. 

Customer Service:

Customer service isn’t needed. The solution to every problem is on the internet. If not, you can post it to the community forum and you will get an immediate answer, but I have never had to post a new topic.

Initial Setup:

Straightforward. You just need to unzip the file and you can already run it. There is also some further setup if you need it. It’s very simple: you just need to edit three files in Notepad.

Implementation Team:

I did this myself and we do it for other companies. All installations are easy, and you do not need to be an IT magician. 

Cost and Licensing Advice:

There is a Community Edition which is free. There is also an Enterprise licence but the price varies depending on the server hardware configuration and the purpose of use (BigData, Hadoop, etc.).

Other Solutions Considered:

I had the chance to test SAS Data Integration but I didn’t fall in love with it like I did with PDI. I think that PDI is easier to use and you can do much more with PDI than with SAS.

Other Advice:

The tool is excellent, and almost everyone can use it. You just need to take it out of the box and run. There is no limit to the application: you can do everything with it. However, it still has a lot of faults. Not every component runs as you wish it to. Always look for solutions on the Internet. Many problems have already been fixed, and there are many ready-built transformations/jobs.

Disclosure: My company has a business relationship with this vendor other than being a customer: The company where I work, Sanmargar Team, is a reseller of this solution and a Pentaho partner in Poland.
PeerSpot user
IT-Services Manager & Solution Architect at Stratis
Real User
Free to use, easy to set up, and has great UI
Pros and Cons
  • "It's my understanding that the product can scale."
  • "The product needs more plugins."

What is our primary use case?

We basically receive information from our clients via Excel. We take this information and transform it in order to create some data marts.

With the processes we are running right now, we receive new data every day. The solution processes the Excel files and creates a data mart from them.

While we read the data and transform it as well as put it in a database, in order to explore the information, we need an analytics solution for that - and that is typically Microsoft's solution, Power BI.

What is most valuable?

Running the ETL itself was very fast. It makes it very easy to transform the information we have. We found that very useful.

The UI is very easy to understand and learn.

The solution offers lots of documentation.

The initial setup is easy.

It's my understanding that the product can scale.

We've found the solution to be stable. 

The product is free to use if you choose the free version.

What needs improvement?

The solution needs better, higher-quality documentation, similar to AWS. Right now, we find that although documentation exists, it's not easy to find the answers we seek.

I have tried some cloud services with the ETL, so perhaps that would be good to add.

The product needs more plugins. Right now, it just has a standard database connection and there are other solutions there that can have straightforward connections for Oracle, MySQL, and stuff like that. However, more plugins would make it a much better product.

For how long have I used the solution?

We recently finished two projects with Pentaho.

What do I think about the stability of the solution?

The product is stable. There are no bugs or glitches. It doesn't crash or freeze. It's reliable. 

What do I think about the scalability of the solution?

According to the documentation, it's quite scalable. That said, I haven't tried to expand it. We just use a single server and that's all we need right now. We don't have plans to increase usage.

We have three people who use the solution currently.

How are customer service and technical support?

We don't really use support. We tend to do everything on our own and solve any problems we have ourselves. We basically have just read the manuals and that's about it. 

How was the initial setup?

The initial setup is not complex or difficult. It's straightforward. 

The deployment process takes about two weeks. 

We had two people who handled the deployment process. They were an AWS DevOps person and a Pentaho expert.

What's my experience with pricing, setup cost, and licensing?

We do not pay any license costs. We use a free version of the product.

What other advice do I have?

I'm a consultant and an end-user.

I downloaded the latest version of the solution. I can't speak to the version number. 

I'd rate the solution at an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Pentaho Data Integration and Analytics Report and get advice and tips from experienced pros sharing their opinions.
Updated: December 2024
Product Categories
Data Integration