I am a solution architect for a tech vendor with 10,000+ employees.
I am looking for information regarding Pentaho for data integration and ETL needs for the following criteria:
Development effort- The development effort, time and complexity.
Maintainability- Is it less maintainable?
Error Handling- Does it only have a single log file or does it have a log and error port in every transform?
What kind of errors can it handle?
Various teams needed- Does it need a separate administration team or will a Unix or NT Admin suffice?
File Structure- Is it only able to read record with a single type of delimiter?
Data Integration Capability - ODI boasts a comparatively lesser range of data integration products and abilities which includes many related functions such as profiling and data quality. Also, if it offers these capabilities are they more mainstream in nature?
Market Segments- serves medium to large scale companies?
Debugging: Does it offer easy debugging? Can you place watchers on required places and intermediate data will be saved in temporary files for easy viewing? Is it a complex debugging process through a debugger?
Company Strategy: Can you download a scaled-down free version of their software and is there free documentation available on the internet?
Go live rate- Does it have high “GO Live” success? Do issues come up during the deployment?
Scalability- Is there any issue with stability? If yes then what is the issue and what is the impact?
Which kind of scalability is supported- horizontal, vertical?
Performance- Can it support a high volume of data movement, transformation, and integration (ETL operations)?
Parallelism - mapping level parallelism, session-level parallelism, does it support multiple parallel sources and multiple target data loads?
Heterogeneous system- Does it integrate data from various heterogeneous systems like multiple varieties of databases (SQL Server, Oracle, DB2, etc), files (XML, XLS, CSV, text, etc)?
Can targets be any type of DB, file, etc.?
Big Data Support- Can it be integrated and used for big data?
Cloud solution- Is it available for both- on cloud and on-premises platforms?
Pricing- Is it freeware, open-source? Does it come in basic, standard and enterprise editions?
Repository- Does it offer repositories? Are those repositories for metadata?
Push down mechanism- Does it have pushdown optimization concepts where it can generate SQL statements from the workflow/mapping which can be directly executed on the database?
Is it an ETL or ELT tool?
Job scheduling- does it come with a builtin scheduler?
Version controlling- Does it offer version control? If yes then is it tightly controlled or moderate?
Tool Bugs- Are there any known tool bugs and any issues due to those bugs?
Pentaho is an open source business intelligence company that provides a wide range of tools to help their customers better manage their businesses. These tools include data integration software, mining tools, dashboard applications, online analytical processing options, and more.
Pentaho has two product categories: There is the standard enterprise version. This is the product that comes directly from Pentaho itself with all of the benefits, features,...