Manager of Data Analytics at Tata Consultancy Services
Real User
2014-08-14T07:14:00Z
Aug 14, 2014
Thanks, Gary, for the details I missed. Indeed, the cost, skills, and maintenance of a licensed tool are the make-or-break factors when deciding which tool to opt for, and they are often realized only through experience over a period of time.
The assumption Abhishek mentioned, that you are referring to batch/ETL data integration rather than process integration, is critical. The primary differences between ETL and EAI are in the areas of expected latency, data format, and data versus process transformation. I say that because the two categories of integration tools overlap in functionality while being fundamentally different in design and expectations.
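To make the latency and "data versus process" distinction concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: `run_batch`, `MessageRouter`, and the record fields are hypothetical names, not the API of any tool discussed in this thread.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def run_batch(records: Iterable[Record],
              transform: Callable[[Record], Record]) -> List[Record]:
    """ETL style: transform an accumulated set of records in one scheduled run."""
    return [transform(record) for record in records]


class MessageRouter:
    """EAI style: handle each message as it arrives and route it by type."""

    def __init__(self) -> None:
        # Hypothetical registry mapping a message type to its handler.
        self.handlers: Dict[str, Callable[[Record], Record]] = {}

    def register(self, msg_type: str,
                 handler: Callable[[Record], Record]) -> None:
        self.handlers[msg_type] = handler

    def on_message(self, message: Record) -> Record:
        # One message at a time, so latency is per event rather than per batch.
        handler = self.handlers[str(message["type"])]
        return handler(message)
```

The batch function cares about the shape of the data set as a whole, while the router cares about which process should react to a single event. That is roughly where the overlapping features end.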
Other potential tools are Ab Initio and Microsoft SQL Server Integration Services. You can't go wrong with InfoSphere or Informatica if money is no object and you are willing to make the investment.
While all of the above tools share some of the features Abhishek mentioned (which was a great summary, btw), the devil, as they say, is in the details. Usability of logging, scalability, richness and usability of metadata, user friendliness, level of support, scheduling: these are often very different, and you won't easily see the significant differences until you have actually used the tools for a while. The differences show up in how much customization developers have to do to make the tools perform and be supportable in the real world.
Manager of Data Analytics at Tata Consultancy Services
Real User
2014-08-13T12:23:47Z
Aug 13, 2014
I would assume that "data integration systems" here does not refer to messaging and EAI. For non-messaging, non-EAI data integration systems, I would look for these features:
01. Range of built-in components available to transform data.
02. Degree of customization possible at the transformation, process, and batch (group of processes) levels.
03. Exception handling mechanism supported: built-in, configuration-based, or custom.
04. Scheduling, and reporting on the status of processes.
05. Notification, Alerts and Logging, as required for specific process(es).
06. Variety of Sources and Targets which can be used for design of ETL process(es).
07. Architecture scalability: processing large (and very large) data volumes, failure recovery, high availability.
08. Concurrency and stability of the system during BCP and/or failover recovery.
09. Available and accessible Metadata Repository.
10. Client software available on various compatible operating systems (optional).
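A few of the features in the list above (transformation components, exception handling, logging, and status reporting) can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not how any of the tools in this thread is implemented: `JobStatus`, `run_job`, and `to_float_amount` are hypothetical names.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_sketch")

Row = Dict[str, object]


@dataclass
class JobStatus:
    """Status report for one run (item 04 in the list)."""
    loaded: int = 0
    rejected: int = 0
    errors: List[str] = field(default_factory=list)


def to_float_amount(row: Row) -> Row:
    """A stand-in for a built-in transformation component (item 01)."""
    return {**row, "amount": float(row["amount"])}


def run_job(rows: Iterable[Row],
            transforms: List[Callable[[Row], Row]],
            target: List[Row]) -> JobStatus:
    status = JobStatus()
    for original in rows:
        try:
            row = original
            for transform in transforms:
                row = transform(row)
            target.append(row)
            status.loaded += 1
        except (KeyError, ValueError, TypeError) as exc:
            # Exception handling (item 03): reject the row and keep the job running.
            status.rejected += 1
            status.errors.append(f"{original!r}: {exc}")
            log.warning("rejected row %r: %s", original, exc)
    # Logging (item 05): one summary line per run.
    log.info("loaded=%d rejected=%d", status.loaded, status.rejected)
    return status
```

When evaluating a real tool, the question is how much of this comes built in or through configuration, and how much your developers would have to write and maintain themselves.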
A few of the market-leading solutions that meet most of these requirements are:
1. Informatica PowerCenter (licensed).
2. IBM InfoSphere DataStage (licensed).
3. SAP BODI (licensed).
4. Talend DI (open source).
5. Pentaho Kettle (open source).
6. Oracle DI / Oracle WB (licensed).
What is data integration? Data integration is the process of combining data that resides in multiple sources into one unified set. This is done for analytical uses as well as for operational uses.
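As a minimal sketch of that definition, the Python below combines records from two sources into one unified set keyed on a shared identifier. The source names (`crm`, `billing`), the field names, and the `integrate` helper are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]

# Two hypothetical sources that describe overlapping customers.
crm = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
]
billing = [
    {"customer_id": 1, "balance": 120.0},
    {"customer_id": 3, "balance": 45.5},
]


def integrate(*sources: Iterable[Record], key: str) -> List[Record]:
    """Merge records that share the same key value into one unified record."""
    unified: Dict[object, Record] = {}
    for source in sources:
        for record in source:
            # Later sources add fields to (or overwrite) earlier ones.
            unified.setdefault(record[key], {}).update(record)
    return [unified[k] for k in sorted(unified)]


customers = integrate(crm, billing, key="customer_id")
```

Here `customers` holds three records: customer 1 with both a name and a balance, customer 2 with only a name, and customer 3 with only a balance. Real integration work adds the hard parts this sketch skips, such as conflicting values, mismatched keys, and data quality rules.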