One of the selling points of the IBM InfoSphere data management suite is that its modules share a common metadata repository, the “XMETA repository”. This supports two key data governance functions: data lineage reporting and the linking of metadata assets to business term definitions. IBM’s 2005 acquisition of Ascential Software Corporation provided most of the modules it would later use in InfoSphere. IBM has since invested considerable effort in integrating those modules with data management products acquired elsewhere as well as ones it developed in-house. The integration has been fairly successful, but in some cases IBM was forced to design “bridges” between modules that have separate repositories (e.g., Metadata Workbench and Information Analyzer). It is therefore important to understand the structure of InfoSphere’s various repositories when designing your data management platform.
The modules listed below illustrate just one of the possible ways to combine IBM products to support a data governance program.
Metadata Workbench can be used by administrators to maintain the XMETA repository. It also functions as the user interface for displaying data lineage (after the repository is populated with metadata assets). With the release of InfoSphere 8.7, IBM introduced Metadata Asset Manager (IMAM), a new suite function that was a considerable improvement over the original import/export functions. After some initial implementation difficulties, IMAM has proved useful in implementing the metadata SDLC. Metadata Workbench also allows the linking of metadata assets (e.g., column names) to business term definitions (see Business Glossary).
Business Glossary uses the XMETA repository. Note that there are other IBM glossary products (particularly in the Rational suite) that are not integrated with the InfoSphere product. It is also important to keep in mind that the repository is empty when the “Information Server” (the InfoSphere platform) is built. The glossary is structured as a user-defined, searchable hierarchy. A formal workflow can be constructed within the product to restrict maintenance of both the hierarchy and the definitions to authorized users. Although Information Server is LDAP compliant, some modules, especially Business Glossary, may require additional user setup to implement some user roles.
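To illustrate what a user-defined, searchable hierarchy with linked metadata assets looks like in principle, here is a minimal Python sketch. It is not IBM code and does not reflect how Business Glossary or XMETA actually store anything; the `Category` and `Term` classes, the `search` function, and all term and asset names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    definition: str
    # Names of linked metadata assets (e.g., column names); illustrative only.
    linked_assets: list = field(default_factory=list)


@dataclass
class Category:
    name: str
    terms: list = field(default_factory=list)
    subcategories: list = field(default_factory=list)


def search(category, text, path=()):
    """Yield (category path, term) for terms whose name or definition contains `text`."""
    here = path + (category.name,)
    needle = text.lower()
    for term in category.terms:
        if needle in term.name.lower() or needle in term.definition.lower():
            yield " / ".join(here), term
    # Recurse into child categories so the whole hierarchy is searched.
    for sub in category.subcategories:
        yield from search(sub, text, here)


# A hypothetical two-level hierarchy; the repository starts empty,
# so an organization would have to populate something like this itself.
glossary = Category("Enterprise", subcategories=[
    Category("Customer", terms=[
        Term("Customer ID", "Unique identifier assigned to a customer.",
             linked_assets=["oltp.customer.cust_id", "edw.customer.customer_key"]),
    ]),
    Category("Finance", terms=[
        Term("Net Revenue", "Gross revenue less returns and discounts."),
    ]),
])

if __name__ == "__main__":
    for location, term in search(glossary, "customer"):
        print(location, "->", term.name, term.linked_assets)
```

The point of the sketch is only that a business term carries both its place in the hierarchy and its links to technical assets, which is what makes the glossary useful alongside lineage reporting.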
Business Glossary Anywhere is a Windows workstation client that first requires the end user to supply an Information Server user ID when starting the client on the workstation. Using key sequences or a simple right mouse click, the user invokes the client to begin a search of the Business Glossary by entering criteria into a search text box or by selecting a word or phrase from a document displayed on the workstation. Almost any document type can be employed (HTML, text, BI reports and dashboards, etc.).
Data Architect is IBM’s data modeling tool. Models at all levels (conceptual, logical, or physical) contain metadata (and, indeed, are themselves metadata assets) that are stored in the XMETA repository. Whenever a user’s SDLC calls for change control at the model level, the XMETA repository will reflect updates to the production models. If change control is handled in the database directly, Data Architect allows the user to reverse engineer tables to update the XMETA repository.
DataStage is the IBM ETL system. All DataStage jobs contain metadata assets that will be stored in the XMETA repository. If DataStage jobs are used to transport data from an OLTP database to a data warehouse, then that transport will appear in lineage reports. In an Inmon architecture, DataStage should also be used to transport the data from the enterprise data warehouse to the data marts. If external data is brought into the OLTP database using DataStage (possibly as a web service call), the lineage reporting will also include that transport.
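The lineage idea can be pictured as a graph in which every ETL job is an edge from a source asset to a target asset. The following Python sketch is only an illustration under that assumption; it is not IBM code and does not reflect the XMETA schema, and every job and asset name in `JOBS` is hypothetical.

```python
from collections import defaultdict

# Hypothetical job metadata: (job name, source asset, target asset).
# These names are illustrative only; they do not reflect XMETA's real schema.
JOBS = [
    ("load_external_feed", "vendor_web_service", "oltp.customer"),
    ("oltp_to_edw", "oltp.customer", "edw.customer"),
    ("edw_to_sales_mart", "edw.customer", "sales_mart.dim_customer"),
    ("edw_to_finance_mart", "edw.customer", "finance_mart.dim_customer"),
]


def build_upstream_index(jobs):
    """Map each target asset to the (job, source) pairs that feed it."""
    upstream = defaultdict(list)
    for job, source, target in jobs:
        upstream[target].append((job, source))
    return upstream


def trace_lineage(asset, jobs):
    """Return the (source, job, target) hops upstream of `asset`, nearest hop first."""
    upstream = build_upstream_index(jobs)
    hops, seen, frontier = [], {asset}, [asset]
    while frontier:
        current = frontier.pop(0)  # breadth-first walk toward the sources
        for job, source in upstream.get(current, []):
            hops.append((source, job, current))
            if source not in seen:  # guard against revisiting an asset
                seen.add(source)
                frontier.append(source)
    return hops


if __name__ == "__main__":
    for source, job, target in trace_lineage("sales_mart.dim_customer", JOBS):
        print(f"{target} <- {job} <- {source}")
```

In this toy model, a transport performed outside the recorded jobs simply has no edge, so the trace stops short. That is the practical reason for routing every hop (external feed, OLTP, warehouse, marts) through jobs whose metadata lands in the shared repository.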
QualityStage’s purpose is to perform data cleansing operations. Because the functionality of QualityStage jobs is built on the DataStage model, the XMETA repository also serves as its metadata storage. It is an optional module, and to be of use it should be implemented either in response to a data governance directive or as part of an MDM program.
Used for data profiling, Information Analyzer does not use XMETA. Instead, it stores information in “IADB”, its own repository, and allows information from IADB to be imported into XMETA. Like QualityStage, Information Analyzer is optional and, ideally, should be implemented either in response to a data governance directive or as part of an MDM program.
Cognos is IBM’s business intelligence tool. A bridge can be built between its repository and the InfoSphere Business Glossary, which allows lineage reporting to be completed from the database, through the Cognos abstraction layer, to the report asset.
Reporting capabilities in data management products have traditionally been dreadful, and InfoSphere is really no exception. While recent versions have made improvements, especially in data lineage reporting, there is vast room for improvement. I have suggested to IBM that I would like to see the schemas published for both XMETA and IADB. In an organization that also owns (or is buying) Cognos, I would further suggest that it should be possible to define a data management datamart, which would leave the actual report design (or other BI artifact) to the discretion of the user’s BI development staff. This would relieve IBM of trying to improve the reporting capability and let it get back to integrating the products more fully.
*Disclosure: I am a real user, and this review is based on my own experience and opinions.