Hi community,
I'm working as a Data Warehouse Analyst at a Financial Services company with 10,001+ employees.
Currently, I am looking to evaluate tools such as Informatica Test Data Management (TDM), Collibra Catalog, or others.
The tool should be able to store reference data without us having to maintain mapping documents. It should also be able to check all of the data as well as specific subsets.
Which of those 2 tools (or an alternative one) would you suggest? Please explain why.
Thanks.
The right tool depends on your requirements; the considerations below should help you match a tool to what you actually need.
1. Data discovery and obfuscation with classification. You need a tool that can handle referential integrity, ensuring that when data is identified as PII (Personally Identifiable Information) it is obfuscated consistently, e.g. if Jim is obfuscated to Jack, then every related reference to Jim is always masked as Jack (see the masking sketch after this list). Cross-reference mapping is key in Test Data Management, which deals with taking data from production and making it available in non-prod, sanitizing it so that it stays meaningful but is no longer identifiable. Once the sanitization rules are defined, applying them efficiently is another criterion: you do not want the data to be unavailable during obfuscation or to run into database performance issues. You should also consider restartability, so that if the obfuscation process fails midway you can resume from the last updated record.
(Informatica Test Data Manager for relational databases only; Broadcom Test Data Manager for relational databases plus some non-relational databases and flat files; Collibra for structured and some unstructured data sources)
2. How often will my data be refreshed? If the refresh is frequent, you will want to subset the data so that each refresh includes only the changes (see the subsetting sketch after this list), because obfuscation is resource-intensive and continuously refreshing a large database in full can degrade performance.
(Informatica Test Data Manager for relational databases only; Broadcom Test Data Manager for relational databases plus some non-relational databases and flat files; Collibra for structured and some unstructured data sources)
3. How often does my database need to be restored for testing, and how often do conflicts occur because multiple tests run in the same environment? You would want to consider virtualizing the data, much like a virtual machine, to avoid regular restores, to give you a quick checkpoint to roll back to once testing is complete, and to give each tester their own test environment (Broadcom Test Data Manager for relational databases).
4. Is data generation a requirement? Do you need to add missing data or generate data for a newly developed system? Do you want to generate data that covers both negative and positive test-case scenarios, e.g. a Social Security number in a valid format that identifies no real person, or a dummy Visa card number that passes Visa's checksum validation but is not a live card (see the card-number sketch after this list)?
(Informatica Test Data Manager for relational databases only; Broadcom Test Data Manager for relational databases plus some non-relational databases and flat files)
5. Data discovery would also be a good-to-have feature: the ability to scan the data and columns in your data sources and identify information that could be PII (see the discovery sketch after this list) - (Informatica Test Data Manager for relational databases only; Broadcom Test Data Manager for relational databases plus some non-relational databases; Collibra for structured and some unstructured data sources)
6. Classification of the data based on discovery results and the risk the data poses in the data source
(supported by Informatica Test Data Manager, Broadcom Test Data Manager, and Collibra)
7. How diverse are my data sources? Does my data reside in a single database or across multiple databases, are there relationships that span databases, and are all of the databases supported by the tool?
8. Does the data reside only in databases or also in flat files, and will the tool be able to provide test data management for flat files such as CSV, XML, and so on?
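To make the referential-integrity point in item 1 concrete, here is a minimal masking sketch in Python, assuming a simple hash-plus-lookup approach: the same input always maps to the same replacement, so every "Jim" becomes the same "Jack" across tables. The function and the seed list are illustrative only and are not part of any of the tools named above.

```python
import hashlib

# Fixed salt so the mapping repeats across runs (assumed; in practice this
# would be a managed secret, not a hard-coded string).
SALT = b"tdm-demo-salt"

# Small replacement pool for the example; real tools ship large seed lists.
REPLACEMENT_NAMES = ["Jack", "Maria", "Chen", "Priya", "Omar", "Lena"]

def mask_name(original: str) -> str:
    """Deterministically map an original name to a replacement name."""
    digest = hashlib.sha256(SALT + original.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(REPLACEMENT_NAMES)
    return REPLACEMENT_NAMES[index]

if __name__ == "__main__":
    # 'Jim' masks to the same value in both lists, so joins still line up.
    print([mask_name(n) for n in ["Jim", "Anna", "Jim"]])
    print([mask_name(n) for n in ["Jim", "Anna"]])
```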
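For item 2, the subsetting sketch below shows one common way to keep refreshes cheap: pull only rows that changed since the last refresh. The customers table and the updated_at audit column are made-up examples, not anything specific to the tools above.

```python
import sqlite3

def incremental_subset(conn, last_refresh_ts):
    """Select only rows changed since the previous refresh watermark.

    Assumes an 'updated_at' audit column; schemas without one would need
    change data capture or triggers instead.
    """
    cur = conn.execute(
        "SELECT id, name, updated_at FROM customers WHERE updated_at > ?",
        (last_refresh_ts,),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Jim", "2024-01-01"), (2, "Anna", "2024-03-15")],
    )
    # Only the row updated after the watermark comes back.
    print(incremental_subset(conn, "2024-02-01"))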
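For item 4, the card-number sketch below shows the "passes validation but is not a live card" idea: Visa numbers start with 4 and carry a Luhn check digit. The function names are my own, and because a randomly generated number could coincidentally match a real account, real tools also draw from reserved test ranges.

```python
import random

def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit for a partial card number."""
    digits = [int(d) for d in partial]
    # Double every second digit, starting from the rightmost digit of the
    # partial number (the check digit has not been appended yet).
    for i in range(len(digits) - 1, -1, -2):
        digits[i] *= 2
        if digits[i] > 9:
            digits[i] -= 9
    return str((10 - sum(digits) % 10) % 10)

def passes_luhn(number: str) -> bool:
    """Validate a full number (including its check digit) against Luhn."""
    digits = [int(d) for d in number]
    for i in range(len(digits) - 2, -1, -2):
        digits[i] *= 2
        if digits[i] > 9:
            digits[i] -= 9
    return sum(digits) % 10 == 0

def generate_test_visa() -> str:
    """Generate a 16-digit number that starts with 4 and passes the Luhn check."""
    partial = "4" + "".join(str(random.randint(0, 9)) for _ in range(14))
    return partial + luhn_check_digit(partial)

if __name__ == "__main__":
    card = generate_test_visa()
    print(card, passes_luhn(card))  # a 16-digit number starting with 4, True
```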
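Finally, the discovery sketch for item 5: discovery engines typically sample column values and look for PII patterns. This is deliberately simplified; the two patterns and the 80% threshold are assumptions for illustration, and real products combine patterns with dictionaries and column-name heuristics.

```python
import re

# Illustrative patterns only; real discovery uses far richer rule sets.
PII_PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def classify_column(sample_values, threshold=0.8):
    """Flag a column as a PII type if most sampled values match a pattern."""
    non_empty = [v for v in sample_values if v]
    if not non_empty:
        return None
    for label, pattern in PII_PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return label
    return None

if __name__ == "__main__":
    print(classify_column(["123-45-6789", "987-65-4321"]))            # 'ssn'
    print(classify_column(["jim@example.com", "anna@example.com"]))   # 'email'
    print(classify_column(["blue", "green"]))                         # None
```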
With just those few requirements, I'd opt for a simpler tool such as the one by dScribe from a cataloging point of view.
If validation of the actual data is a requirement, then the most complete offering today is Collibra's.