What is our primary use case?
TDM is something people do all the time; it is not something you start from scratch. For every client, there is a different scenario, and there are a lot of use cases. But a couple of use cases are common everywhere. One of them is handling data that does not exist in production. How do you create that data? Synthetic data creation is one use case challenge that is common across the board.
In addition, the people who do the testing are not very conversant with the back end or with the different types of databases, mainframes, etc. Most of the time they don't write very good SQL, so they struggle to find the data they are going to do their testing with. So data mining is a major concern in most places.
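To make that concrete, here is a minimal sketch of the kind of data-mining query testers typically need. This is a generic illustration, not anything CA TDM generates; the table and column names (customers, accounts, status, balance) are hypothetical.

```python
# A minimal, hypothetical example of finding suitable test records with SQL.
# Table and column names are assumptions, not a real client schema.
import sqlite3

FIND_TEST_DATA = """
SELECT c.customer_id, c.country, a.account_type, a.balance
FROM customers c
JOIN accounts a ON a.customer_id = c.customer_id
WHERE c.country = :country
  AND a.account_type = :account_type
  AND a.status = 'ACTIVE'
  AND a.balance BETWEEN :min_balance AND :max_balance
LIMIT 20;
"""

def find_candidates(db_path: str):
    """Return up to 20 rows matching a specific test condition."""
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute(FIND_TEST_DATA, {
            "country": "DE",
            "account_type": "SAVINGS",
            "min_balance": 1000,
            "max_balance": 5000,
        })
        return cur.fetchall()
```

Queries like this are exactly what many testers cannot write on their own, which is why a tool that mines the data for them matters.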
The use cases are diverse. You cannot point to many common things and say that this will work or this will not. Every place, even though it's a TDM scenario, is different. Some places have very good documentation, so you can directly start with extraction, masking, and loading. But for most places that is not possible because the documentation is not there. There are multiple use cases. You cannot say that one size fits all.
In the testing cycle, when there is a need for test data management tools, we use CA TDM to set up the data feed.
How has it helped my organization?
If you take DevOps as an example, suppose development has happened and the binary code has been deployed to a certain server. To do continuous testing or continuous integration, you need the test data. CA TDM has a feature where it can generate the required data beforehand and keep it with your test cases however you need it. If you are using JIRA, it will put the test data in JIRA. If you are using ALM, it will give the data to HPE ALM. Once you are running your test cases in an automated way, the data is already there.
And the data provided will work across all of those runs. For example, if you want to run a set of automated scenarios, it will work with that. If you want it to work with various regression cycles, it will work with that. And the same data, or maybe a different set of data that you provide, will work with the unit cycles as well. CA has the ability to provide all the data on-demand as well as on the fly.
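As a rough illustration of the JIRA scenario described above, the sketch below attaches a generated test data file to a test-case issue through JIRA's REST API, so the data is sitting next to the test case when an automated run picks it up. This is not CA TDM's own integration; the server URL, credentials, issue key, and file name are placeholders.

```python
# Hypothetical sketch: attach generated test data to a JIRA issue via REST.
# JIRA_URL, ISSUE_KEY, and AUTH are placeholders, not real values.
import requests

JIRA_URL = "https://jira.example.com"
ISSUE_KEY = "QA-123"
AUTH = ("tdm-bot", "api-token-or-password")

def attach_test_data(csv_path: str) -> None:
    """Upload a CSV of generated test data as an attachment on the issue."""
    with open(csv_path, "rb") as fh:
        resp = requests.post(
            f"{JIRA_URL}/rest/api/2/issue/{ISSUE_KEY}/attachments",
            auth=AUTH,
            headers={"X-Atlassian-Token": "no-check"},  # required for attachment uploads
            files={"file": (csv_path, fh, "text/csv")},
            timeout=30,
        )
    resp.raise_for_status()

# attach_test_data("generated_customers.csv")
```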
What is most valuable?
The combination of extract, mask, and load, along with synthetic data, is what is generally needed by any of our clients, and CA TDM has really good compatibility in both of these areas.
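To show what the "mask" step means in practice, here is a minimal sketch of deterministic masking of PII columns, so the same input always maps to the same masked value and referential integrity holds across tables. The column names are hypothetical and this is not how CA TDM implements masking internally.

```python
# Minimal illustration of the "mask" step in extract-mask-load.
# Column names are assumptions; not CA TDM's implementation.
import csv
import hashlib

PII_COLUMNS = {"name", "email", "ssn"}

def mask_value(value: str, salt: str = "project-salt") -> str:
    """Deterministic token: same input + salt always yields the same output."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]

def mask_file(src: str, dst: str) -> None:
    """Copy a CSV extract, replacing PII columns with masked tokens."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for col in PII_COLUMNS & set(row):
                row[col] = mask_value(row[col])
            writer.writerow(row)
```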
CA is one of the few tool suites that has end-to-end features. Whatever role you are playing, whatever persona you are trying to address, it has a feature for that. For example, CA Service Virtualization goes hand-in-hand with TDM. In addition, TDM has automation. CA has most of the features that complement the whole testing cycle.
CA TDM has open APIs. Suppose we are using a set of Excel data as the feed and we want to help Service Virtualization by providing a set of dynamic responses to the requests that the service layer is getting. How do we do that? We can use the API layer as soon as the whole process stabilizes, after three months or so. In other tools, that takes longer. This open API capability is good in CA TDM.
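To picture the idea, here is a generic stand-in for that pattern: read request/response pairs from an Excel feed and serve them as dynamic responses behind a virtual service. This is not the CA TDM or Service Virtualization API; the file name, sheet layout, and route are assumptions (it also assumes pandas with an Excel engine such as openpyxl is installed).

```python
# Hypothetical sketch: serve dynamic stub responses driven by an Excel feed.
# File name, columns ("account_id", "response_json"), and route are assumptions.
from flask import Flask, jsonify
import pandas as pd

app = Flask(__name__)

# Load the feed once at startup; each row maps an account to a canned JSON body.
feed = pd.read_excel("test_data_feed.xlsx", dtype=str)
responses = dict(zip(feed["account_id"], feed["response_json"]))

@app.route("/virtual/accounts/<account_id>")
def account_stub(account_id: str):
    body = responses.get(account_id)
    if body is None:
        return jsonify({"error": "no test data for this account"}), 404
    return app.response_class(body, mimetype="application/json")

if __name__ == "__main__":
    app.run(port=8081)
```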
What needs improvement?
There are multiple things that can be improved in CA TDM. It has a feature called TDM Portal, where testers can find test data by themselves, based on multiple models. They can reserve the data so that it belongs to one group or individual, and that data is then not available to anybody else. Without this tool, if somebody goes through the back end via SQL and pulls that data, you can't do anything. But through the Portal, if somebody reserves the data, it's his. This feature works for one environment. But if a different group of testers wants that data for a different environment, they can't get it via CA TDM. That feature doesn't exist. You have to build a portal or you have to bridge the two environments. That is a big challenge.
For how long have I used the solution?
What do I think about the scalability of the solution?
You can envision TDM happening at three layers. One is the application layer, the second is a cluster layer, and the third is an end-to-end layer. The data required for the first level and the second level are pretty different. You can't use first-level data in the second level. And the data required for the third, for end-to-end testing, is very different from the first two layers.
So when we look at scalability, we have to see how we are creating the "journey" from one layer to another. For example, if we are working in the customer area and then we jump to payments, we have to see what the common things are that we can scale and what areas we have not tested and address them.
How was the initial setup?
The initial setup is always complex. I have been working in testing environments for the last 10 or 11 years. From what I have seen, most companies lack the basic building blocks for testing.
Suppose I have a system X, and X gives data to system Y, and Y gives data to system Z. Nobody has a clue how that data gets there for testing, because that end-to-end testing has never happened. We cannot give someone data that will be rejected by system Z. We have to give him data that will pass across all the systems. And that means we have to understand the mapping file behind it. However, the mapping file is often not there, so we have to create it.
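As a toy sketch of what such a mapping can look like once you create it, the snippet below describes how a field in system X maps to fields in Y and Z, and checks a candidate test record against the downstream constraints before handing it to a tester. The field names and rules are entirely hypothetical.

```python
# Hypothetical mapping: how a source field maps to downstream systems,
# plus the constraints a record must satisfy to survive end-to-end.
MAPPING = {
    "customer_id": {"Y": "cust_ref", "Z": "party_id", "max_len": 10},
    "country":     {"Y": "country_cd", "Z": "iso_country", "allowed": {"DE", "GB", "US"}},
}

def passes_downstream(record: dict) -> bool:
    """Return True only if every mapped field satisfies the downstream rules."""
    for field, rules in MAPPING.items():
        value = record.get(field, "")
        if "max_len" in rules and len(str(value)) > rules["max_len"]:
            return False
        if "allowed" in rules and value not in rules["allowed"]:
            return False
    return True

print(passes_downstream({"customer_id": "C123", "country": "DE"}))  # True
```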
We have to talk about the various models: are they logical or physical? Somebody may have created a set of logical data models 20 years back, but they are not usable now. We have to work with the tool to create that set of data.
We also have to consider the scheme of values. If it's IMS, that is different from an RDBMS. We have to find out which segment has more data, which segment is complete, and which segment is giving data to other systems. When we talk to the people who are working on that data set, one that is 20 or 30 years old, 90 percent of the time they don't have a clue. They are working with various tools but they don't have a clue how it is happening.
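For the relational side of that question, a quick profiling pass like the one below shows which structures actually carry the bulk of the data. It is only a generic sketch: IMS segments need IMS-specific tooling, and the table list here is hypothetical.

```python
# Hypothetical profiling pass: count rows per table to see where the data lives.
import sqlite3

TABLES = ["customers", "accounts", "transactions", "audit_log"]  # assumed names

def profile_row_counts(db_path: str) -> dict:
    counts = {}
    with sqlite3.connect(db_path) as conn:
        for table in TABLES:
            # Table names cannot be bound as parameters; the f-string is safe
            # only because TABLES is a fixed, trusted list.
            counts[table] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))
```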
So there are always multiple challenges at the start. But then we do due diligence for six or eight weeks and it clears up all the cobwebs: what is there, what is not there, and the roadmap. That puts our best foot forward so we can say, "Okay, this is how we should move and this is what we should be able to achieve in a given timeline."
The initial deployment will take a minimum of three to four weeks.
The second step is a PoC or a pilot to run with a set of use cases.
Which other solutions did I evaluate?
Apart from CA TDM, I've used IBM InfoSphere Optim, which was the number-one TDM tool for quite some time, and I've used Delphix. Now, a couple more tools have come into the market, like K2View. At one point in time, about two years back, CA TDM was the only tool that could do synthetic data.
CA TDM and Optim have different ways of working. Compared to Optim, CA TDM has the major advantage of synthetic data creation. No other tool was able to do that. Only in the last two years has IBM Optim come up with synthetic data capabilities, but what it does is create a superset: if you have sample data, it will create a superset of that data. That is not the case with CA, or with some other tools.
There are multiple sites that also create synthetic data, but the major challenge comes into play once you need to put that data back into the database.
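To show the difference between "synthetic from scratch" and a superset of sampled production data, here is a minimal sketch where every row is fabricated by a generator and no production records are involved. It uses the open-source Faker library; the schema is hypothetical, and this is not how CA TDM generates its data.

```python
# Hypothetical sketch of purely synthetic test data generation (no production rows).
# Uses the open-source Faker library; the CSV schema is an assumption.
import csv
import random
from faker import Faker

fake = Faker()

def generate_customers(path: str, n: int = 100) -> None:
    """Write n fully fabricated customer rows to a CSV file."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["customer_id", "name", "email", "balance"])
        for i in range(n):
            writer.writerow([
                f"C{i:06d}",
                fake.name(),
                fake.email(),
                round(random.uniform(0, 10000), 2),
            ])

generate_customers("synthetic_customers.csv")
```

Loading data like this back into a real database, with all its constraints and relationships intact, is where the real difficulty starts, as noted above.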
What other advice do I have?
There are, let's say, five market-standard tools you can choose from. If you choose CA TDM, you need to bring out all your questions for your PoC journey. You have four weeks to get answers to whatever questions you have. There is a set of experts at CA and partners have expertise as well. Both will be able to answer your questions.
Next, you need to supply a roadmap. For example, "I need X, Y, and Z to be tackled first." And the roadmap that comes out of the due diligence needs to be followed word-for-word. So proper planning is essential.
There are three teams at the base of your TDM journey: one is a central data command team, one is a federated team, and the third is for creating the small tools that you might require at a given point in time. To start, you need three to four people. But we have gotten into all types of data: Big Data, RPA, performance, etc. Wherever data is needed, our team is providing the data. In a bank, for example, where I did two rounds of due diligence, one lasting eight weeks and the other, three years later, lasting six weeks, we even implemented bots. When we started there, the team was 50. Even though we automated the whole thing, more than anyone might have imagined, the team is still 40-plus.
Disclosure: My company has a business relationship with this vendor other than being a customer: Preferred Partner.