What is our primary use case?
The primary purpose is to catalog all the different data sources. The idea is to get insight into what is available and, more importantly, to document and better understand the data quickly and easily. You're not doing any manual work, you're just scanning sources, which means you can automate it. It automates the majority of the documenting process.
How has it helped my organization?
The most important benefit is its data documentation and the data catalog. It saves a lot of time compared to doing those things manually. Normally, when you want to describe your sources and get an overview of what is available, it takes a lot of time for data engineers and other people in the company to document the data. That whole issue is eliminated by using Purview.
Also, the fact that you can easily view your metadata helps with data exploration. A lot of the time, there are so many sources at a company that at some point, most people don't even know what is available anymore. A really key feature is that not just one person but lots of people can access it for the same price, to do data exploration. They can all see what is available and decide what they want to see in a company report, for example.
The beauty of Purview is that it's all about a central location where everyone goes. I wouldn't recommend creating multiple Purview instances, although you might have one for production and non-production. But, ideally, you would just have one Purview for your entire organization and then provide access to multiple people to make use of it.
On the documenting side, in particular, it saves a lot of time, and time is money, especially when you are dealing with people entering data and information into Excel. That can be replaced by Purview and that saves a lot of time. Purview also gives you information that you can act upon. Instead of finding out too late, you can act earlier, and save money in that sense.
What is most valuable?
What I like the most about Purview is the fact that you can really easily connect to data sources and retrieve the metadata in a batch manner. Instead of having to manually write down which tables and columns exist and then describe them, you can do that process in one go, by simply connecting to a source. That's a huge time-saver and a great benefit of Purview.
The solution takes into account critical data compliance regulations from around the world and that is one of the most important aspects of Purview. New laws are being enforced for data compliance and a lot of companies have a great interest in this feature of Purview. I think Microsoft is going to be focusing on that for the next couple of years to help organizations improve on data compliance.
In terms of reducing time-to-action, in theory you could set up a clever rule that gets applied to your scan, so that whenever you scan your data you could identify something that is going on and act upon it. Creating that rule would just take some time. But I haven't seen that in practice yet.
What needs improvement?
The fact that Purview delivers data protection across multiple platforms, including AWS and GCP, is really important, but I feel the tool can mature further in that area. You can set up rules and scan your data and then you can figure out whether your data is secure and compliant, but I feel that Microsoft could improve on this and add more features to the tool. I think they will do so over time. The solution has only been generally available since last year, so it's still quite early in terms of maturity. The multiple platforms feature is very important and there is potential there.
A bit of a downside is that once you can explore the data, that creates a great interest in data lineage, or the data flow. How does it go from a source to a platform to a Power BI report, for example? It is possible, to some extent, to see that with Purview, but the lineage feature requires some manual work on the development side or more work from Microsoft to improve on it.
The data lineage is effective and useful when you are using all Microsoft products, but as soon as there's any complexity or you have a different tool in between, like Databricks for data transformations in your platform, for example, the lineage isn't going to be added in Purview because there is no connection to it. On the lineage side, a lot more can be done, but there is a lot of potential.
An additional feature I would like to see is in the following scenario. Suppose you have your sources scanned and you have all the tables listed in Purview. Right now, to update and label them, or to group them, would take a lot of time because you have to manually click on the assets and the tables that you have. But given that a database can have hundreds of tables, it would be helpful if you could update the assets in batch and, possibly, multi-select them. That would be a nice addition.
For how long have I used the solution?
I have been involved with Microsoft Purview since the private preview stage, which was about two or three years ago. At that stage, it was only being shared with certain companies and nothing could be shared externally. In that phase, I got to share what I learned from the tool with Microsoft.
I haven't used it all the time since then, but more recently, I got to work with it for a few months so I got to see the latest updates and changes that were made.
What do I think about the stability of the solution?
It is quite stable in terms of the scans running and not failing. It doesn't slow down or stop responding when you perform an action inside Purview. The stability is great.
What do I think about the scalability of the solution?
It's really scalable, just like most things in Azure. You can add to it but it gets more expensive. You can add as many sources as you want, and the scanning of sources goes quite quickly, even for really big databases. The reason is that you're not copying any actual data, you're only getting the metadata, meaning a description of the tables and the schema, et cetera.
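To illustrate why a metadata-only scan stays fast regardless of table size, here is a minimal sketch. It does not use Purview at all: it assumes a local SQLite database as a stand-in source, and the function name `scan_metadata` is hypothetical. The point is only that the scan reads table and column definitions and never touches the rows.

```python
import sqlite3


def scan_metadata(conn):
    """Return {table: [(column, declared_type), ...]} without reading any rows."""
    catalog = {}
    # sqlite_master lists the schema objects; only table names are read here.
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


# Stand-in source with two tables; no data is inserted or read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")

catalog = scan_metadata(conn)
for table, columns in catalog.items():
    print(table, columns)
```

The cost of this kind of scan grows with the number of tables and columns, not with the number of rows stored in them.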
How are customer service and support?
I haven't had to use Microsoft's technical support for Purview.
From my experience with other Microsoft Azure tools, the support is not bad, but it might take some time. Once you get someone working on it, an issue always gets resolved, but it can take a bit of time to get the right person involved to help you out.
Which solution did I use previously and why did I switch?
For documenting data I have used Excel. I've seen huge Excel files with lots of data descriptions, and they took a lot of time to create.
Also, on the data quality side of things, I have used an Azure data platform instead of Purview: a SQL database and a Power BI report. A data quality rule would be, for example, checking while you scan the data whether a column is empty, so that you can classify it as "empty column."
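The "empty column" rule mentioned above can be sketched in a few lines. This is plain Python over in-memory rows, not Purview's rule engine or the Azure setup I used; the function name `classify_empty_columns` and the sample data are made up for illustration.

```python
def classify_empty_columns(rows, columns):
    """Return the sorted names of columns whose values are all None or blank."""

    def is_blank(value):
        return value is None or (isinstance(value, str) and value.strip() == "")

    # Note: with zero rows, every column counts as empty (all() of nothing is True).
    return sorted(c for c in columns if all(is_blank(row.get(c)) for row in rows))


# Hypothetical sample: "fax" is never filled in.
rows = [
    {"id": 1, "name": "Ada", "fax": None},
    {"id": 2, "name": "Grace", "fax": "  "},
]
empty = classify_empty_columns(rows, ["id", "name", "fax"])
print(empty)
```

In a real setup the same check would typically run as a query against the database, with the result feeding a report.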
How was the initial setup?
It's very straightforward. There are not that many fields to fill in initially. Connecting sources, the first step, didn't take a long time. You really quickly get to see things, especially when it comes to Azure sources. It's all integrated so you can connect really easily. You just need to have authentication rights assigned. So connecting is quite fast.
The deployment is all-cloud. It's all Azure, which makes it easy to deploy the tool quickly. And if you have other data in the cloud, you can connect to it just as easily.
The second part that takes a bit longer is defining the data, just like you would normally describe your tables and your columns and all your data definitions in Excel. In Purview, that also takes a bit of time. You have to find the way to describe it most easily. You can use the rules while scanning your data and automatically label or classify the data. But creating those rules takes a bit of time: How are you going to scan the data and what rules do you apply?
Getting the resources going just takes a day or two. But to connect to them and make things functionally available takes more time.
It's a one-man job. Even for connecting to resources, all you need is an admin who can grant you all the rights that you need for those sources and you can really easily scan them. The part where you need more people is on the business side because you need to describe, understand, and classify your data. That takes a lot more people because one person might know something about the customer database and a different person might know something about the finance database.
What was our ROI?
There is an investment of time involved, but once you set up those rules and you have the sources to scan, it automatically checks your data. It takes time to set it all up, but over a longer period of time, you will actually save time and see a return on investment. How fast that happens depends on your organization and how many data sources you have, as well as on how many people are using Purview and how efficiently.
What's my experience with pricing, setup cost, and licensing?
You pay a minimum amount every month for the data map. You scan your sources and the metadata gets saved and then you pay for what is stored, which makes sense. However, because there is a minimum amount, in the beginning, you might pay for more than you are using. That's something that some of my clients didn't like. That's why I say it's quite pricey; you're always paying a certain amount.
It would be nice if it went to entirely pay-per-use. For example, on Azure, when you have storage accounts, you pay for exactly how much you store. That would be nice to see in Purview as well. And while it's pay-per-use, you pay for features as well. For example, you pay for the cataloging part, including describing your data and adding labels and classifications. I would like to see a standard price and exact pay-per-use.
I understand, in practice, that might not happen, but the pricing may be a bit overwhelming for some clients. They will say, "Hey, I'm already paying this much and now something else comes with another cost? Why is that?" It raises questions.
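The effect of a monthly minimum can be shown with a little arithmetic. This is a simplified model of what I described, not Microsoft's actual pricing formula: the function name `monthly_bill` and all the numbers are hypothetical, and it simply assumes you are billed the larger of the minimum and your metered usage.

```python
def monthly_bill(minimum, unit_price, units_used):
    """Return (billed, overpaid) under a 'minimum or usage, whichever is higher' model."""
    usage_cost = unit_price * units_used
    billed = max(minimum, usage_cost)
    # Overpayment is what you pay beyond your actual usage.
    return billed, billed - usage_cost


# Hypothetical numbers: a minimum of 100 per month and a price of 2 per unit stored.
early = monthly_bill(minimum=100, unit_price=2, units_used=10)
later = monthly_bill(minimum=100, unit_price=2, units_used=80)
print("early adoption:", early)
print("after onboarding more sources:", later)
```

Under this model the gap only exists while usage is below the minimum, which is why it is mostly new clients who question it.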
Which other solutions did I evaluate?
I'm working at a company that is a Microsoft partner, so it's all Microsoft-first.
We did do a quick analysis for a few other companies and there is some competition out there, but the other solutions are quite expensive. They are enterprise tools that are a bit more mature but the license costs $100,000 for some of them. Purview is pay-per-use and a lot of companies are interested in that. It's still quite expensive compared to most Azure components, but compared to the alternatives it costs less. That may be because it's not that advanced yet.
What other advice do I have?
My advice is to involve people from the business side who can describe data and describe business terms. That's what is most important. Otherwise, it's just going to stay a technical implementation and it won't be used, which would be a huge waste. From the start, involve the people with a mandate who can actually start using it in the future.
Regarding its data connector platform for ingestion from non-Microsoft data sources, there are so many sources in the world that they have a way to go there, but I do feel, especially in the last year, that the solution has grown a lot in that area. All the big, most-used data sources, like Amazon, SAP, and many others, have been added, which is a great step. But if you work with sources that are more unique, the kind that are not used by many other companies, those are not available and you would have to write code for them yourself. You can use the API that is available to insert metadata and lineage information into Purview, but that is a manual process. You would have to develop that for specific sources.
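To give an idea of what that manual development involves, here is a rough sketch of building a lineage record for a custom source. Purview's data map is built on Apache Atlas, so the payload below follows the Atlas-style entity layout as I understand it; treat the exact field names, the `Process` and `DataSet` type names, and the `custom://` qualified names as assumptions to check against the current API documentation. The helper `build_lineage_entity` is hypothetical, and the sketch only builds the JSON; it does not authenticate or call any endpoint.

```python
import json


def build_lineage_entity(process_name, source_qualified_name, target_qualified_name,
                         dataset_type="DataSet"):
    """Build an Atlas-style process entity linking one input to one output (assumed layout)."""

    def ref(qualified_name):
        # References point at existing assets by their unique qualified name.
        return {"typeName": dataset_type,
                "uniqueAttributes": {"qualifiedName": qualified_name}}

    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "qualifiedName": f"custom://processes/{process_name}",
                "name": process_name,
                "inputs": [ref(source_qualified_name)],
                "outputs": [ref(target_qualified_name)],
            },
        }
    }


# Hypothetical transformation step between two already-scanned assets.
payload = build_lineage_entity(
    "load_customers",
    "custom://legacy-crm/customers",
    "custom://warehouse/dim_customer",
)
print(json.dumps(payload, indent=2))
```

You would need one such record per transformation step, which is why this becomes real development work for anything beyond a handful of sources.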
Purview's natively integrated compliance across Azure, Dynamics 365, and Office 365 is also quite important, but I haven't worked on that myself.
In my experience, Purview hasn't come far enough yet to help us reduce the number of solutions that interact with each other. We use Purview right next to all the other tools, which is okay. It takes a lot of time for a company to adapt to using Purview. You can scan data quite easily and figure out how to apply rules and classify and document your data, but you still need people to adapt and make use of it. I haven't really seen that last part very much in practice yet. It takes a bit of change management to get people to make use of it properly. As a result, it hasn't replaced tools yet for me.
Purview doesn't enable you to show compliance in real-time, but you can schedule how often you scan your sources. When the sources are scanned and added to Purview, they become visible and you can see if you're compliant or not, but that's not real-time. You can schedule scans daily, for example, but then you have daily data sets rather than real-time data.
Overall, the potential for this solution is really large. Data management is extremely important and Microsoft is investing very heavily in Purview. Right now, it's not quite there yet.
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner