Manager, Office of Enterprise Data Management at State of Arizona
Real User
2022-11-27T05:19:00Z
We haven't done much classifying of assets because literally 99.9 percent of our data is public information. But the staff did help us set up some custom identifications to look for specific permit numbers. That is helpful because we want to know where they're showing up when they're not supposed to be there. They're supposed to be in certain fields and not others. Overall, we spend a lot of time tagging data and working on data quality rules. To the extent that we've used it, the classifying functionality seems to work fine.
We haven't gotten to the organized data governance part yet, but I think it's going to be instrumental. When we first started our enterprise data management program, we centered all of our activities around the first data sets that went into our data warehouse. People were not as excited about that. To me, having a source of truth is exciting and invaluable. What the newer staff and younger people got excited about was the business intelligence tool that we used to display the data in the data warehouse.
People are not going to be excited about data governance for the sake of data governance. But when you have a tool like Lumada Data Catalog, it gives you a place to start. If one section of our agency wants to change a column, we have a better chance, right away, of them understanding how it impacts other sections as well. If we can get our application developers and coders to use it and bring up the Galaxy View and the data lineage view, they will be able to show somebody, right off the bat, what the impact of their changes will be and explain it to them. Usually, they just say, "Well, that column's in a table that is really important..." but it's so abstract and vague. Business people don't have the time or inclination to understand relational databases. But if they can see the visual Galaxy View, it's going to go a long way toward helping our data governance, because, in fact, our data governance had kind of stalled.
Our whole enterprise data management program actually stalled because we had no metadata management or metadata repository. You're not going to be able to improve in the other categories if you're always going to be super weak in metadata management, because it affects data quality, data governance, platform and operations, et cetera. As storage is not an issue for us, we haven't used Data Catalog formally to look for duplicate data yet. But, in getting ready for our application modernization, we tagged all the tables and fields that have customer name or names and we've tagged reference tables. We'll have a table for "watershed" or "groundwater basin," and it shows up eight times, because it will be in the surface water database, the groundwater database, the wells database. We've already tagged the duplicate data, but we haven't used the automatic duplicate function yet.
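As a rough illustration of the manual duplicate tagging described above, the sketch below flags reference tables whose names appear in more than one database. It does not use Lumada Data Catalog or its automatic duplicate function; the inventory list, the database and table names, and the helper functions `normalize` and `find_duplicate_tables` are all hypothetical, and matching on normalized names alone is an assumption that would miss tables named differently in each system.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Normalize a table name so 'Groundwater Basin' matches 'groundwater_basin'."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def find_duplicate_tables(inventory):
    """Return {table_name: [databases]} for tables found in more than one database.

    `inventory` is an iterable of (database, table) pairs; this is a
    hypothetical export format, not something the catalog produces.
    """
    seen = defaultdict(set)
    for database, table in inventory:
        seen[normalize(table)].add(database)
    return {table: sorted(dbs) for table, dbs in seen.items() if len(dbs) > 1}


# Hypothetical inventory, loosely modeled on the databases named in the review.
inventory = [
    ("surface_water", "Watershed"),
    ("groundwater", "watershed"),
    ("wells", "WATERSHED"),
    ("groundwater", "groundwater_basin"),
    ("wells", "Groundwater Basin"),
    ("wells", "well_log"),
]

duplicates = find_duplicate_tables(inventory)
for table, databases in sorted(duplicates.items()):
    print(f"{table}: {', '.join(databases)}")
```

A name-based pass like this only produces candidates for tagging; confirming that two tables really hold the same reference data would still require comparing their contents.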
What is Metadata Management? Metadata management is the administration of metadata (data that describes other data) across your organization. The management of metadata involves establishing processes and policies to ensure that information can be best accessed, integrated, shared, maintained, and analyzed across an organization.