When assessing solutions, consider these important aspects:
Data Profiling
Data Cleansing
Data Integration
Data Governance
Scalability
A robust Data Quality solution should enable thorough data profiling, offering insights into anomalies and patterns within datasets. This helps you understand the data better and spot discrepancies early. Reliable data cleansing is crucial: it removes duplicates and inaccuracies so the data can be trusted as a basis for insights. Efficient data integration supports seamless merging of data from various sources, providing a holistic view and enabling consistent data usage across applications.
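As a rough illustration of the profiling and cleansing ideas above, here is a minimal Python sketch using only the standard library. The record layout, the field names (`id`, `email`, `state`), the "two capital letters" state rule, and the helper names `profile` and `deduplicate` are all hypothetical; a real Data Quality tool does far more than this.

```python
import re
from collections import Counter

# Hypothetical sample of customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "state": "CA"},
    {"id": 2, "email": "b@example.com", "state": "ca"},
    {"id": 3, "email": None, "state": "XX?"},
    {"id": 1, "email": "a@example.com", "state": "CA"},  # exact duplicate of id 1
]

# Assumed validity rule: a state code is exactly two upper-case letters.
STATE_PATTERN = re.compile(r"[A-Z]{2}")


def profile(rows, column, pattern=None):
    """Return simple profile stats for one column: null count, distinct count,
    most common values, and (optionally) how many non-null values fail a regex."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v is not None]
    stats = {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top": Counter(non_null).most_common(3),
    }
    if pattern is not None:
        # fullmatch requires the whole value to match, not just a prefix.
        stats["pattern_failures"] = sum(1 for v in non_null if not pattern.fullmatch(v))
    return stats


def deduplicate(rows, key_fields):
    """Keep the first record seen for each key; later records with the same key are dropped."""
    seen = set()
    kept = []
    for r in rows:
        key = tuple(r.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


print(profile(records, "state", STATE_PATTERN))
print(len(deduplicate(records, ["id", "email"])))
```

On this sample, the `state` profile reports 3 distinct values and 2 pattern failures (`ca` and `XX?`), and deduplicating on `id` plus `email` drops the repeated record, leaving 3.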
Data Governance establishes the policies and processes that manage data's availability, usability, integrity, and security. Effective governance helps maintain compliance with regulatory requirements. Scalability is vital for handling growing data volumes, so that the solution remains effective as data quantity and complexity increase. A scalable solution can reduce the need for future infrastructure investment, supporting long-term efficiency and performance.
Accuracy and reliability. Who can argue with that?
But the reality is that your business doesn't give a hoot whether we reach data purity if they don't believe it impacts them. I've seen a DQ "expert" lose all credibility for yelling data quality fire. That's why I would caution against throwing around high-level terms that risk implying we can simply make all our data reliable and accurate (whatever that means) and then our job is done. Of course accuracy and reliability are important concepts (IF you can measure them), but there are always data inaccuracies; that's life, sorry folks. Data is closely tied to human frailties.
The important question is: where are unreliability and inaccuracy impacting your business, and can that impact be measured in a trusted way? For example, what if 50% of my customer address state codes have garbage in them and lack accurate zip codes, so those customers can't be contacted? You'd say that's awful. Is it? You'd tell your business about it, right? We need to do something about this!! What if they come back and tell you none of those customers have ever bought anything from us and they probably never will? Are they still a big problem?
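To make the state code example concrete, the hypothetical sketch below compares the share of rows with a bad state code (what a data profile would report) with the share of revenue those rows carry. The `revenue` field, the sample numbers, and the simplistic two-letter validity rule are assumptions for illustration only.

```python
import re

# Hypothetical customers: state code plus lifetime revenue (illustrative numbers only).
customers = [
    {"state": "NY", "revenue": 1200.0},
    {"state": "TX", "revenue": 450.0},
    {"state": "??", "revenue": 0.0},
    {"state": "zz9", "revenue": 0.0},
]

# Simplistic assumed rule; a real check would compare against a reference list of codes.
VALID_STATE = re.compile(r"[A-Z]{2}")


def is_valid(customer):
    """True if the customer's state code is present and matches the assumed rule."""
    state = customer["state"]
    return state is not None and VALID_STATE.fullmatch(state) is not None


def invalid_row_share(rows):
    """Fraction of rows with an invalid state code (the number a profile reports)."""
    return sum(1 for r in rows if not is_valid(r)) / len(rows)


def invalid_revenue_share(rows):
    """Fraction of total revenue attached to rows with an invalid state code."""
    total = sum(r["revenue"] for r in rows)
    bad = sum(r["revenue"] for r in rows if not is_valid(r))
    return bad / total if total else 0.0


print(invalid_row_share(customers))      # 0.5
print(invalid_revenue_share(customers))  # 0.0
```

The row-level rate looks alarming at 50%, but if none of the affected customers has ever bought anything, the revenue-weighted rate is zero, which is exactly the conversation the business will want to have.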
What I'm getting at is this: if you want to make a difference in data quality, triage data based on impact to the business, then find the data quality issues that have a high probability of affecting the bottom line and/or decision making. That's not easy, I know. It's a lot harder than just running some data profiles and saying "look, it's bad!" And that's why data profiling is often where people stop. Oh, and a caution regarding the root cause of your data quality problems: it may not be a system but rather people. Data entry is most often compensated based on speed, not accuracy.
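One way to start that triage is to attach a rough business estimate to each finding and rank by it instead of by raw failure counts. The sketch below is a hypothetical scoring model, not an established method: `cost_per_row` and `p_impact` are assumed inputs you would have to agree on with the business, and the issues and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class DQIssue:
    """A data quality finding; the fields are illustrative, not a standard schema."""
    name: str
    failing_rows: int
    # Assumed business estimate: cost (e.g. in dollars) per failing row.
    cost_per_row: float
    # Assumed probability that a failing row actually affects a decision or transaction.
    p_impact: float

    @property
    def expected_impact(self) -> float:
        # Rough expected cost: rows x cost per row x chance it matters.
        return self.failing_rows * self.cost_per_row * self.p_impact


issues = [
    DQIssue("bad state codes on dormant customers", 50_000, 0.10, 0.01),
    DQIssue("duplicate active customer accounts", 800, 25.0, 0.6),
    DQIssue("missing product category", 5_000, 2.0, 0.3),
]

# Rank by estimated business impact, highest first, rather than by failing_rows.
ranked = sorted(issues, key=lambda i: i.expected_impact, reverse=True)
for issue in ranked:
    print(f"{issue.name}: {issue.expected_impact:,.0f}")
```

With these made-up numbers, the duplicate active accounts rank first and the bad state codes on dormant customers rank last, even though the latter have by far the most failing rows.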
So yes, accuracy and reliability are super important, but so what? Can you measure them? Can you measure the impact to the business in a way they trust and care about? And even then, can you influence a solution to the root cause? Does your organization have a data governance working group where these problems can be effectively addressed?
If the answer to any of these questions is "no", then you may be wasting your time worrying about it. If the answer is no and your title has data quality in it, then it sucks to be you. Just get ready to run from the rocket launch pad when you know the heat shields were designed based on bad data.
Accuracy and Reliability
Accuracy