What is a data mesh?
For decades, enterprises have stored more and more data without any real way of cataloguing it, which makes it difficult to reuse that data for a good purpose. The resulting issues include difficulty in discovering relevant data, an inability to democratize data and use it for business strategies and, most importantly, an inability to trust the data.
It is important to ensure that data emerging from multiple and varied systems and applications is discoverable, trustworthy, secure, understandable and explorable in order to make the most of it. That way, it is easier to work out what is worthwhile in the piles of data and what is not. Domain-driven data management and governance, combined with a robust data fabric platform, is the way to protect the value of data and to make it usable at any time for an organization’s strategies.
Traditional data management isn’t enough in today’s world. Big, strategic, data-driven digital transformations demand a more expansive, business-focused approach. This new business-focused approach needs to tackle some significant challenges: ever-increasing data volumes, new kinds of data users, and the intense complexity of today’s application and data environments, whether the data is operational or analytical. According to the data mesh concept, today’s monolithic data architecture should be replaced with a much more decentralised data management architecture. Current ways of working make it difficult to supply data efficiently, because the data engineers who create data seldom know the business behind its creation or consumption. This makes it difficult for them to integrate the right data at the right quality, or to provide relevant data to its consumers effectively.
To overcome these issues, Zhamak Dehghani’s data mesh concept provides a domain-driven, collaborative, transparent and agile way of consuming relevant and trustworthy data as data products, and it can do so at scale. The concept is built on four principles: domain-driven data ownership and architecture, data as a product, self-service infrastructure as a platform, and federated computational governance.
The platform architecture helps an organization embrace its multitude of data sources and derive value from them in an agile fashion through domain ownership of data. The idea is to move data ownership to those closest to data creation or usage, so that they can create relevant data products faster and more accurately for consumers, with the support of the right platform and governance. While the concept is to break monolithic data platforms into smaller chunks created and owned by specialists, the four functional blocks of data mesh ensure agility in delivering quality data products.
The business case
The data mesh mindset and culture is a paradigm shift that leverages data’s value for better business outcomes. When an organization starts to manage data with its consumers and their needs in mind, the result is data of better quality, faster availability and greater relevance. This helps invert the pyramid from 80% of time spent on data engineering and 20% on analysis to 20% on data engineering and 80% on analysis.
To make this practical, data management needs to become more collaborative, transparent and agile, and it has to do so at scale. This way, a data mesh architecture can serve the whole enterprise with accurate, relevant, quality data at high speed, provisioning data as a product according to the needs of its customers (the consumers of data).
Impressive benefits:
When an enterprise moves to the data mesh concept and architecture to achieve the outcomes described above, the benefits are manifold, such as:
- A clear understanding of the business outcomes affected by the non-availability of the right data, through a direct correlation of data with business outcomes
- Clarity on the value data brings when managed correctly, which helps shift the mindset from data as a by-product to data as a product
- Ability to assign responsibility to the right stakeholders (non-IT people who know the domain and the business) so that data quality and relevance remain intact
- Clarity on go-to persons and the ability to collaborate faster to make data available for business needs
- Ability to discover available data with ease, through contextual search and more
- Ability to understand the flow of data and the changes happening to it, which increases trust
- Faster delivery of data to consumers through a true Data as a Service model
Key trends that led to data mesh:
Data mesh is still in the early stages of adoption and market maturity, as there are very few true implementations. All four principles must be met when embracing data mesh to ensure that it delivers the value promised. Data mesh is not a silver bullet for the complexity and expense that a monolithic data architecture comes with; the two are complementary, and monolithic platforms can be part of a larger data mesh.
Some of the technology trends that led to the emergence of data mesh as a solution include:
- There is an emerging need to ensure data management is outcome-focused
- Moving data from edge locations to data lakes is time-consuming and expensive
- Incorporating new operational data is becoming more and more complex and costly
- Data integration outages are rising and observability is missing
- Data is the new oil for competitive edge and managing it well is the need of the hour
- Cloud lock-in is real and can become increasingly costly
- Data lakes rarely succeed, take time to deliver value and are focused only on analytics
- The rise of distributed data is forcing a more effective, efficient and economical architecture
- Organizational silos worsen data discovery and data sharing, making it difficult to reduce time to market
Data mesh will provide greater autonomy and flexibility to data owners. The self-service data platform will address the concerns of data consumers, such as long waits to get data and irrelevant or inaccurate data for analytics, through the ability to discover data faster, analyze quality and data product lineage, monitor data products and receive alerts, and more.
Making data mesh operational
The idea behind the data mesh architecture is to decentralise data provisioning and usage in order to support scale, quality and agility. Responsibilities are decentralised and data is decomposed along the business domains within an enterprise. This is practical only when data ownership sits with someone who knows the data, how it is created by source applications or operational systems, and what its consumers need. This way, data quality and relevance are ensured in terms of correctness, completeness, timeliness, accuracy and so on.
It is important to develop a good data governance charter and framework, with a clear remit for the people involved, to make domain-driven data ownership operational. Note that a change management strategy has to be put into operation together with a communication strategy and different types of training, to ensure the change is communicated and accepted. The owners should be able to drive the need for data to support business outcomes. Some of the characteristics of data owners are as given below.
It is equally important to have a platform that supports the efficient development and delivery of data products. The data must be developed according to the requirements of its consumers, but it must also be discoverable and trustworthy if the data mesh concept is to be adopted properly. For that to happen, the self-service platform needs to exhibit certain characteristics, such as those given below.
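As an illustration of how a domain owner might measure two of the quality dimensions named above, here is a minimal Python sketch. The function names, the record layout and the freshness window are assumptions made for this example; they do not come from any particular data mesh platform.

```python
from datetime import datetime, timedelta


def completeness(records: list[dict], required_fields: list[str]) -> float:
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1
        for record in records
        if all(record.get(f) not in (None, "") for f in required_fields)
    )
    return complete / len(records)


def is_timely(last_refreshed: datetime, max_age: timedelta, now: datetime) -> bool:
    """True when the data was refreshed within the agreed freshness window."""
    return now - last_refreshed <= max_age
```

A domain team could run checks like these on every refresh and publish the results alongside the data product, so that consumers can judge its quality for themselves.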
- Ability to develop data products effectively
- Ability to search and discover data easily
- Ability to understand the usability of data: quality, meaning, the entities & attributes that constitute the data product, lineage, relations, impact and more
- Ability to collaborate with right data product owners on a need basis
- Ability to subscribe to data that is of interest
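Some of the capabilities listed above (search and discovery, finding the right owner to collaborate with, subscription) can be sketched as a tiny in-memory catalog. This is only an illustrative sketch: `CatalogEntry`, `DataCatalog` and their methods are hypothetical names, and a real self-service platform would back them with persistent metadata storage and access control.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    domain: str
    owner: str
    description: str
    tags: set[str] = field(default_factory=set)
    subscribers: set[str] = field(default_factory=set)


class DataCatalog:
    """Hypothetical registry of data products, keyed by product name."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        """Names of products whose name, description or tags contain the term."""
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term in tag.lower() for tag in e.tags)
        )

    def owner_of(self, product_name: str) -> str:
        """The go-to person for a product, to support collaboration."""
        return self._entries[product_name].owner

    def subscribe(self, product_name: str, consumer: str) -> None:
        self._entries[product_name].subscribers.add(consumer)

    def subscribers_of(self, product_name: str) -> set[str]:
        return set(self._entries[product_name].subscribers)
```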
With the above self-service data platform, when a domain data product owner delivers data products, they must be discoverable, secure, explorable, understandable and trustworthy. The product owner should also ensure they are of the right quality and available on time. The domain product owners work with data product developers (who are not just data engineers but also have good knowledge of the domain) to build, maintain and deliver data products. Each domain team may own one or many data products. Data mesh describes a data product as an architectural quantum: the smallest unit of architecture that can be independently deployed with high functional cohesion, including all the structural elements needed for it to function.
The picture contains a polyglot of data and the supporting architecture to develop a data product, as well as the code that ingests, transforms, enhances (with all metadata) and delivers (through a data marketplace, etc.) data for its end users. Through the self-serve platform, data products remain easy to discover and use even though they are decomposed and decentralized. Interoperability between data products, within a single domain or across domains, is important to ensure agile discovery and delivery of data.
It is important to ensure the overall architecture supports the idea of making decentralized data available to consumers faster and with good quality. Domain teams can leverage the overall platform to create domain data products, and consumers can access them through the self-service platform. A good data fabric, along with a good governance mechanism, is required to ensure data can be easily integrated and transformed, and is interoperable, discoverable and accessible for proper usage. A data fabric can help provide context to data through metadata, increase trust through traceability and data quality, correlate and consume data through data virtualization and a data marketplace, and govern through stewardship. These are the backbone of a successful data mesh architecture.
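One way to picture a data product as an architectural quantum is as a single deployable object that bundles its metadata, its access rules and the code that produces its output. The sketch below is an assumption-laden illustration rather than a reference design: the `DataProduct` class, its fields and the `missing_metadata` check are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict]


@dataclass
class DataProduct:
    name: str
    domain: str
    owner: str
    description: str                    # makes the product understandable
    schema: dict[str, str]              # column -> type, makes it explorable
    allowed_roles: set[str]             # makes it secure
    transform: Callable[[Rows], Rows]   # code shipped together with the data
    upstream: list[str] = field(default_factory=list)  # lineage, supports trust

    def serve(self, raw: Rows, role: str) -> Rows:
        """Apply the product's own transformation for an authorised consumer."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {self.name}")
        return self.transform(raw)

    def missing_metadata(self) -> list[str]:
        """Metadata fields that are still empty and would hurt discoverability."""
        checks = {
            "description": bool(self.description.strip()),
            "schema": bool(self.schema),
            "allowed_roles": bool(self.allowed_roles),
            "upstream": bool(self.upstream),
        }
        return [name for name, ok in checks.items() if not ok]
```

Keeping the transformation, the schema and the access rules in one unit is what lets a domain team deploy and evolve the product independently of other products.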
Leveraging the data mesh strategy
When we think of implementing a new concept, we always think of a 30-60-90 day plan. Data mesh should not be approached in this way. A lot of cultural change is required to make data mesh operational in an organization. It involves an organizational mindset, a different type of collaboration, changed skills, architectural evolution and a completely different way of thinking about data value at scale. It requires change management and buy-in, because the changes proposed by data mesh are new and break established norms; they need to be approached thoughtfully and implemented over a period of time, guided by a maturity model.
As a first step, develop a data strategy and a change strategy, and set up a global governance body. In the beginning, this can be a small cross-functional body with representation (the selection criteria are described in the sections above) from various functions in the enterprise, such as legal, security, domains, infrastructure and technology. This body will be responsible for developing the policies and processes that are enterprise-wide in nature and essential for the data products to interoperate. Some of these could be:
- Segregation of data products and assignment of their ownership to domains
- Organization wide policies and regulations to ensure protection of personal data
- Plan for data products, entities, attributes that are global in nature so that data products can join or union data from multiple data products
- Data protection, security and usage restrictions
- Alignment towards any legal requirements and compliance
- Data classification policies
Global data governance should generally be responsible for developing and enforcing the data policies, processes, standards and best practices, while the local governance body is accountable for implementing them and for constant adherence.
A local governance body is central to the data mesh concept, as the whole paradigm is about decentralizing data through data products owned by those closer to the data sources. This body should be responsible for defining all local data policies, processes and enhancements to frameworks such as DQ and metadata, and is accountable for data product development and implementation in compliance with the guidelines set by the global data governance team. This way, federated data governance can be truly operational, with aspects like data quality, data modeling and local access policies managed by the data product owner. This also helps an organization divide and conquer data product development in its true sense, with data catalogs, domain-specific data pipelines, data models that consider all relevant and critical entities and attributes, alignment to local and global data policies and regulations, and more. Interoperability of data products is essential; it is achieved when a related data product takes input from another data product and builds functionality on top of it to generate additional value.
An initial data mesh will evolve over time, as this is never a 30-60-90 day program. As part of this evolution and greater acceptance from the enterprise, new data products will be launched continuously and older ones will keep being enhanced as business needs change. Data mesh advocates decentralization of governance, but at the same time a larger strategy should be applied to balance the benefits of data product autonomy against the need for discoverability and usage of data products at mesh level. Instead of the earlier hub-and-spoke model, in which global governance dictated which data products should be launched or what the data integration strategy should be, indicators such as data product usage statistics, user reviews and ratings, and the completeness and usability of data can decide the life of a product. Other metrics from a centralised CDO operational dashboard, such as data product downtime, quality, adherence to best practices and discoverability, can decide the stability of the data product and whether it needs to be dropped or enhanced.
Smooth data operations, through a data ops construct, are essential to make sure the data mesh is used and enhanced properly. Data product owners should use the power of the data fabric platform to support the agility, discoverability, trustworthiness and interoperability of data products by enriching them with the right metadata. This will help in multiple areas, such as:
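The split between global and local governance can be sketched as two sets of policy functions evaluated against a data product's metadata. Everything in this sketch is an assumption made for illustration: the metadata layout (a plain dict with `owner`, `columns` and `refresh_hours` keys), the three example policies and the `evaluate` helper are hypothetical.

```python
from typing import Callable, Optional

# A policy inspects product metadata and returns a violation message, or None.
Policy = Callable[[dict], Optional[str]]


def owner_required(product: dict) -> Optional[str]:
    """Example global policy: every data product must name an owner."""
    return None if product.get("owner") else "data product has no owner"


def pii_must_be_classified(product: dict) -> Optional[str]:
    """Example global policy: personal data columns need a classification."""
    unclassified = [
        col["name"]
        for col in product.get("columns", [])
        if col.get("pii") and not col.get("classification")
    ]
    if unclassified:
        return f"unclassified PII columns: {', '.join(unclassified)}"
    return None


def refreshed_daily(product: dict) -> Optional[str]:
    """Example local policy a domain might add: refresh at least every 24 hours."""
    hours = product.get("refresh_hours")
    if hours is None or hours > 24:
        return "refresh interval exceeds 24 hours"
    return None


def evaluate(
    product: dict, global_policies: list[Policy], local_policies: list[Policy]
) -> list[str]:
    """Run global policies first, then local ones, and collect all violations."""
    violations = []
    for scope, policies in (("global", global_policies), ("local", local_policies)):
        for policy in policies:
            message = policy(product)
            if message:
                violations.append(f"{scope}: {message}")
    return violations
```

In this arrangement the global body owns the first list and each domain owns its own second list, which mirrors the federated model: common rules are written once, while domains remain free to add stricter local ones.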
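The idea that usage and stability indicators, rather than a central body, decide the fate of a data product can be illustrated with a simple rule. The metric names and every threshold below are assumptions chosen for the example; an organization would set its own based on its maturity and needs.

```python
from dataclasses import dataclass


@dataclass
class ProductMetrics:
    monthly_queries: int    # usage statistics
    avg_rating: float       # user reviews, on an assumed 1 to 5 scale
    quality_score: float    # share of quality checks passed, 0 to 1
    downtime_hours: float   # downtime in the reporting period


def lifecycle_action(
    m: ProductMetrics,
    min_queries: int = 10,
    min_rating: float = 3.0,
    min_quality: float = 0.9,
    max_downtime_hours: float = 4.0,
) -> str:
    """Suggest 'retire', 'enhance' or 'keep' from usage and stability metrics."""
    unused = m.monthly_queries < min_queries
    poorly_rated = m.avg_rating < min_rating
    unstable = m.quality_score < min_quality or m.downtime_hours > max_downtime_hours
    if unused and poorly_rated:
        return "retire"
    if poorly_rated or unstable:
        return "enhance"
    return "keep"
```

A rule like this would only produce a suggestion for the data product owner to review; the decision itself stays with the owning domain.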
- Greater observability, to quickly understand issues in the platform and remediate them
- Faster curation of data products with automated suggestions for DQ rules, data classification etc.
- Greater trust through automated lineage capabilities
- Audits and compliance with global policies and regulations such as GDPR
All these principles are essential for a good implementation and adoption of data mesh in an enterprise. The degree of implementation can vary: it can cover only some local functions or be enterprise-wide, depending on the strategy of the organization. Although it can be assumed that the larger the mesh, the greater the value it can produce from enterprise data, getting there requires a paradigm shift in the way an organization thinks about data. Through proper change management, buy-in and data literacy, the perception that a larger mesh means a more complex mesh can be easily overcome.