Data Quality Management
Data governance encompasses the program management required to manage data consumer expectations and requirements, along with collaborative semantic metadata management. However, the operational nexus is the integration of data quality rules into the business process and application development life cycle. Directly embedding data quality controls into the data production workflows reduces the continual chore of downstream parsing, standardization and cleansing. These controls also alert data stewards to potential issues long before they lead to irreversible business impacts.
Engaging business data consumers and soliciting their requirements allows data practitioners to translate those requirements into specific data quality rules. Data controls can then be configured with the rules and fully incorporated into business applications. Data governance procedures guide data stewards through the workflow tasks for addressing emerging data quality issues. Eliminating the root causes of flawed data not only supports the master data management initiative but also improves the overall quality of enterprise data. Data quality management incorporates tools and techniques for:
• Data quality rules and standards. Providing templates for capturing, managing and deploying data quality rules – and the standards to which the data sets and applications must conform – establishes quantifiable measures for reporting quality levels. Since the rules are derived from data consumer expectations, the measures provide relevant feedback as to data usability.
• Data quality controls. Directly integrating data quality controls as part of the application development process means that data quality is “baked into” the application infrastructure. Enabling rule-based data validation moves data quality out of a reactive, downstream mode and helps data practitioners address issues within the context of the business application.
• Monitoring, measurement and reporting. A direct benefit of data quality rules, standards and controls is the ability to continuously inspect and monitor data sets and data streams for any recognizable issues, and to alert the right set of people when a flaw is detected.
• Data quality incident management and remediation. One of the most effective techniques for improving data quality is instituting a framework for reporting, logging and tracking the status of data quality issues within the organization. Providing a centrally managed repository with integrated workflow processes and escalation means that issues are not ignored. Instead, issues are evaluated, investigated and resolved, either by addressing the cause or by determining other changes that obviate the issue. Visibility into the point of failure (or the point where a data error was introduced), coupled with the details of the data quality rules that were violated, helps the data steward research the root cause and develop a strategy for remediation.
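The four capabilities above can be illustrated with a minimal Python sketch. It is one possible shape under stated assumptions, not a reference implementation: the names QualityRule, Incident, IncidentLog, validate and conformance_rate, the two sample rules and the "order_entry" source label are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class QualityRule:
    """A named, testable expectation captured from data consumers (hypothetical)."""
    name: str
    check: Callable[[Record], bool]  # returns True when the record conforms


@dataclass
class Incident:
    """One logged rule violation, tracked until a data steward resolves it."""
    rule_name: str
    source: str      # point of failure: the application that produced the record
    record: Record
    status: str = "open"


@dataclass
class IncidentLog:
    """In-memory stand-in for a centrally managed incident repository."""
    incidents: List[Incident] = field(default_factory=list)

    def report(self, incident: Incident) -> None:
        self.incidents.append(incident)


def validate(record: Record, rules: List[QualityRule], source: str,
             log: IncidentLog) -> bool:
    """Control point: check a record where it is produced and log each violation."""
    passed = True
    for rule in rules:
        if not rule.check(record):
            log.report(Incident(rule.name, source, record))
            passed = False
    return passed


def conformance_rate(records: List[Record], rule: QualityRule) -> float:
    """Quantifiable measure: fraction of records that satisfy one rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule.check(r)) / len(records)


# Two illustrative rules; real rules would come from consumer requirements.
RULES = [
    QualityRule("customer_id_present",
                lambda r: bool(str(r.get("customer_id") or "").strip())),
    QualityRule("email_has_at_sign",
                lambda r: "@" in str(r.get("email") or "")),
]

if __name__ == "__main__":
    log = IncidentLog()
    batch = [
        {"customer_id": "C-001", "email": "ann@example.com"},
        {"customer_id": "", "email": "bob.example.com"},
    ]
    for rec in batch:
        validate(rec, RULES, source="order_entry", log=log)
    for inc in log.incidents:
        print(inc.rule_name, inc.source, inc.status)
    print(conformance_rate(batch, RULES[1]))
```

Keeping each rule as a named object lets the same definition drive the inline control, the conformance measure and the incident record, so the steward sees exactly which rule failed and where.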
While one of the proposed benefits of MDM is improved data quality, in reality it’s the other way around: To ensure a quality MDM deployment, establish best practices for proactive data quality assurance.
Integrating Identity Management into the Business Process Model
The previous phases – oversight, understanding and control – lay the groundwork for a capability that MDM requires: entity identification and identity resolution. Including more data sets from a variety of internal and external sources brings greater variation in how master data entities such as customer, product, vendor or employee are represented. As a result, organizations need high-quality, precise and accurate methods for parsing entity data and linking similar entity instances together.
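As a rough illustration of linking similar entity instances, the Python sketch below normalizes a few attributes, scores record pairs with the standard library's difflib.SequenceMatcher, and groups records whose score meets a threshold. The field weights, the 0.85 threshold and the function names are assumptions chosen for the example; production identity resolution typically relies on more specialized parsing and matching than this.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Illustrative field weights (an assumption, not a recommended configuration).
WEIGHTS = {"name": 0.6, "city": 0.4}


def normalize(value: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = value.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a missing value on either side scores 0."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def record_similarity(r1: Dict[str, str], r2: Dict[str, str]) -> float:
    """Weighted sum of per-field similarities."""
    return sum(w * field_similarity(r1.get(f, ""), r2.get(f, ""))
               for f, w in WEIGHTS.items())


def link_records(records: List[Dict[str, str]],
                 threshold: float = 0.85) -> List[List[int]]:
    """Group record indexes whose pairwise score meets the threshold."""
    parent = list(range(len(records)))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Compare every pair; fine for a sketch, too slow for large data sets.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if record_similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters: Dict[int, List[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


if __name__ == "__main__":
    customers = [
        {"name": "Acme Corp.", "city": "Boston"},
        {"name": "ACME Corp", "city": "boston"},
        {"name": "Zenith Ltd", "city": "Denver"},
    ]
    print(link_records(customers))
```

The all-pairs comparison grows quadratically with the number of records; larger data sets usually restrict comparisons to candidate groups first, a step this sketch leaves out.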
Similarity scoring, algorithms for identity resolution and record linkage are mature techniques that have been refined over the years and are necessary for any MDM implementation. But the matching and linking techniques for identity resolution are just one part of the solution. When unique identification becomes part and parcel of the business process, team members become aware of how their commitment to maintaining high-quality master data adds value across the organization. Identity resolution methods need to be fully incorporated into the business processes that touch master entity data, implying the need for:
• Enumerating the master data domains. It may seem obvious that customer and product are master data domains, but each organization – even within the same industry – may have numerous data domains that could be presumed to be “mastered.” Entity concepts that are used and shared by numerous organizations are candidate master domains. Use the data governance framework to work with representatives from across the corporation to agree on the master data domains.
• Documenting business process models and workflows. Every business process must touch at least one master data entity. For an MDM program, it’s critical to understand the flow of business processes – and how those processes are mapped to specific applications. The organization must also know how to determine which applications touch master data entities.
• CRUD (create, read, update, delete) characteristics and process touch points. Effective use of master data cuts horizontally across different business functions. Understanding how business processes create, read or update master data entity instances helps the data practitioner delineate expectations for key criteria for managing master data (such as consistency, currency and synchronization).
• Data access services. Facilitating the delivery of unobstructed access to a consistent representation of shared information means standardizing the methods for access. Standard access methods are especially important when master data repositories are used as transaction hubs requiring the corresponding synchronization and transaction semantics. This suggests the need to develop a layer of master data services that can be coupled with existing strategies for enterprise data buses or data federation and virtualization fabrics.
• “Master entity-aware” system development. If one of the root causes for the inadvertent replication of master data stems from siloed application development, the remedy is to ensure that developers use master data services as part of the system development life cycle. Couple the delivery of master data services with the proper training and oversight of application design and development.
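To make the idea of a master data services layer concrete, here is a minimal Python sketch of a service that applications would call instead of keeping their own copies of an entity. Everything in it is hypothetical: the MasterDataService class, its find_or_create, read and update methods, the identifier format and the exact-match-on-normalized-name rule are assumptions for illustration, and an in-memory dictionary stands in for the master repository.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class MasterEntity:
    master_id: str
    domain: str                 # e.g. "customer" or "product"
    attributes: Dict[str, str]


class MasterDataService:
    """Single access path for creating, reading and updating master entities."""

    def __init__(self) -> None:
        self._entities: Dict[str, MasterEntity] = {}
        self._index: Dict[Tuple[str, str], str] = {}  # match key -> master_id
        self._ids = itertools.count(1)

    @staticmethod
    def _match_key(domain: str, attributes: Dict[str, str]) -> Tuple[str, str]:
        # Simplistic matching: exact match on the normalized name.
        name = " ".join(attributes.get("name", "").lower().split())
        return (domain, name)

    def find_or_create(self, domain: str,
                       attributes: Dict[str, str]) -> MasterEntity:
        """Create touch point: reuse an existing entity before adding a new one."""
        key = self._match_key(domain, attributes)
        existing_id = self._index.get(key)
        if existing_id is not None:
            return self._entities[existing_id]
        master_id = f"{domain}-{next(self._ids):06d}"
        entity = MasterEntity(master_id, domain, dict(attributes))
        self._entities[master_id] = entity
        self._index[key] = master_id
        return entity

    def read(self, master_id: str) -> Optional[MasterEntity]:
        """Read touch point: every application sees the same representation."""
        return self._entities.get(master_id)

    def update(self, master_id: str, changes: Dict[str, str]) -> MasterEntity:
        """Update touch point: apply changes and keep the match index current."""
        entity = self._entities[master_id]
        old_key = self._match_key(entity.domain, entity.attributes)
        entity.attributes.update(changes)
        new_key = self._match_key(entity.domain, entity.attributes)
        if new_key != old_key:
            self._index.pop(old_key, None)
            self._index[new_key] = master_id
        return entity


if __name__ == "__main__":
    service = MasterDataService()
    first = service.find_or_create("customer", {"name": "Acme Corp"})
    second = service.find_or_create("customer", {"name": "  ACME   corp "})
    print(first.master_id == second.master_id)
```

Routing creation through find_or_create is what addresses inadvertent replication in this sketch: a second application asking for the same customer receives the existing master identifier rather than minting a new record. Deletion, synchronization and transaction semantics are deliberately left out.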
The methods used for unique identification are necessary but not sufficient for MDM success. Identifying the business applications that touch master data entities is a prelude to exploring how the related business processes can be improved through greater visibility into the master data domains.