VANTAGE POINT: Utilizing Social Media to Improve Data Governance
By Marty Moseley, Chief Technology Officer, Initiate Systems
When used strategically, Web 2.0 and other collaboration technologies can easily and cost-effectively streamline the data governance process and move the productivity dial.
Companies first have to recognize that individual data stewards work within different business silos or system groups. Often their efforts are not as coordinated as they could be, and sometimes they don't even know who the other data stewards are that they should be communicating with. This is where Web 2.0 helps -- by improving communications across geographic, project, business, organizational, system, language and time zone boundaries.
These Web 2.0 and collaboration tools, which tend to be readily available, easy to implement, and simple to use, make the data steward's job easier while contributing substantive data quality improvements. They're also no big secret. Many IT professionals are already utilizing technologies such as wikis, tags, blogs and mashups to communicate and collaborate. They just aren't using them in a coordinated way to address data governance issues, a situation that can be easily changed.
Wikis
Wikis are open, web-based forums that help participants communicate and collaborate more effectively. Wikis are the perfect place for companies to document information about data governance projects as well as other cross-functional or cross-organizational projects. Data governance teams can easily leverage wikis to streamline communications across the enterprise. A data governance wiki is an ideal place for data stewards to have an open online dialogue that can be captured in a collaborative fashion.
A data governance wiki is the best forum for reviewing and refining data policies as well as documenting the evolution of these policies throughout a project. Often policies become static and don't have the impact they should have because, once finalized, they become documents that are simply filed away somewhere and rarely used. Policies that are managed in wikis can be actively used, reviewed and refreshed by critical members of the data governance team. Wikis can also be the repository for documenting data quality problems and tying those problems to the policies governing the data (see tagging below).
In addition, wikis can be invaluable for managing the evolution and dissemination of a company's business rules as well as documenting decision-making rights for specific data. They provide a central location where data stewards and other IT people from around the company can view rules and evaluate whether they are upholding them. Wikis can also be used to define roles and responsibilities and members of decision-making groups, such as the data governance board or an implementation team. Information about board or team members can be posted with links to photos and information about members that is stored in the corporate directory. Meeting minutes, project plans and timelines can also be included here.
Tagging
Also known as "folksonomies" and social classification, collaborative tagging is a great way for people to create their own annotations and descriptors about a given subject. Popular examples of sites that use tags include Digg, Del.icio.us, Diigo, and Ma.gnolia. These sites all enable users to create their own taxonomy of terms, and "tag" content (and bookmarks) with these terms, which can then be used to find other content that shares the same tags.
Tagging creates a semantically rich environment, where people can define problems from the bottom-up using their own words, instead of relying on a top-down set list of pre-defined terms. This allows the nuances of a data problem to be defined by creating many different tags for a problem. Because tagging is a user-driven approach to organizing content, one data steward might tag a data quality problem as a message fault, another may see it as a data quality issue and another may identify it as a data validation problem. If someone is attempting to fix a problem and they perform an analysis on the systems involved, they could discover causal correlations through tag analysis. For example, they might discover that every time a data validation error is tagged, there is also a tag issued for a message fault and for data quality. This discovery could result in a dialogue between the three analysts that would help to better resolve the issue.
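The tag analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the issue IDs, the tag names and the `issue_tags` structure are hypothetical, standing in for whatever a wiki or tagging tool would actually export. The sketch counts how often pairs of tags land on the same issue and reports which tags always accompany a given tag.

```python
from collections import Counter
from itertools import combinations

# Hypothetical export: each issue ID maps to the set of tags stewards applied to it.
issue_tags = {
    "ISSUE-1": {"data-validation", "message-fault", "data-quality"},
    "ISSUE-2": {"data-validation", "message-fault", "data-quality"},
    "ISSUE-3": {"data-quality"},
    "ISSUE-4": {"message-fault", "data-quality"},
}


def tag_cooccurrence(issues):
    """Count how often each pair of tags appears on the same issue."""
    pair_counts = Counter()
    for tags in issues.values():
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(tags), 2):
            pair_counts[pair] += 1
    return pair_counts


def always_together(issues, tag):
    """Return the tags that appear on every issue carrying `tag`."""
    tagged = [tags for tags in issues.values() if tag in tags]
    if not tagged:
        return set()
    return set.intersection(*tagged) - {tag}


if __name__ == "__main__":
    print(tag_cooccurrence(issue_tags).most_common(3))
    print(always_together(issue_tags, "data-validation"))
```

A result like this shows only that the tags co-occur. Whether one problem actually causes another is something the stewards involved would still need to work out in discussion.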
There are a number of other ways that tagging can be used to help with data governance issues. Tags can be added to wiki discussions to track system and data quality issues that need attention and participants can click on them to link to further discussions. Tags can also be used to document which systems are participating in a customer data governance project and identify the level of compliance and security of each system. "Tag clouds" can also be created from documents, URLs, and other sources, to make it easier to discover and analyze content, such as data governance policies.
User Ranking Systems
These types of evaluation tools are used by a variety of community and e-commerce sites, such as Netflix, which employs a five-star rating system that enables users to rank movies, and Amazon.com, which ranks products according to a five-star system based on consumer reviews. The reason ranking systems have been so successful is that most people have opinions, but in most cases lack a method or location to express them. User ranking systems provide an easy, visual way to determine popularity, worth, and value. Businesses can utilize them to help assess the success of data governance projects or the value, risk or importance of a data quality topic.
Why not allow data stewards to create a list of topics that people can view and rank on an internal Web site or a wiki? These rankings could give companies a way to measure the success or failure of a given project or determine the priority for new initiatives. Rankings could be used to assess the quality of data repairs, fixes, and remediation. In addition, they could be used to document observations about throughput, reliability, availability, security, trustworthiness, and accuracy of any system, interface, service, message, adaptor/connector, or report.
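As a rough sketch of how such rankings might be tallied, the Python below averages five-star ratings per topic and sorts the topics from best to worst rated. The topic names, the sample ratings and the `rank_topics` function are assumptions made for illustration, not part of any particular product.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings collected from an internal site: (topic, stars from 1 to 5).
ratings = [
    ("address-cleanup", 5),
    ("address-cleanup", 4),
    ("duplicate-merge", 2),
    ("duplicate-merge", 3),
    ("duplicate-merge", 1),
]


def rank_topics(ratings):
    """Return (topic, average stars, number of votes), highest average first."""
    by_topic = defaultdict(list)
    for topic, stars in ratings:
        if not 1 <= stars <= 5:
            raise ValueError(f"rating out of range for {topic}: {stars}")
        by_topic[topic].append(stars)
    summary = [(topic, mean(stars), len(stars)) for topic, stars in by_topic.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for topic, average, votes in rank_topics(ratings):
        print(f"{topic}: {average:.1f} stars from {votes} votes")
```

Keeping the vote count next to the average matters in practice, since a high score from two voters says less than a middling score from two hundred.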
Ranking systems could also be opened up to customers and partners, which would allow companies to start aligning projects or technologies directly with business value to determine return on investment. Giving customers and partners a method to rank services also provides a simple way to determine customer satisfaction and answer usability questions.
Blogs
By now practically everyone knows what a blog is. But how can blogs be used to help improve data quality? They can educate and inform employees, or groups can use them to debate unresolved issues or to continue discussions in between meetings. For educational purposes, for example, a blog could provide information about why one project was chosen over another. Blogs could also provide updates on projects and actively request reader feedback.
Data governance boards could assign different data stewards to blog each week about the problems they are trying to solve and the projects they are working on. Over time, this type of blog would help inform data stewards, data governance constituents and other readers about how the company is working to solve global data quality issues.
RSS Feeds
RSS feeds are a great way to push information to people. Whether it is information about new training and educational materials, updates to project milestones in a wiki, final results of a user ranking survey, or a weekly podcast highlighting a unique data quality issue, RSS feeds help streamline and improve the efficiency of information distribution.
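For teams that want to publish such a feed themselves, a minimal sketch using only the Python standard library follows. The feed title, the example.com URLs and the `build_feed` helper are hypothetical. The element names (`rss`, `channel`, `item`, `title`, `link`, `description`) are those of RSS 2.0.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Build a small RSS 2.0 document and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "description").text = item["description"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical update about a project milestone recorded in the wiki.
    feed = build_feed(
        title="Data Governance Updates",
        link="https://example.com/governance",
        description="Milestones, survey results and training material",
        items=[
            {
                "title": "Address policy revised",
                "link": "https://example.com/governance/address-policy",
                "description": "The postal address policy page has a new revision.",
            }
        ],
    )
    print(feed)
```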
Mashups
Mashups are a type of Web application that combines data from multiple sources into a single, rich integrated application to allow people to get the information they need much faster than they ever could before. Using mashups, corporations can quickly provide data quality, data validation, or master data management services within a single web interface without the time and expense of IT integration projects or the need to purchase new systems.
Wouldn't it be great if data governance team members could use mashups to show real data in relation to a data policy wiki? For example, if there were a policy that governed postal addresses for customers, a dashboard could be invoked from a wiki that shows the number of address exceptions captured, new addresses added to the system and sources of policy violations. That same mashup could pull up the metadata specification and policy for postal addresses and compare them to actual data to ensure compliance. Another example of how a mashup could be used is in conjunction with a policy governing the definition of a customer. In this case, the mashup could invoke a service that runs a report within the data policy wiki that shows the number of new customers within a given period. Mashups could also be used to populate a dashboard displaying key performance indicators (KPIs) such as number of orders tracked, new customers, number of postal address acceptances or rejections, etc.
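The postal address example can be sketched as follows. Everything here is an assumption made for illustration: the policy fields, the US-style ZIP code pattern, the source system names and the `fetch_addresses` stand-in, which in a real mashup would be a call to a customer data service. The sketch combines a policy specification with actual records and summarizes the exceptions by source.

```python
import re

# Hypothetical policy specification, as it might be stored with the wiki page.
ADDRESS_POLICY = {
    "required_fields": ["street", "city", "postal_code"],
    "postal_code_pattern": r"^\d{5}(-\d{4})?$",  # assumes US ZIP codes
}


def fetch_addresses():
    """Stand-in for a call to a customer data service."""
    return [
        {"source": "crm", "street": "1 Main St", "city": "Austin", "postal_code": "78701"},
        {"source": "billing", "street": "", "city": "Austin", "postal_code": "78701"},
        {"source": "billing", "street": "9 Elm St", "city": "Dallas", "postal_code": "7520"},
    ]


def violations(record, policy):
    """List the ways a single record breaks the policy."""
    problems = [f"missing {name}" for name in policy["required_fields"] if not record.get(name)]
    code = record.get("postal_code", "")
    if code and not re.match(policy["postal_code_pattern"], code):
        problems.append("bad postal_code")
    return problems


def dashboard(records, policy):
    """Summarize exceptions overall and per source system."""
    by_source = {}
    exceptions = 0
    for record in records:
        if violations(record, policy):
            exceptions += 1
            by_source[record["source"]] = by_source.get(record["source"], 0) + 1
    return {"total": len(records), "exceptions": exceptions, "by_source": by_source}


if __name__ == "__main__":
    print(dashboard(fetch_addresses(), ADDRESS_POLICY))
```

Keeping the policy as data rather than burying it in code means the same specification a steward reads on the wiki page is the one the dashboard checks against.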
Workflow
Workflow technologies predate Web 2.0, but are still powerful collaboration tools and should be actively used to inject huge efficiency improvements into the data governance process. With workflow tools, corporations can assign a problem, such as resolving proper address details about a specific customer contained in multiple records, and track that issue through resolution. Workflow technologies ensure that data quality issues are managed consistently and completely from beginning to end.
With workflow technologies, relevant IT or data personnel might receive an email message with a URL linked to a web page containing an explanation of a data quality problem that they need to resolve. Or, they might receive a text message, voicemail or instant message that alerts them to an issue that they need to handle. The data quality system can be configured to periodically send an employee a message detailing a new problem in the workflow. Or the system could be set up as a relay mechanism, so that as each problem arises the employee originally given the task of resolving a conflict could take action or tag the page and explain why that issue would be better solved by someone else. The employee could then pass a new message with the link to the tagged page to the next person in the data governance chain. Throughout the process, the workflow system would track the problem as it proceeds to resolution, send reports and alerts to designated personnel, and track time to completion, escalations, and exceptions along the way.
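The relay mechanism and the tracking described above can be modeled with a very small data structure. The sketch below is not how any particular workflow product works. The `Issue` class, its fields and the `needs_escalation` rule are hypothetical, chosen only to show an issue being handed on with a reason, resolved, and measured for time to completion.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Issue:
    """A data quality problem moving through the governance chain."""
    issue_id: str
    description: str
    assignee: str
    opened_at: datetime
    status: str = "open"
    resolved_at: Optional[datetime] = None
    history: list = field(default_factory=list)

    def reassign(self, new_assignee, reason, when):
        """Pass the issue on, recording who handed it over and why."""
        self.history.append((when, self.assignee, new_assignee, reason))
        self.assignee = new_assignee

    def resolve(self, when):
        self.status = "resolved"
        self.resolved_at = when

    def time_to_completion(self):
        """Return elapsed time from opening to resolution, or None if still open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.opened_at


def needs_escalation(issue, now, max_age):
    """Flag open issues that have been waiting longer than `max_age`."""
    return issue.status == "open" and now - issue.opened_at > max_age


if __name__ == "__main__":
    opened = datetime(2024, 1, 1, 9, 0)
    issue = Issue("DQ-1", "Conflicting addresses for one customer", "alice", opened)
    issue.reassign("bob", "billing system owns this record", opened + timedelta(hours=2))
    print(needs_escalation(issue, opened + timedelta(days=3), timedelta(days=2)))
    issue.resolve(opened + timedelta(days=3, hours=1))
    print(issue.assignee, issue.time_to_completion())
```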
Communication that Leads to Better Problem Solving
Data steward teams already utilize various technologies to help them identify data patterns, anomalies and other data quality issues. But data stewards also need tools to help them coordinate efforts, communicate more effectively and achieve better results. With the Web 2.0 and collaboration technologies I've described here, businesses can increase the success of their data governance initiatives, while giving all participants a powerful voice in the process. Each of these methods can be used in a variety of ways to fit the corporate culture of individual companies. The trick is to coordinate their usage across all divisions, departments, and geographical locations to ensure everyone contributes.
About the Author
Marty Moseley is a 25-year IT industry veteran with extensive systems architecture experience. Moseley is an accomplished speaker and author on technology topics including data governance, customer data integration, master data management, service-oriented architecture, software architecture and product-line architecture. Moseley currently serves as chief technology officer at Initiate Systems, a provider of master data management software for companies, healthcare organizations and government agencies that want to create the most complete, real-time views of people, households and organizations from data dispersed across multiple application systems and databases. He can be reached at [email protected] and additional information on Initiate Systems is available at www.InitiateSystems.com.