Handbook of Data Quality : Research and Practice.

This multi-pronged approach to data quality management covers three areas: Organization, the processes, policies, and standards needed to set data quality objectives; Architecture, the technological landscape for deploying them; and Computation, the required tools and techniques.

Bibliographic Details
Author / Creator: Sadiq, Shazia.
Format: eBook Electronic
Language: English
Edition: 1st ed.
Imprint: Berlin, Heidelberg : Springer Berlin / Heidelberg, 2013.
Local Note: Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2023. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
Table of Contents:
  • Intro
  • Advisory Panel
  • Preface
  • Contents
  • Prologue: Research and Practice in Data Quality Management
  • 1 Introduction
  • 2 Related Studies
  • 3 Results of Literature Analysis
  • 4 The Three Pillars of Data Quality Management
  • 5 Handbook Topics
  • References
  • Part I Organizational Aspects of Data Quality
  • Data Quality Management Past, Present, and Future: Towards a Management System for Data
  • 1 Introduction and Summary
  • 1.1 This Chapter
  • 2 Foundations and What Works
  • 2.1 Data Quality Defined
  • 2.2 Approach
  • 2.3 A Management System for Data Quality
  • 2.4 Data Defined
  • 2.5 Dimensions of Data Quality
  • 3 Why Aren't All Data of High Quality?
  • 3.1 Rationale Not Believed
  • 3.2 Political, Social, and Structural Impediments
  • 3.3 Properties of Data
  • 3.4 Data Markets (or Lack Thereof!)
  • 4 Research Directions
  • 4.1 Technical Foundations: Bring Information Theory to Bear
  • 4.2 Monetizing Data
  • 4.3 Fundamental Rethink of the Management System for Data
  • 5 Final Remarks: Tremendous Urgency for This Work
  • References
  • Data Quality Projects and Programs
  • 1 Introduction
  • 2 Starting Point: Program or Project?
  • 3 Data Quality Projects
  • 4 The Ten Steps™ Methodology
  • 4.1 The Framework for Information Quality and Other Key Concepts
  • 4.2 The Ten Steps Process
  • 5 Data Quality Programs
  • 5.1 Data Quality Program Framework
  • 5.2 Information and Data Quality Skills
  • 6 A Tale of Two Companies
  • 6.1 Company A
  • 6.1.1 Company Background
  • 6.1.2 Data Quality Program Background and Timeline
  • 6.1.3 DQ Program Plan
  • 6.1.4 Organizational Fit
  • 6.1.5 DQ Program Components
  • 6.1.6 Initial Project
  • 6.1.7 Results
  • 6.2 Company B
  • 6.2.1 Company Background
  • 6.2.2 Data Quality Program Background and Timeline
  • 6.2.3 DQ Program Plan
  • 6.2.4 Data Quality Program Framework
  • 6.2.5 Organizational Fit
  • 6.2.6 Data Quality Phase One
  • 6.2.7 Initial Projects and Results
  • 7 Comparing the Companies' DQ Programs and Projects
  • 8 A Few Final Words
  • 8.1 Your Starting Point
  • 8.2 Cautions
  • 8.3 Critical Success Factors
  • References
  • Cost and Value Management for Data Quality
  • 1 Introduction
  • 2 Data Quality Cost
  • 2.1 Taxonomy of Data Quality Costs
  • 2.2 Identifying Data Quality Costs
  • 3 Data Quality Value
  • 4 Cost and Value Model for Data Quality
  • 5 Guideline for Cost and Value Analysis
  • 6 Summary and Conclusion
  • References
  • On the Evolution of Data Governance in Firms: The Case of Johnson & Johnson Consumer Products North America
  • 1 Introduction
  • 2 Fundamental Concepts
  • 2.1 Data and Data Quality
  • 2.2 Data Governance, Data Management, and Data Quality Management
  • 3 Related Work
  • 3.1 Data Governance
  • 3.2 Organizational Capabilities
  • 3.3 Data Governance as a Dynamic Capability
  • 4 Goal and Approach of the Study
  • 4.1 Goal
  • 4.2 Approach
  • 5 Data Governance at Johnson & Johnson
  • 5.1 Company Overview
  • 5.2 Initial Situation
  • 5.2.1 Strategic Perspective
  • 5.2.2 Business Process Perspective
  • 5.2.3 Information Systems Perspective
  • 5.2.4 Pain Points
  • 5.3 Establishing Data Governance
  • 5.3.1 Analysis Phase
  • 5.3.2 Founding Phase
  • 5.3.3 Development Phase
  • 5.3.4 Maturity Phase
  • 5.4 Current Situation
  • 5.4.1 Strategic Perspective
  • 5.4.2 Business Process Perspective
  • 5.4.3 Information Systems Perspective
  • 5.5 Achievements and Success Factors
  • 6 Interpretation of Findings
  • 6.1 Data Governance as a Dynamic Capability
  • 6.2 Managing Data Governance Effectiveness
  • 6.3 Maturity Model for Data Governance Effectiveness
  • 7 Conclusions
  • References
  • Part II Architectural Aspects of Data Quality
  • Data Warehouse Quality: Summary and Outlook
  • 1 Introduction
  • 1.1 Sources of Data Warehouse Quality Problems
  • 1.2 Roadmap
  • 2 Quality-Aware Data Warehouse Design
  • 3 Data Freshness
  • 4 Data Currency
  • 5 Data Completeness
  • 6 Temporal Consistency
  • 7 Detecting and Profiling Data Quality Problems
  • 7.1 Error Detection
  • 7.2 Error Profiling and Summarization
  • 8 Correcting Data Quality Problems
  • 9 Distributed Data Quality
  • 10 Conclusions and Future Work
  • References
  • Using Semantic Web Technologies for Data Quality Management
  • 1 Introduction
  • 1.1 Data Representation in the Semantic Web
  • 1.2 Potential Contributions of Semantic Web Technologies
  • 2 Big Challenges of Data Quality Management
  • 2.1 A Philosophical View on Data and Information Quality
  • 2.2 The Role of Data Requirements for Data Quality Management
  • 2.3 Generic Data Requirement Typology
  • 3 Employing Semantic Web Technologies for Data Quality Management
  • 3.1 Collaborative Representation and Use of Quality-Relevant Knowledge
  • 3.2 Automated Identification of Data Requirement Conflicts
  • 3.3 Semantic Definition of Data
  • 3.4 Using Semantic Web Data as a Trusted Reference
  • 3.5 Content Integration with Ontologies
  • 4 Limitations of Semantic Web Technologies for DQM
  • 5 Summary and Future Directions
  • References
  • Data Glitches: Monsters in Your Data
  • 1 Introduction
  • 1.1 A Statistical Notion of Data Quality
  • 2 Data Cleaning, an Iterative Process
  • 3 Complex Data Glitches
  • 3.1 A Challenging Problem
  • 4 Glitch Detection
  • 4.1 Constraint Satisfaction Methods
  • 4.2 Statistical Methods
  • 4.3 Patterns of Glitches
  • 5 Quality Assessment
  • 5.1 Glitch Signatures
  • 5.2 Glitch Weighting and Scoring
  • 5.3 Glitch Index
  • 6 Data Repair
  • 6.1 Cleaning Methods
  • 7 Choosing Data-Cleaning Strategies
  • 7.1 An Experimental Framework
  • 8 Relevant Literature
  • 9 Conclusion
  • References
  • Part III Computational Aspects of Data Quality
  • Generic and Declarative Approaches to Data Quality Management
  • 1 Introduction
  • 2 Classic Integrity Constraints
  • 2.1 The Basics of ICs
  • 2.2 Checking and Enforcing ICs
  • 3 Repairs and Consistent Answers
  • 3.1 Answer Set Programs for Database Repairing
  • 3.2 Active Integrity Constraints
  • 3.3 ICs and Virtual Data Integration
  • 4 Data Dependencies and Data Quality
  • 4.1 Conditional Dependencies
  • 4.2 Data Cleaning with CDs
  • 5 Applications of Declarative Approaches to Entity Resolution
  • 5.1 A Generic Approach: Swoosh
  • 5.2 ER with Matching Dependencies
  • 5.3 Answer-Set Programs for MD-Based ER
  • 5.4 MDs and Swoosh
  • 5.5 Rules and Ontologies for Duplicate Detection
  • 6 Final Remarks
  • References
  • Linking Records in Complex Context
  • 1 Introduction
  • 2 Hierarchical-Structure-Based Approaches
  • 3 Iterative Record Linkage
  • 4 Record Linkage in Complex Information Spaces
  • 5 Relationship-Based Approaches
  • 6 Web-Based Approaches
  • 7 Temporal Record Linkage
  • 8 Summary
  • References
  • A Practical Guide to Entity Resolution with OYSTER
  • 1 Introduction
  • 2 Components of Entity Resolution
  • 2.1 Entity Reference Extraction
  • 2.2 Entity Reference Preparation
  • 2.3 Entity Reference Resolution
  • 2.3.1 Numerical Similarity Functions
  • 2.3.2 Syntactic Similarity Functions
  • 2.3.3 Semantic Similarity Functions
  • 2.3.4 Phonetic Similarity Functions
  • 2.3.5 Efficiency Considerations in Similarity Analysis
  • 2.4 Entity Identity Information Management
  • 2.5 Entity-Relationship Analysis
  • 3 A Demonstration of the Entity Resolution Process with OYSTER
  • 3.1 Description of the Files
  • 3.2 Overview of the Process
  • 3.3 Data Quality Assessment and Data Cleaning
  • 3.4 Selecting Identity Attributes and Crafting the Identity Rules
  • 3.5 Entity Resolution Process
  • 3.6 Measuring and Analyzing Results
  • 3.7 Storing and Maintaining Identity Information
  • 4 Summary
  • References
  • Managing Quality of Probabilistic Databases
  • 1 Introduction
  • 2 Related Work
  • 3 Data and Query Models
  • 3.1 The Probabilistic Database Model
  • 3.2 Queries
  • 4 The PWS-Quality
  • 4.1 Evaluating the PWS-Quality
  • 4.2 The x-Form of the PWS-Quality
  • 5 Algorithms for Probabilistic Database Cleaning
  • 5.1 Problem Definition
  • 5.2 Evaluating Quality Improvement
  • 5.3 An Optimal and Efficient Data Cleaning Algorithm
  • 5.4 Heuristics for Data Cleaning
  • 6 Conclusions and Future Work
  • References
  • Data Fusion: Resolving Conflicts from Multiple Sources
  • 1 Introduction
  • 2 Challenges and Overview of the Solution
  • 3 Fusing Sources Considering Accuracy
  • 3.1 Data Fusion
  • 3.2 Accuracy of a Source
  • 3.3 Probability of a Value Being True
  • 3.4 Iterative Algorithm
  • 3.5 Extensions and Alternatives
  • 4 Fusing Sources Considering Copying
  • 4.1 Copying Between Sources
  • 4.2 Copy Detection
  • 4.3 Independent Vote Count of a Value
  • 4.3.1 Ideal Vote Count
  • 4.3.2 Estimating Vote Count
  • 4.3.3 Combining with Source Accuracy
  • 4.4 Iterative Algorithm
  • 4.5 Extensions for Copy Detection
  • 5 A Case Study
  • 6 Related Work
  • 7 Summary
  • References
  • Part IV Data Quality in Action
  • Ensuring the Quality of Health Information: The Canadian Experience
  • 1 Introduction
  • 2 The Canadian Institute for Health Information (CIHI)
  • 2.1 Clinical Data
  • 2.2 Health System Data
  • 3 Data Quality at CIHI
  • 3.1 CIHI Prevention Strategies
  • 3.1.1 Data Standards
  • 3.1.2 Technical Standards
  • 3.1.3 System Edits and Audits
  • 3.1.4 Training and Support
  • 3.2 CIHI Monitoring and Feedback Strategies
  • 3.2.1 Operational Reporting and Corrections
  • 3.2.2 Data Quality Assessment
  • 3.2.3 Reabstraction Studies
  • 3.2.4 Analysis and Data Mining