Handbook of Data Quality : Research and Practice.
This multi-pronged approach to data quality management covers three areas: Organization (the processes, policies, and standards needed to set data quality objectives), Architecture (the technological landscape for deploying them), and Computation (the required tools and techniques).
Format: | eBook Electronic |
---|---|
Language: | English |
Edition: | 1st ed. |
Imprint: | Berlin, Heidelberg : Springer Berlin / Heidelberg, 2013. |
Local Note: | Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2023. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries. |
Table of Contents:
- Intro
- Advisory Panel
- Preface
- Contents
- Prologue: Research and Practice in Data Quality Management
- 1 Introduction
- 2 Related Studies
- 3 Results of Literature Analysis
- 4 The Three Pillars of Data Quality Management
- 5 Handbook Topics
- References
- Part I Organizational Aspects of Data Quality
- Data Quality Management Past, Present, and Future: Towards a Management System for Data
- 1 Introduction and Summary
- 1.1 This Chapter
- 2 Foundations and What Works
- 2.1 Data Quality Defined
- 2.2 Approach
- 2.3 A Management System for Data Quality
- 2.4 Data Defined
- 2.5 Dimensions of Data Quality
- 3 Why Aren't All Data of High Quality?
- 3.1 Rationale Not Believed
- 3.2 Political, Social, and Structural Impediments
- 3.3 Properties of Data
- 3.4 Data Markets (or Lack Thereof!)
- 4 Research Directions
- 4.1 Technical Foundations: Bring Information Theory to Bear
- 4.2 Monetizing Data
- 4.3 Fundamental Rethink of the Management System for Data
- 5 Final Remarks: Tremendous Urgency for This Work
- References
- Data Quality Projects and Programs
- 1 Introduction
- 2 Starting Point: Program or Project?
- 3 Data Quality Projects
- 4 The Ten Steps™ Methodology
- 4.1 The Framework for Information Quality and Other Key Concepts
- 4.2 The Ten Steps Process
- 5 Data Quality Programs
- 5.1 Data Quality Program Framework
- 5.2 Information and Data Quality Skills
- 6 A Tale of Two Companies
- 6.1 Company A
- 6.1.1 Company Background
- 6.1.2 Data Quality Program Background and Timeline
- 6.1.3 DQ Program Plan
- 6.1.4 Organizational Fit
- 6.1.5 DQ Program Components
- 6.1.6 Initial Project
- 6.1.7 Results
- 6.2 Company B
- 6.2.1 Company Background
- 6.2.2 Data Quality Program Background and Timeline
- 6.2.3 DQ Program Plan
- 6.2.4 Data Quality Program Framework
- 6.2.5 Organizational Fit
- 6.2.6 Data Quality Phase One
- 6.2.7 Initial Projects and Results
- 7 Comparing the Companies' DQ Programs and Projects
- 8 A Few Final Words
- 8.1 Your Starting Point
- 8.2 Cautions
- 8.3 Critical Success Factors
- References
- Cost and Value Management for Data Quality
- 1 Introduction
- 2 Data Quality Cost
- 2.1 Taxonomy of Data Quality Costs
- 2.2 Identifying Data Quality Costs
- 3 Data Quality Value
- 4 Cost and Value Model for Data Quality
- 5 Guideline for Cost and Value Analysis
- 6 Summary and Conclusion
- References
- On the Evolution of Data Governance in Firms: The Case of Johnson & Johnson Consumer Products North America
- 1 Introduction
- 2 Fundamental Concepts
- 2.1 Data and Data Quality
- 2.2 Data Governance, Data Management, and Data Quality Management
- 3 Related Work
- 3.1 Data Governance
- 3.2 Organizational Capabilities
- 3.3 Data Governance as a Dynamic Capability
- 4 Goal and Approach of the Study
- 4.1 Goal
- 4.2 Approach
- 5 Data Governance at Johnson & Johnson
- 5.1 Company Overview
- 5.2 Initial Situation
- 5.2.1 Strategic Perspective
- 5.2.2 Business Process Perspective
- 5.2.3 Information Systems Perspective
- 5.2.4 Pain Points
- 5.3 Establishing Data Governance
- 5.3.1 Analysis Phase
- 5.3.2 Founding Phase
- 5.3.3 Development Phase
- 5.3.4 Maturity Phase
- 5.4 Current Situation
- 5.4.1 Strategic Perspective
- 5.4.2 Business Process Perspective
- 5.4.3 Information Systems Perspective
- 5.5 Achievements and Success Factors
- 6 Interpretation of Findings
- 6.1 Data Governance as a Dynamic Capability
- 6.2 Managing Data Governance Effectiveness
- 6.3 Maturity Model for Data Governance Effectiveness
- 7 Conclusions
- References
- Part II Architectural Aspects of Data Quality
- Data Warehouse Quality: Summary and Outlook
- 1 Introduction
- 1.1 Sources of Data Warehouse Quality Problems
- 1.2 Roadmap
- 2 Quality-Aware Data Warehouse Design
- 3 Data Freshness
- 4 Data Currency
- 5 Data Completeness
- 6 Temporal Consistency
- 7 Detecting and Profiling Data Quality Problems
- 7.1 Error Detection
- 7.2 Error Profiling and Summarization
- 8 Correcting Data Quality Problems
- 9 Distributed Data Quality
- 10 Conclusions and Future Work
- References
- Using Semantic Web Technologies for Data Quality Management
- 1 Introduction
- 1.1 Data Representation in the Semantic Web
- 1.2 Potential Contributions of Semantic Web Technologies
- 2 Big Challenges of Data Quality Management
- 2.1 A Philosophical View on Data and Information Quality
- 2.2 The Role of Data Requirements for Data Quality Management
- 2.3 Generic Data Requirement Typology
- 3 Employing Semantic Web Technologies for Data Quality Management
- 3.1 Collaborative Representation and Use of Quality-Relevant Knowledge
- 3.2 Automated Identification of Data Requirement Conflicts
- 3.3 Semantic Definition of Data
- 3.4 Using Semantic Web Data as a Trusted Reference
- 3.5 Content Integration with Ontologies
- 4 Limitations of Semantic Web Technologies for DQM
- 5 Summary and Future Directions
- References
- Data Glitches: Monsters in Your Data
- 1 Introduction
- 1.1 A Statistical Notion of Data Quality
- 2 Data Cleaning, an Iterative Process
- 3 Complex Data Glitches
- 3.1 A Challenging Problem
- 4 Glitch Detection
- 4.1 Constraint Satisfaction Methods
- 4.2 Statistical Methods
- 4.3 Patterns of Glitches
- 5 Quality Assessment
- 5.1 Glitch Signatures
- 5.2 Glitch Weighting and Scoring
- 5.3 Glitch Index
- 6 Data Repair
- 6.1 Cleaning Methods
- 7 Choosing Data-Cleaning Strategies
- 7.1 An Experimental Framework
- 8 Relevant Literature
- 9 Conclusion
- References
- Part III Computational Aspects of Data Quality
- Generic and Declarative Approaches to Data Quality Management
- 1 Introduction
- 2 Classic Integrity Constraints
- 2.1 The Basics of ICs
- 2.2 Checking and Enforcing ICs
- 3 Repairs and Consistent Answers
- 3.1 Answer Set Programs for Database Repairing
- 3.2 Active Integrity Constraints
- 3.3 ICs and Virtual Data Integration
- 4 Data Dependencies and Data Quality
- 4.1 Conditional Dependencies
- 4.2 Data Cleaning with CDs
- 5 Applications of Declarative Approaches to Entity Resolution
- 5.1 A Generic Approach: Swoosh
- 5.2 ER with Matching Dependencies
- 5.3 Answer-Set Programs for MD-Based ER
- 5.4 MDs and Swoosh
- 5.5 Rules and Ontologies for Duplicate Detection
- 6 Final Remarks
- References
- Linking Records in Complex Context
- 1 Introduction
- 2 Hierarchical-Structure-Based Approaches
- 3 Iterative Record Linkage
- 4 Record Linkage in Complex Information Spaces
- 5 Relationship-Based Approaches
- 6 Web-Based Approaches
- 7 Temporal Record Linkage
- 8 Summary
- References
- A Practical Guide to Entity Resolution with OYSTER
- 1 Introduction
- 2 Components of Entity Resolution
- 2.1 Entity Reference Extraction
- 2.2 Entity Reference Preparation
- 2.3 Entity Reference Resolution
- 2.3.1 Numerical Similarity Functions
- 2.3.2 Syntactic Similarity Functions
- 2.3.3 Semantic Similarity Functions
- 2.3.4 Phonetic Similarity Functions
- 2.3.5 Efficiency Considerations in Similarity Analysis
- 2.4 Entity Identity Information Management
- 2.5 Entity-Relationship Analysis
- 3 A Demonstration of the Entity Resolution Process with OYSTER
- 3.1 Description of the Files
- 3.2 Overview of the Process
- 3.3 Data Quality Assessment and Data Cleaning
- 3.4 Selecting Identity Attributes and Crafting the Identity Rules
- 3.5 Entity Resolution Process
- 3.6 Measuring and Analyzing Results
- 3.7 Storing and Maintaining Identity Information
- 4 Summary
- References
- Managing Quality of Probabilistic Databases
- 1 Introduction
- 2 Related Work
- 3 Data and Query Models
- 3.1 The Probabilistic Database Model
- 3.2 Queries
- 4 The PWS-Quality
- 4.1 Evaluating the PWS-Quality
- 4.2 The x-Form of the PWS-Quality
- 5 Algorithms for Probabilistic Database Cleaning
- 5.1 Problem Definition
- 5.2 Evaluating Quality Improvement
- 5.3 An Optimal and Efficient Data Cleaning Algorithm
- 5.4 Heuristics for Data Cleaning
- 6 Conclusions and Future Work
- References
- Data Fusion: Resolving Conflicts from Multiple Sources
- 1 Introduction
- 2 Challenges and Overview of the Solution
- 3 Fusing Sources Considering Accuracy
- 3.1 Data Fusion
- 3.2 Accuracy of a Source
- 3.3 Probability of a Value Being True
- 3.4 Iterative Algorithm
- 3.5 Extensions and Alternatives
- 4 Fusing Sources Considering Copying
- 4.1 Copying Between Sources
- 4.2 Copy Detection
- 4.3 Independent Vote Count of a Value
- 4.3.1 Ideal Vote Count
- 4.3.2 Estimating Vote Count
- 4.3.3 Combining with Source Accuracy
- 4.4 Iterative Algorithm
- 4.5 Extensions for Copy Detection
- 5 A Case Study
- 6 Related Work
- 7 Summary
- References
- Part IV Data Quality in Action
- Ensuring the Quality of Health Information: The Canadian Experience
- 1 Introduction
- 2 The Canadian Institute for Health Information (CIHI)
- 2.1 Clinical Data
- 2.2 Health System Data
- 3 Data Quality at CIHI
- 3.1 CIHI Prevention Strategies
- 3.1.1 Data Standards
- 3.1.2 Technical Standards
- 3.1.3 System Edits and Audits
- 3.1.4 Training and Support
- 3.2 CIHI Monitoring and Feedback Strategies
- 3.2.1 Operational Reporting and Corrections
- 3.2.2 Data Quality Assessment
- 3.2.3 Reabstraction Studies
- 3.2.4 Analysis and Data Mining