JOURNAL ARTICLE

Discovering Conditional Functional Dependencies

Wenfei Fan, Floris Geerts, Jianzhong Li, Ming Xiong

Year: 2010 Journal: IEEE Transactions on Knowledge and Data Engineering Vol: 23 (5) Pages: 683-698 Publisher: IEEE Computer Society

Abstract

This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs are a recent extension of functional dependencies (FDs) by supporting patterns of semantically related constants, and can be used as rules for cleaning relational data. However, finding quality CFDs is an expensive process that involves intensive manual effort. To effectively identify data cleaning rules, we develop techniques for discovering CFDs from relations. Already hard for traditional FDs, the discovery problem is more difficult for CFDs. Indeed, mining patterns in CFDs introduces new challenges. We provide three methods for CFD discovery. The first, referred to as CFDMiner, is based on techniques for mining closed item sets, and is used to discover constant CFDs, namely, CFDs with constant patterns only. Constant CFDs are particularly important for object identification, which is essential to data cleaning and data integration. The other two algorithms are developed for discovering general CFDs. One algorithm, referred to as CTANE, is a levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other, referred to as FastCFD, is based on the depth-first approach used in FastFD, a method for discovering FDs. It leverages closed-item-set mining to reduce the search space. As verified by our experimental study, CFDMiner can be multiple orders of magnitude faster than CTANE and FastCFD for constant CFD discovery. CTANE works well when a given relation is large, but it does not scale well with the arity of the relation. FastCFD is far more efficient than CTANE when the arity of the relation is large; better still, leveraging optimization based on closed-item-set mining, FastCFD also scales well with the size of the relation. These algorithms provide a set of cleaning-rule discovery tools for users to choose for different applications.
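As an illustration of what a CFD expresses, the following is a minimal sketch of a violation check. It is not one of the paper's discovery algorithms (CFDMiner, CTANE, FastCFD). It assumes the usual formulation from the CFD literature, which the abstract itself does not spell out: a CFD is an FD X → A paired with a pattern tuple whose entries are constants or a wildcard `_`. The attribute names, the sample rows, and the helper names `matches` and `cfd_violations` are all hypothetical.

```python
from itertools import combinations

WILDCARD = "_"  # assumed marker for an unconstrained pattern entry


def matches(value, pattern_value):
    """A value matches a pattern entry if the entry is a wildcard or equal."""
    return pattern_value == WILDCARD or value == pattern_value


def cfd_violations(rows, lhs, rhs, pattern):
    """Return violations of the CFD (lhs -> rhs, pattern) as tuples of rows.

    A 1-tuple is a row that matches the LHS pattern but not the RHS constant.
    A 2-tuple is a pair of in-scope rows that agree on lhs but differ on rhs.
    """
    # Only rows whose LHS values match the pattern are in scope.
    scoped = [r for r in rows if all(matches(r[a], pattern[a]) for a in lhs)]
    bad = [(r,) for r in scoped if not matches(r[rhs], pattern[rhs])]
    for r1, r2 in combinations(scoped, 2):
        if all(r1[a] == r2[a] for a in lhs) and r1[rhs] != r2[rhs]:
            bad.append((r1, r2))
    return bad


# Hypothetical relation: country code, area code, city.
rows = [
    {"CC": "44", "AC": "131", "city": "EDI"},
    {"CC": "44", "AC": "131", "city": "LDN"},
    {"CC": "01", "AC": "908", "city": "MH"},
]

# Constant CFD: every pattern entry is a constant.
constant_pattern = {"CC": "44", "AC": "131", "city": "EDI"}
# General CFD: constants mixed with wildcards.
general_pattern = {"CC": "44", "AC": "_", "city": "_"}
# All wildcards: this degenerates to the plain FD [CC, AC] -> city.
fd_pattern = {"CC": "_", "AC": "_", "city": "_"}

constant_bad = cfd_violations(rows, ["CC", "AC"], "city", constant_pattern)
general_bad = cfd_violations(rows, ["CC", "AC"], "city", general_pattern)
fd_bad = cfd_violations(rows, ["CC", "AC"], "city", fd_pattern)

print(len(constant_bad), len(general_bad), len(fd_bad))  # prints: 2 1 1
```

In this sketch the constant CFD flags the second row on its own, because its city is not the required constant, and also flags the pair formed by the first two rows. The general CFD and the plain FD flag only the pair. This is one way to see why constant CFDs can act as single-tuple cleaning rules.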

Keywords:
Arity; Computer science; Data mining; Relation (database); Functional dependency; Set (abstract data type); Constant (computer programming); Dependency theory (database theory); Object (grammar); Process (computing); Relational database; Algorithm; Artificial intelligence; Mathematics; Programming language

Metrics

Cited By: 243
FWCI (Field Weighted Citation Impact): 13.59
Refs: 41
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Data Quality and Management
Social Sciences →  Decision Sciences →  Management Science and Operations Research
Data Mining Algorithms and Applications
Physical Sciences →  Computer Science →  Information Systems
Semantic Web and Ontologies
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Discovering Conditional Functional Dependencies

Wenfei Fan, Floris Geerts, Laks V. S. Lakshmanan, Ming Xiong

Journal: Proceedings - International Conference on Data Engineering Year: 2009 Pages: 1231-1234
JOURNAL ARTICLE

Discovering context-aware conditional functional dependencies

Yuefeng Du, Derong Shen, Tiezheng Nie, Yue Kou, Ge Yu

Journal: Frontiers of Computer Science Year: 2016 Vol: 11 (4) Pages: 688-701
JOURNAL ARTICLE

Discovering (frequent) constant conditional functional dependencies

Thierno Diallo, Noël Novelli, Jean-Marc Petit

Journal: International Journal of Data Mining Modelling and Management Year: 2012 Vol: 4 (3) Pages: 205-205