JOURNAL ARTICLE

Mapping specifications for ranked hierarchical trees in data integration systems

Abstract

A popular approach to integrating heterogeneous data sources is to Extract, Transform and Load (ETL) data from disparate sources into a consolidated data store while addressing integration challenges including, but not limited to, structural differences between the source and target schemas, semantic differences in their vocabularies, and data encoding. This work focuses on the integration of tree-like hierarchical data, that is, information that, when modeled as a relational schema, can take the shape of a flat schema, a self-referential schema, or a hybrid schema. Examples include evolutionary taxonomies, geological time scales, and organizational charts. Given the observed complexity of developing ETL processes for this particular but common type of data, we aim to reduce the time and effort required to map and transform it. Our research automates and simplifies the transformation from ranked self-referential to flat representations (and vice versa) by (a) proposing MSL+, an extension to IBM's Mapping Specification Language (MSL), to succinctly express the mapping between schemas while hiding the complexity of the transformation implementation from the user, and (b) implementing a transformation component for the Talend open-source ETL platform, called Tree Transformer (TT). We evaluated MSL+ and TT in the context of biodiversity data integration, where this class of transformations is a recurring pattern. We demonstrate the effectiveness of MSL+ in terms of development time savings, as well as a 2- to 25-fold improvement in transformation time achieved by TT compared to existing implementations and to Talend built-in components.
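To make the ranked self-referential and flat representations concrete, the following is a minimal Python sketch of the two directions of the transformation. It is not MSL+ syntax and not the Tree Transformer implementation; the row layout `(id, parent_id, name, rank)`, the rank list `RANKS`, and the function names `flatten` and `unflatten` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical self-referential row: (id, parent_id, name, rank).
Row = Tuple[int, Optional[int], str, str]

# Assumed rank order, root first; a real taxonomy would have more ranks.
RANKS = ["kingdom", "phylum", "class"]


def flatten(rows: List[Row], ranks: List[str] = RANKS) -> List[Dict[str, Optional[str]]]:
    """Turn parent-pointer rows into one flat record per leaf node."""
    by_id = {row[0]: row for row in rows}
    parent_ids = {row[1] for row in rows if row[1] is not None}
    flat = []
    for row in rows:
        if row[0] in parent_ids:
            continue  # emit records for leaves only
        record: Dict[str, Optional[str]] = {rank: None for rank in ranks}
        current: Optional[Row] = row
        while current is not None:
            # Place each ancestor's name in the column named by its rank.
            record[current[3]] = current[2]
            current = by_id.get(current[1]) if current[1] is not None else None
        flat.append(record)
    return flat


def unflatten(records: List[Dict[str, Optional[str]]], ranks: List[str] = RANKS) -> List[Row]:
    """Rebuild parent-pointer rows from flat records, assigning fresh ids."""
    ids: Dict[Tuple[str, ...], int] = {}  # path of names from the root -> id
    rows: List[Row] = []
    for record in records:
        parent_id: Optional[int] = None
        path: Tuple[str, ...] = ()
        for rank in ranks:
            name = record.get(rank)
            if name is None:
                continue  # rank missing in this record; skip the level
            path = path + (name,)
            if path not in ids:
                ids[path] = len(ids) + 1
                rows.append((ids[path], parent_id, name, rank))
            parent_id = ids[path]
    return rows


if __name__ == "__main__":
    tree = [
        (1, None, "Animalia", "kingdom"),
        (2, 1, "Chordata", "phylum"),
        (3, 2, "Mammalia", "class"),
        (4, 2, "Aves", "class"),
    ]
    for record in flatten(tree):
        print(record)
```

The sketch keys each node by its full path of names so that two nodes with the same name under different parents stay distinct when the flat records are converted back.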

Keywords:
Computer science; Data integration; Schema (genetic algorithms); Data mining; Data warehouse; Implementation; Information integration; IBM; Data structure; Information retrieval; Database; Programming language

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 27
Citation Normalized Percentile: 0.06

Topics

Semantic Web and Ontologies
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Database Systems and Queries
Physical Sciences →  Computer Science →  Computer Networks and Communications
Scientific Computing and Data Management
Social Sciences →  Decision Sciences →  Information Systems and Management
