Knowledge Graphs (KGs) are structured representations designed to unify heterogeneous data and domain-specific semantics into a coherent and machine-interpretable format. By capturing entities, their attributes, and interrelations through labeled triples, KGs serve as powerful tools for integrating factual content with domain knowledge across diverse sources. In real-world applications, KGs are typically constructed under the Open World Assumption (OWA), a foundational semantic principle stating that the absence of a fact should not be interpreted as evidence of its falsehood, but rather as an indication of incomplete knowledge. The adoption of OWA enables flexibility and scalability in KG design but also introduces inherent incompleteness. KGs are frequently built through automated extraction pipelines that draw from unstructured text, semi-structured records, or disparate databases. As a result, many true facts may remain unstated, either due to source limitations or incomplete mappings. This incompleteness presents significant obstacles for downstream tasks such as learning, inference, and reasoning, which typically rely on observed data patterns. Accordingly, managing and mitigating incompleteness in KGs is a central challenge in ensuring accurate knowledge representation and effective reasoning in intelligent systems. Knowledge Graph Completion (KGC) addresses the problem of inferring missing facts in a KG by identifying latent patterns and exploiting the underlying semantic structure of the data. To tackle KG incompleteness, inductive learning methods, both symbolic and numerical, are typically employed to generalize from observed data and predict plausible yet unrecorded triples. However, findings from this thesis reveal that these approaches struggle when applied to KGs that suffer from structural anomalies, semantic inconsistencies, or interoperability issues.
Empirical results demonstrate that numerical models often overfit to spurious correlations introduced by noisy data, while symbolic learners may produce rules that contradict domain knowledge when not guided by semantic constraints. These observations underscore a fundamental limitation in current approaches: their inability to validate predictions and align them semantically. To address these shortcomings, this thesis establishes the necessity of a unified framework that integrates domain semantics, symbolic reasoning, and formal validation mechanisms, thereby ensuring that inferred knowledge is not only statistically plausible but also semantically sound and consistent with the KG’s intended meaning. The thesis proposes a knowledge-driven framework that enhances KGC through the integration of ontological reasoning, structural normalization, constraint validation, and neuro-symbolic learning. At the core of this framework is the use of ontologies as formal semantic backbones, encoding domain-specific hierarchies, constraints, and logical relationships. These ontological structures are leveraged in symbolic learning via entailment regimes and heuristics such as the Partial Completeness Assumption (PCA), enabling the inference of implicit, meaningful relationships. To further enhance structural integrity, the framework introduces a novel normalization theory for KGs, combined with SHACL-based validation, to resolve anomalies such as blank nodes, overloaded properties, and conflicting assignments. Experimental findings confirm that these normalization and validation steps significantly improve both the quality and interpretability of completed KGs. Building on this foundation, the thesis presents a neuro-symbolic learning architecture that combines the generalization power of neural embeddings with the rule-based transparency of symbolic reasoning.
This hybrid model improves predictive performance, enforces semantic alignment, and supports explainability, outperforming traditional approaches across several benchmarks. Collectively, these contributions provide concrete responses to six research questions and establish neuro-symbolic KGC as an effective, scalable, and trustworthy paradigm for the development of semantically grounded AI systems.
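To make the Partial Completeness Assumption (PCA) mentioned above more concrete, the following is a minimal sketch and not the implementation used in the thesis. It assumes the usual rule-mining reading of PCA: if a KG states at least one object for a subject and relation, the KG is treated as complete for that subject and relation, so only predictions about such subjects can count against a rule. The toy triples, the entity and relation names, and the function `rule_confidences` are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Toy KG as (subject, predicate, object) triples; all names are hypothetical.
triples = {
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "globex"),
    ("alice", "livesIn", "berlin"),
    ("bob", "livesIn", "paris"),
    ("acme", "locatedIn", "berlin"),
    ("globex", "locatedIn", "madrid"),
}


def rule_confidences(triples):
    """Score the illustrative rule
        worksAt(x, z) AND locatedIn(z, y)  =>  livesIn(x, y)
    with standard confidence and with PCA confidence."""
    by_pred = defaultdict(set)
    for s, p, o in triples:
        by_pred[p].add((s, o))

    # Index locatedIn by its subject so the rule body can be joined on z.
    located = defaultdict(set)
    for z, y in by_pred["locatedIn"]:
        located[z].add(y)

    # All (x, y) pairs predicted by the rule body.
    predictions = {(x, y) for x, z in by_pred["worksAt"] for y in located[z]}

    head = by_pred["livesIn"]
    # Support: predictions that are already stated in the KG.
    support = len(predictions & head)

    # Standard confidence treats every unstated prediction as a counterexample.
    std_conf = support / len(predictions) if predictions else 0.0

    # PCA: a prediction counts in the denominator only if the KG already
    # states some livesIn fact for that subject.
    subjects_with_head = {s for s, _ in head}
    pca_body = {(x, y) for x, y in predictions if x in subjects_with_head}
    pca_conf = support / len(pca_body) if pca_body else 0.0

    return support, std_conf, pca_conf


support, std_conf, pca_conf = rule_confidences(triples)
print(support, round(std_conf, 3), pca_conf)
```

In this toy graph the rule predicts a residence for carol, but no `livesIn` fact is recorded for her. Under the Open World Assumption that prediction is unknown rather than false, so the PCA denominator leaves it out, whereas standard confidence penalizes the rule for it.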
Emetis Niazmand, Gëzim Sejdiu, Damien Graux, María-Esther Vidal