Identifying causal relationships from observational time series data is a fundamental challenge across scientific disciplines, from climate science to econometrics and systems biology. Classical constraint-based and score-based methods have succeeded in low-dimensional settings, but they frequently falter on high-dimensional data, particularly in the presence of latent confounders: unobserved variables that influence two or more observed variables and thereby induce spurious correlations. This paper introduces Causal-Diff, a framework that leverages the generative power of score-based diffusion models to address these limitations. By modeling the time-dependent evolution of the data distribution with stochastic differential equations, we approximate the score function (the gradient of the log-density) and use it to disentangle observed temporal dependencies from hidden confounding factors. Unlike traditional structural equation models, which rely on rigid parametric assumptions, our approach uses deep neural networks to learn complex, non-linear causal mechanisms. We show theoretically that the score matching objective, when augmented with appropriate sparsity constraints and temporal masking, makes the causal graph identifiable even under partial observability. Extensive experiments on synthetic datasets and real-world functional magnetic resonance imaging (fMRI) data show that Causal-Diff significantly outperforms state-of-the-art baselines in structural Hamming distance and orientation accuracy.
Yuxiao Cheng, Lianglong Li, Tingxiong Xiao, Zongren Li, Jinli Suo, Kunlun He, Qionghai Dai
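
The abstract reports results in terms of structural Hamming distance (SHD). For readers unfamiliar with the metric, the following is a minimal sketch of one common convention, not the evaluation code used in the paper. It assumes that graphs are given as binary adjacency matrices of a summary graph (entry `[i, j]` set when variable `i` causes variable `j`), and that a reversed edge counts as a single error; other conventions count a reversal as two. The function name `structural_hamming_distance` is hypothetical.

```python
import numpy as np


def structural_hamming_distance(true_adj, pred_adj):
    """Count edge errors between two directed graphs.

    Assumed convention: each missing edge, extra edge, or reversed
    edge contributes 1 to the distance.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    pred_adj = np.asarray(pred_adj, dtype=bool)
    if true_adj.ndim != 2 or true_adj.shape[0] != true_adj.shape[1]:
        raise ValueError("adjacency matrices must be square")
    if true_adj.shape != pred_adj.shape:
        raise ValueError("adjacency matrices must have the same shape")

    # Every entry where the two graphs disagree.
    mismatches = true_adj != pred_adj

    # A reversal (i -> j in the true graph, j -> i in the prediction,
    # with no edge in the opposite direction in either graph) produces
    # two mismatched entries; mark it once at position (i, j).
    reversed_edges = true_adj & ~true_adj.T & pred_adj.T & ~pred_adj

    # Subtract one per reversal so that it is counted as a single error.
    return int(mismatches.sum() - reversed_edges.sum())


# True graph: 0 -> 1, 1 -> 2.  Predicted graph: 1 -> 0, 0 -> 2.
true_graph = [[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]]
pred_graph = [[0, 0, 1],
              [1, 0, 0],
              [0, 0, 0]]

# One reversed edge (0 -> 1), one missing edge (1 -> 2), one extra edge (0 -> 2).
shd = structural_hamming_distance(true_graph, pred_graph)
```

Lower values indicate a predicted graph closer to the ground truth, and a distance of zero means the two graphs are identical. For time-lagged graphs, the same count could be applied per lag, which is again an assumption rather than something the abstract specifies.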