Summary

The field of seismic analysis produces large volumes of data, but annotating all the images is challenging and time-consuming. Self-supervised learning therefore makes it possible to pre-train models on vast unlabelled datasets and reserve the labelled data for downstream tasks. To take advantage of this, this work explores the Masked Autoencoders (MAE) self-supervised method, comparing the efficacy of two Vision Transformer (ViT) architectures, ViT-Small and ViT-Large. We investigate model performance when fine-tuning on subsets of labelled seismic facies data, using a Segmentation Transformer for the segmentation task. We also compare segmentation results obtained with two differently pre-trained ViTs: one pre-trained with supervision on ImageNet and one pre-trained self-supervised on a seismic dataset. ViT-Small and ViT-Large exhibit similar metric values; however, ViT-Small has a shorter training time. The ViT pre-trained on the seismic dataset achieves superior performance across the different percentages of labelled data, especially when labelled data are scarce. This indicates that seismic pre-training generalizes better to seismic segmentation than ImageNet pre-training, demonstrating the benefit of domain-specific pre-training for seismic data analysis.
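To make the MAE pre-training idea mentioned above concrete, the following is a minimal NumPy sketch of its core masking step, not the authors' implementation: an image is split into non-overlapping patches, a high fraction of patches is hidden at random, and the reconstruction loss is computed only on the hidden patches. The patch size (16) and mask ratio (0.75) are the common MAE defaults, assumed here; the input array is a stand-in for a seismic section.

```python
import numpy as np

def patchify(image, patch=16):
    # Split a (H, W) image into non-overlapping, flattened patch vectors.
    h, w = image.shape
    rows, cols = h // patch, w // patch
    grid = image[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch)
    return grid.transpose(0, 2, 1, 3).reshape(rows * cols, patch * patch)

def random_masking(patches, mask_ratio=0.75, seed=0):
    # Keep a random (1 - mask_ratio) subset of patches; the encoder in MAE
    # only ever sees these visible patches.
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False  # True marks masked (hidden) patches
    return patches[keep_idx], keep_idx, mask

def masked_mse(pred, target, mask):
    # MAE's reconstruction loss is averaged over the masked patches only.
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())

# Hypothetical 224x224 "seismic section" stand-in.
section = np.random.default_rng(1).normal(size=(224, 224))
patches = patchify(section)                      # (196, 256)
visible, keep_idx, mask = random_masking(patches)  # 49 visible, 147 masked
```

In the full method, a ViT encoder processes only `visible`, a lightweight decoder predicts the 147 hidden patches, and `masked_mse` drives pre-training; the decoder is then discarded and the encoder is fine-tuned for segmentation.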
Olena Stankevych, Danylo Matviikiv