JOURNAL ARTICLE

Pseq2Sites: Enhancing protein sequence-based ligand binding-site prediction accuracy via the deep convolutional network and attention mechanism

Sangmin Seo, Jonghwan Choi, Seungyeon Choi, Jieun Lee, Chihyun Park, Sanghyun Park

Year: 2023
Journal: Engineering Applications of Artificial Intelligence
Vol: 127
Pages: 107257-107257
Publisher: Elsevier BV

Abstract

Protein-ligand interactions play an essential role in many biological processes, and prior knowledge of ligand binding sites is necessary for successful drug design. Many 3D structure- and sequence-based methods have been proposed for identifying ligand binding sites. The 3D structure-based methods typically achieve better binding site prediction than the sequence-based methods. However, as deep-learning techniques that can extract structural information from large-scale sequence data have been developed, the performance gap between 3D structure- and sequence-based methods is narrowing. Nonetheless, there remains room for improvement in sequence-based prediction. We propose Pseq2Sites, a sequence-based deep-learning model for predicting ligand binding sites. Pseq2Sites comprises a 1D convolutional neural network that extracts local features from the protein sequence, and a position-based attention mechanism that captures long-distance dependencies between binding residues. To verify the effectiveness of the proposed method, we compared it with other state-of-the-art methods using three public datasets: COACH420, HOLO4K, and CSAR-NRC HiQ. Utilizing solely protein sequence information, Pseq2Sites outperformed 3D structure-based state-of-the-art methods on external test datasets; within the COACH420 dataset, Pseq2Sites remarkably identified 97% of the binding pockets (at a significance level δ = 0.5), which was 27% higher than the second highest-performing model. Pseq2Sites also achieved outstanding binding site prediction, even for proteins with low similarity to the training dataset. Our code is available at https://github.com/Blue1993/Pseq2Sites.
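
The two components named in the abstract, a 1D convolution for local sequence features followed by attention across residue positions, can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): the one-hot encoding, layer sizes, random untrained weights, and the function names `conv1d_same`, `self_attention`, and `predict_binding` are all assumptions made for illustration, and plain scaled dot-product self-attention stands in for the paper's position-based attention. The scores it produces carry no biological meaning.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode each residue as a 20-dimensional one-hot vector (assumed input encoding)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [[1.0 if index[aa] == j else 0.0 for j in range(len(AMINO_ACIDS))]
            for aa in seq]


def conv1d_same(x, kernels, bias):
    """1D convolution along the sequence with zero padding, followed by ReLU.

    x: L x C_in, kernels: C_out x K x C_in (K odd), bias: C_out. Returns L x C_out.
    """
    k = len(kernels[0])
    pad = k // 2
    zero = [0.0] * len(x[0])
    padded = [zero] * pad + x + [zero] * pad
    out = []
    for pos in range(len(x)):
        window = padded[pos:pos + k]
        row = []
        for kernel, b in zip(kernels, bias):
            total = b + sum(w * v
                            for k_row, x_row in zip(kernel, window)
                            for w, v in zip(k_row, x_row))
            row.append(max(0.0, total))
        out.append(row)
    return out


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(h):
    """Scaled dot-product self-attention over positions, without learned projections."""
    dim = len(h[0])
    scale = math.sqrt(dim)
    weights = [softmax([sum(a * b for a, b in zip(q, key)) / scale for key in h])
               for q in h]
    context = [[sum(w * v[d] for w, v in zip(row, h)) for d in range(dim)]
               for row in weights]
    return context, weights


def predict_binding(seq, channels=8, kernel_size=5, seed=0):
    """Return a per-residue score in (0, 1) and the attention matrix.

    Weights are random and untrained, so the scores are placeholders only.
    """
    rng = random.Random(seed)
    kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(len(AMINO_ACIDS))]
                for _ in range(kernel_size)]
               for _ in range(channels)]
    local = conv1d_same(one_hot(seq), kernels, [0.0] * channels)
    context, weights = self_attention(local)
    readout = [rng.uniform(-0.5, 0.5) for _ in range(channels)]
    probs = [1.0 / (1.0 + math.exp(-sum(w * c for w, c in zip(readout, row))))
             for row in context]
    return probs, weights


# Hypothetical example sequence, chosen only to exercise the code.
probs, attn = predict_binding("MKTAYIAKQRQISFVKSHFSRQ")
```

The convolution only sees a window of `kernel_size` neighbouring residues, while each attention row mixes information from every position in the sequence, which is the division of labour the abstract describes.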

Keywords:
Computer science, Sequence (biology), Artificial intelligence, Convolutional neural network, Deep learning, Similarity (geometry), Protein sequencing, Data mining, Machine learning, Peptide sequence, Chemistry

Metrics

Cited By: 13
FWCI (Field Weighted Citation Impact): 4.02
Refs: 50
Citation Normalized Percentile: 0.92

Topics

Computational Drug Discovery Methods
Physical Sciences → Computer Science → Computational Theory and Mathematics
Protein Structure and Dynamics
Life Sciences → Biochemistry, Genetics and Molecular Biology → Molecular Biology
Microbial Natural Products and Biosynthesis
Health Sciences → Medicine → Pharmacology